NoSQL Benchmarks NoSQL use cases NoSQL Videos NoSQL Hybrid Solutions NoSQL Presentations Big Data Hadoop MapReduce Pig Hive Flume Oozie Sqoop HDFS ZooKeeper Cascading Cascalog BigTable Cassandra HBase Hypertable Couchbase CouchDB MongoDB OrientDB RavenDB Jackrabbit Terrastore Amazon DynamoDB Redis Riak Project Voldemort Tokyo Cabinet Kyoto Cabinet memcached Amazon SimpleDB Datomic MemcacheDB M/DB GT.M Amazon Dynamo Dynomite Mnesia Yahoo! PNUTS/Sherpa Neo4j InfoGrid Sones GraphDB InfiniteGraph AllegroGraph MarkLogic Clustrix CouchDB Case Studies MongoDB Case Studies NoSQL at Adobe NoSQL at Facebook NoSQL at Twitter



latency: All content tagged as latency in NoSQL databases and polyglot persistence

When More Machines Equals Worse Results

Galileo observed how things broke if they were naively scaled up.

Google found the larger the scale the greater the impact of latency variability. When a request is implemented by work done in parallel, as is common with today’s service oriented systems, the overall response time is dominated by the long tail distribution of the parallel operations. Every response must have a consistent and low latency or the overall operation response time will be tragically slow.

Fantastic post from Todd Hoff on the (hopefully) well known truth: “the reponse time in a distributed parallel systems is the time of the slowest component“.

Original title and link: When More Machines Equals Worse Results (NoSQL database©myNoSQL)


Visualizing System Latency

Besides the many practical lessons emphasized in Jack Clark’s interview with Adrian Cockcroft on ZDNet—luckly I’ve had the chance to see some of Cockcroft’s presentations about Netflix architecture and also talk to him directly—one thing that sticked with me was the ending paragraph:

The thing I’ve been publicly asking for has been better IO in the cloud. Obviously I want SSDs in there. We’ve been asking cloud vendors to do that for a while. With Cassandra, we’ve had to go onto horizontal scale and use the internal disks and triple replicate across availability zones, so you end up with a triple-redundant data store that is careful not to overload the disks.

That reminded me of this old ACM article authored by Brendan Gregg:

When I/O latency is presented as a visual heat map, some intriguing and beautiful patterns can emerge. These patterns provide insight into how a system is actually performing and what kinds of latency end-user applications experience. Many characteristics seen in these patterns are still not understood, but so far their analysis is revealing systemic behaviors that were previously unknown.

Visualizing system latency - Sequential disk reads, stepping disk count

I was wondering if in the NoSQL databases space (and data storage space in general) are there any of the monitoring tools that provide such advanced visualization of latency data. Do you know any?

Original title and link: Visualizing System Latency (NoSQL database©myNoSQL)

Google Research: Let's Make TCP Faster

Google is actively researching ways to improve TCP:

Our research shows that the key to reducing latency is saving round trips. We’re experimenting with several improvements to TCP. Here’s a summary of some of our recommendations to make TCP faster:

  1. Increase TCP initial congestion window to 10 (IW10). The amount of data sent at the beginning of a TCP connection is currently 3 packets, implying 3 round trips (RTT) to deliver a tiny 15KB-sized content.
  2. Reduce the initial timeout from 3 seconds to 1 second.
  3. Use TCP Fast Open (TFO).
  4. Use Proportional Rate Reduction for TCP (PRR).

The database world attacked the network latency with connection pools and pipelining. For reducing network round trips we’ve used JOINs or denormalized data. But all software architectures will benefit from a faster TCP.

Andrei Savu

Original title and link: Google Research: Let’s Make TCP Faster (NoSQL database©myNoSQL)