


distributed: All content tagged as distributed in NoSQL databases and polyglot persistence

9 Things to Acknowledge about NoSQL Databases

Excellent list:

  1. Understand how ACID compares with BASE (Basically Available, Soft-state, Eventually Consistent)
  2. Understand persistence vs non-persistence, i.e., some NoSQL technologies are entirely in-memory data stores
  3. Recognize there are entirely different data models from traditional normalized tabular formats: columnar (Cassandra) vs key/value (Memcached) vs document-oriented (CouchDB) vs graph-oriented (Neo4j)
  4. Be ready to deal with no standard interface like JDBC/ODBC or standardized query language like SQL; every NoSQL tool has a different interface
  5. Architects: rewire your brain to the fact that web-scale/large-scale NoSQL systems are distributed across dozens to hundreds of servers and networks as opposed to a shared database system
  6. Get used to the possibly uncomfortable realization that you won’t know where data lives (most of the time)
  7. Get used to the fact that data may not always be consistent; ‘eventually consistent’ is one of the key elements of the BASE model
  8. Get used to the fact that data may not always be available
  9. Understand that some solutions are partition-tolerant and some are not
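Point 7, eventual consistency, is the one that trips up most newcomers. A toy model (hypothetical code, not any particular store’s API) makes the behavior concrete: a write is acknowledged after reaching one replica, and other replicas see it only after anti-entropy runs.

```python
class Replica:
    """One node's private copy of the data."""
    def __init__(self):
        self.data = {}

class EventuallyConsistentStore:
    """Toy model of BASE semantics: a write is acknowledged after
    reaching a single replica; the others converge later."""
    def __init__(self, n_replicas=3):
        self.replicas = [Replica() for _ in range(n_replicas)]
        self.pending = []  # writes not yet propagated

    def write(self, key, value):
        self.replicas[0].data[key] = value  # acknowledged immediately
        self.pending.append((key, value))

    def read(self, key, replica=0):
        return self.replicas[replica].data.get(key)  # may be stale

    def anti_entropy(self):
        """Propagate pending writes so all replicas converge."""
        for key, value in self.pending:
            for r in self.replicas:
                r.data[key] = value
        self.pending.clear()

store = EventuallyConsistentStore()
store.write("user:1", "alice")
stale = store.read("user:1", replica=2)  # None: not yet propagated
store.anti_entropy()                     # replicas converge
fresh = store.read("user:1", replica=2)  # now "alice"
```

Reading from replica 2 before anti-entropy returns nothing; afterwards all replicas agree. That window of staleness is exactly what point 7 asks you to get used to.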

Print it out and distribute it among your colleagues.

Original title and link: 9 Things to Acknowledge about NoSQL Databases (NoSQL databases © myNoSQL)


Ehcache: Distributed Cache or NoSQL Store?

This is a guest post by Greg Luck, Founder and CTO, Ehcache.

Is Ehcache a NoSQL store? No, I would not characterise it as that, but I have seen it used for some NoSQL use cases. In these situations it compared very well — with higher performance and more flexible consistency than the well-known NoSQL stores. Let me explain.

Ehcache is the de facto open source cache for Java. It is used to boost performance, offload databases and simplify scalability. Backed by the Terracotta Server Array, Ehcache becomes a linearly scalable distributed cache. It is a schema-less, key-value, Java-based distributed cache. It provides flexible consistency control, data durability, and with the release of Ehcache 2.4, search by key, value and attribute indexes.

Flexible Consistency

From the very first integration of Ehcache and Terracotta, we enabled coherent data sharing across a cluster. In recognition of the CAP theorem and the limitations of having a single hard-coded trade-off, we created a rich consistency model configurable on a cache-by-cache basis. And to ease understanding, we adopted the standard client consistency model to describe and configure it.

We offer a much richer consistency model than NoSQL solutions. Across a cluster we offer, on a per-cache basis:

  • Pessimistically locked strong consistency (the default)
  • Unlocked Weak Consistency, with read-your-writes, monotonic reads and monotonic writes
  • Optimistically locked Compare and Swap (“CAS”) atomic operations across the cluster
  • XA and Local transactions
  • An explicit locking API that allows custom consistency
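For readers unfamiliar with the CAS option above, here is a minimal sketch of what compare-and-swap semantics guarantee (illustrative Python, not Ehcache’s actual API): a replace succeeds only if the value is still what the caller last read, so two writers cannot silently overwrite each other.

```python
import threading

class CASCache:
    """Conceptual sketch of optimistically locked compare-and-swap.
    The lock guards only the check-then-set; callers that lose a race
    get False back and are expected to re-read and retry."""
    def __init__(self):
        self._data = {}
        self._lock = threading.Lock()

    def put_if_absent(self, key, value):
        with self._lock:
            if key in self._data:
                return self._data[key]  # lost the race: existing value
            self._data[key] = value
            return None                 # success

    def replace(self, key, expected, new):
        with self._lock:
            if self._data.get(key) != expected:
                return False            # value changed underneath us
            self._data[key] = new
            return True

cache = CASCache()
cache.put_if_absent("counter", 1)
ok = cache.replace("counter", 1, 2)     # True: value was still 1
stale = cache.replace("counter", 1, 3)  # False: value is now 2
```

The failed second replace is the point: the caller learns its read was stale instead of clobbering a concurrent update.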


The Ehcache architecture is very different from NoSQL architectures. With Ehcache, each application server JVM has a resident hot set of the cache determined using an LRU algorithm. The size of that hot set can be 4-6 GB for heap storage, and with BigMemory, can be hundreds of GBs. This is the Level 1 (“L1”) cache and is entirely in-process. Access from the L1 is less than 1 μs.
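The hot-set behavior described above can be sketched in a few lines — an illustrative LRU cache, not Ehcache’s implementation: recently touched keys stay resident, and the least recently used entry is evicted when the capacity is exceeded (a miss here is what sends a read on to the L2).

```python
from collections import OrderedDict

class L1HotSet:
    """Illustrative LRU-managed in-process hot set."""
    def __init__(self, capacity):
        self.capacity = capacity
        self._entries = OrderedDict()

    def get(self, key):
        if key not in self._entries:
            return None                    # miss: fall back to the L2
        self._entries.move_to_end(key)     # mark as recently used
        return self._entries[key]

    def put(self, key, value):
        if key in self._entries:
            self._entries.move_to_end(key)
        self._entries[key] = value
        if len(self._entries) > self.capacity:
            self._entries.popitem(last=False)  # evict least recently used

l1 = L1HotSet(capacity=2)
l1.put("a", 1)
l1.put("b", 2)
l1.get("a")     # touch "a", so "b" becomes least recently used
l1.put("c", 3)  # capacity exceeded: evicts "b"
```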

The entire cache is always stored in the Terracotta Server Array. This is the Level 2 (“L2”) cache. Access from the L2 is less than 2 ms.

The mileage you get from this architecture depends on the nature of your usage profile. The most common one, the Pareto distribution, will read from a correctly-sized L1 80% of the time, and from the L2 20% of the time. So the average latency for the most common case is less than 0.401 ms. By comparison, the Yahoo! Cloud Serving Benchmark shows the average latencies of HBase and Cassandra ranging from 8 to 18 ms.
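The arithmetic behind that figure is just a weighted average of the two tiers’ latencies:

```python
# Expected read latency under the Pareto access pattern described above.
l1_latency_ms = 0.001   # < 1 microsecond, expressed in milliseconds
l2_latency_ms = 2.0     # < 2 ms for a Terracotta Server Array access
hit_rate_l1 = 0.80      # 80% of reads served from the in-process L1

expected_ms = hit_rate_l1 * l1_latency_ms + (1 - hit_rate_l1) * l2_latency_ms
# 0.8 * 0.001 + 0.2 * 2.0 = 0.4008 ms, i.e. just over 0.4 ms
```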

That makes Ehcache an order of magnitude faster than these NoSQL stores for this access pattern.


Caches can be set to be persistent, which gives durability and restartability. A write ahead log is used, and the cache will recover to a consistent state. When configured as a distributed cache, Ehcache uses the Terracotta Server Array as a Level 2 cache. Terracotta servers are usually deployed in pairs for each partition, giving HA. In addition, backups can be taken from Terracotta using our JMX tools, or directly from the underlying storage mechanism. Backups and recoveries can be done live. RDBMSs typically offer a very extensive tool set for archiving, ETL and so on. For that reason I think of Ehcache as being durable with a small “d”. Having said that, perhaps this also applies to many NoSQL implementations.
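The write-ahead-log recovery idea can be sketched as follows (a minimal illustration, not Ehcache’s on-disk format): every put is appended and fsynced to the log before the in-memory state changes, so a restart can replay the log and recover a consistent cache.

```python
import json
import os
import tempfile

class WalCache:
    """Minimal write-ahead-log sketch: log first, then apply."""
    def __init__(self, log_path):
        self.log_path = log_path
        self.data = {}
        self._recover()

    def _recover(self):
        """Replay the log to rebuild the in-memory state."""
        if not os.path.exists(self.log_path):
            return
        with open(self.log_path) as f:
            for line in f:
                entry = json.loads(line)
                self.data[entry["key"]] = entry["value"]

    def put(self, key, value):
        with open(self.log_path, "a") as f:
            f.write(json.dumps({"key": key, "value": value}) + "\n")
            f.flush()
            os.fsync(f.fileno())  # durable before we apply the change
        self.data[key] = value

path = os.path.join(tempfile.mkdtemp(), "cache.wal")
c1 = WalCache(path)
c1.put("k", "v")
c2 = WalCache(path)  # simulated restart: state recovered from the log
```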

Big Data

The definition of Big Data is a moving target. Today it is generally understood to start at a few dozen terabytes and go up into petabytes.

By this definition we do not do Big Data, but we are close. We have seen users take Ehcache up to about 2 TB with the current implementation. And if the past is anything to go by, each major release of Ehcache supports data volumes that have increased by an order of magnitude.

Another way to look at Big Data is that it is defined as data that is too large or unwieldy to process using RDBMSs. Because the mean latency from the cache is much lower for data retrieval, Ehcache works well for applications requiring rapid response, and thus serves big data hot sets well.

Finally, with BigMemory we can create very high densities. A Terracotta server can hold 250 GB or more in memory. NoSQL solutions use a mix of memory and disk. Java-based ones like Cassandra are subject to garbage collection issues, so are limited to running with very small heaps. BigMemory stores cache data off-heap but within the Java process using NIO’s DirectByteBuffer. The end result is that you can get the same storage using a much smaller server deployment.
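A rough Python analogue of the off-heap idea, for illustration only: keep values as serialized bytes in one large preallocated buffer instead of as many discrete objects the collector must trace (BigMemory does this with NIO DirectByteBuffers on the Java side; this sketch omits freeing and compaction).

```python
import pickle

class ArenaStore:
    """Illustrative arena store: one big buffer, an index of
    (offset, length) pairs, and values held only as bytes."""
    def __init__(self, size):
        self.arena = bytearray(size)
        self.offset = 0
        self.index = {}  # key -> (offset, length)

    def put(self, key, value):
        blob = pickle.dumps(value)
        end = self.offset + len(blob)
        if end > len(self.arena):
            raise MemoryError("arena full")
        self.arena[self.offset:end] = blob   # copy bytes into the arena
        self.index[key] = (self.offset, len(blob))
        self.offset = end

    def get(self, key):
        off, length = self.index[key]
        return pickle.loads(bytes(self.arena[off:off + length]))

store = ArenaStore(1024)
store.put("plan", {"minutes": 300, "data_gb": 5})
```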

Enter Distributed Caching

So if Ehcache is not NoSQL, what is it? The answer is that Ehcache is a distributed cache. Like its NoSQL cousins, it is often used when the database cannot cope. But more than just data can be cached. For example, caching a web page or an expensive CPU computation are also common use cases.

The focus is on fast access to these cached results, not persistence. This fast access is expressed architecturally in Java as in-process caching, and over the network as in-memory cache stores.

Helping to draw the distinction, both Gartner and Forrester have in the last year created their respective definitions of Distributed Caching and Elastic Caching. According to Gartner, “Distributed caching platforms enable users to manage very large in-memory data stores to enable DBMS workload offloading, cloud and cloud transaction processing, extreme transaction processing, complex-event processing, and high-performance computing”. They also added distributed caches to their application Platform as a Service (“aPaaS”) reference architecture.

This makes sense. We see distributed caches being used alongside both RDBMSs and NoSQL.

But once the cache gets distributed and takes on enterprise features like search, the capabilities expand and overlap with NoSQL. Yet the difference in emphasis remains.

Use Cases

So what are some use cases where you might want to consider a distributed cache? In general, any use case where a ‘key-value plus search’-type NoSQL solution fits, as long as data volumes are under 2 TB and the durability toolset is acceptable.

Specifically we see the following use cases:

  • General Purpose Caching

    e.g. Hibernate Caching. JDBC caching. Web caching. Collection caching. We do this one pretty thoroughly.

  • In conjunction with an in-house or third-party analytics engine, very fast lookup of analytics results.

    e.g. A credit card company needs to score real-time credit card transactions. There are hundreds of millions per day. Results of an in-house fraud model with transactions up to the close of business the previous day are loaded into the cache. The cache is further adjusted during the current business day for actual usage and can return fraud scores on billions of credit card numbers in a fraction of a second.

  • System of Record (“SOR”) for short to medium term business processes

    e.g. A phone company does mobile phone contract processing and provisioning with rapidly changing plan details. They create a value object for each plan and persist that to the cache, avoiding database schema changes and thus adding business agility. The final result of the provisioning is recorded in the database after approximately two weeks. The value in the cache is akin to the document in a Document Store.

  • In-memory dataset search, including applications where all data can be held in memory as well as those where a partial (or hot) dataset is in memory

    e.g. A logistics company needs to look up consignment notes by ID, sender name, addressee name and date range. They generate 400 GB per fortnight and 98% of searches are within two weeks. The database is overloaded handling the volume of lookups. By storing the most-used data in the cache, a 98% database offload can be achieved.
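The database-offload scenarios above all boil down to the cache-aside pattern: serve reads from the cache and fall back to the database only on a miss, populating the cache on the way back. A minimal sketch (hypothetical names; `db_lookup` stands in for the expensive database query):

```python
class CacheAside:
    """Cache-aside read path: check cache, fall back to the database
    on a miss, and populate the cache with the result."""
    def __init__(self, db_lookup):
        self.cache = {}
        self.db_lookup = db_lookup
        self.db_hits = 0  # counts expensive database round trips

    def get(self, key):
        if key in self.cache:
            return self.cache[key]        # served from the cache
        self.db_hits += 1
        value = self.db_lookup(key)       # expensive database lookup
        self.cache[key] = value
        return value

consignments = {"CN-42": "in transit"}   # stand-in for the database
lookup = CacheAside(lambda k: consignments[k])
lookup.get("CN-42")  # miss: goes to the database
lookup.get("CN-42")  # hit: served from the cache
```

With a hot set sized so that 98% of lookups hit the cache, only the remaining 2% ever reach the database, which is the offload figure quoted above.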


Distributed Caches and NoSQL Stores have been born out of the same need to supplement the RDBMS in powering web scale architectures. Both are cognizant of the CAP theorem and the impossibility of taking along the old certainties into a large scale distributed world.

While NoSQL is aimed at replacing the durability feature of RDBMS, caches are aimed at low latency and speed. Moreover, the cache is also seeking to avoid forcing the application to go out to any store, whether it be RDBMS or NoSQL.

I see distributed caches and NoSQL as two useful and complementary technologies that can supplement the RDBMS for what it cannot do, but also provide their own unique new features.

Original title and link: Ehcache: Distributed Cache or NoSQL Store? (NoSQL databases © myNoSQL)

Recipe for a Distributed Realtime Tweet Search System



  1. Place Voldemort, Kafka, and Sensei on a couple of servers.
  2. Arrange them with taste:

    Chirp Architecture

  3. Spray a large quantity of tweets on the system

Preparation time:

24 hours


For more servings, add the appropriate number of servers.


Chirper on Github


  • One design choice we considered was letting the process that writes to Voldemort also be a Kafka consumer. Although this would be cleaner, we would risk a data race where search may return a hit array before the tweets are added to Voldemort. By making sure a tweet is first added to Voldemort, we can rely on it being the authoritative storage for our tweets.
  • You may have already realized Kafka is acting as a proxy for the Twitter stream, and we could have also streamed tweets directly into the search system, bypassing the Kafka layer. What we would be missing is the ability to play back tweet events from a specific checkpoint. One really nice feature of Kafka is that you can keep a consumption point and have data replayed, which makes reindexing possible for cases such as data corruption, schema changes, etc. Furthermore, to scale search, we could have a growing number of search nodes consume from the same Kafka stream. Kafka is written in a way where adding consumers does not affect the throughput of the system, which really helps in scaling the entire system.
  • Another important design decision was using Voldemort for storage. One alternative would be to instead store tweets in the search index, e.g. as Lucene stored fields. The benefits of this approach would be stronger consistency between search and store, and the stored data would follow the retention policy defined by the search system. However, other than the fact that Lucene stored fields are nowhere near as efficient as a Voldemort cluster (an implementation issue), there are more convincing reasons:
    • First, the consistency benefit of having search and store together is negligible. If we follow our assumption of tweets being append-only and we always write to Voldemort first, we really wouldn’t have consistency issues. Yet having data storage reside on the same search system would disproportionately introduce contention for I/O bandwidth and OS cache; as data volume increases, search performance can be negatively impacted.
    • The point about retention is rather valid. While the search index guarantees that older tweets are expired, the Voldemort store would continue to grow. Our decision ultimately came down to two points: 1) Voldemort’s growth profile is very different, e.g. adding new records to the system is much cheaper, so it is feasible to have a much longer data retention policy. 2) Having a cluster of tweet storage allows us to integrate with other systems if desired, for analytics, display, etc.
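The write-ordering argument in the first bullet can be sketched with in-memory stand-ins for Voldemort, Kafka, and the search index (hypothetical code, not the Chirper implementation): because a tweet reaches the store before its event is published, the indexer can never surface an ID that is missing from the store.

```python
from collections import deque

store = {}      # stands in for Voldemort (authoritative storage)
log = deque()   # stands in for a Kafka topic
index = set()   # stands in for the Sensei search index

def publish_tweet(tweet_id, text):
    """Write to the authoritative store first, then emit the event."""
    store[tweet_id] = text  # step 1: authoritative write
    log.append(tweet_id)    # step 2: event for the indexer

def index_consumer():
    """The search side consumes the log. Because of the ordering in
    publish_tweet, every ID it sees is already resolvable in the store;
    the assert documents the race the design avoids."""
    while log:
        tweet_id = log.popleft()
        assert tweet_id in store
        index.add(tweet_id)

publish_tweet(1, "hello, distributed world")
index_consumer()
```

Reversing the two steps in `publish_tweet` is exactly the data race described above: a search hit whose tweet cannot yet be fetched.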

Original title and link: Recipe for a Distributed Realtime Tweet Search System (NoSQL databases © myNoSQL)


Distributed Systems: The Phi Accrual Failure Detector Paper


Detecting failures is a fundamental issue for fault-tolerance in distributed systems. […]

We present a novel abstraction, called accrual failure detectors, that emphasizes flexibility and expressiveness and can serve as a basic building block to implementing failure detectors in distributed systems. Instead of providing information of a boolean nature (trust vs. suspect), accrual failure detectors output a suspicion level on a continuous scale.
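The φ calculation itself is short. A minimal sketch, assuming heartbeat inter-arrival times are normally distributed (the paper's φ is defined as -log10 of the probability that a heartbeat arrives later than the time already elapsed):

```python
import math
import statistics

def phi(time_since_last, intervals):
    """Suspicion level: phi = -log10(P_later), where P_later is the
    probability that a heartbeat arrives more than time_since_last
    after the previous one, under a normal distribution fitted to the
    observed inter-arrival intervals."""
    mu = statistics.mean(intervals)
    sigma = statistics.stdev(intervals) or 1e-9
    # P_later = 1 - CDF of N(mu, sigma) at time_since_last
    z = (time_since_last - mu) / (sigma * math.sqrt(2))
    p_later = 0.5 * math.erfc(z)
    return -math.log10(max(p_later, 1e-300))  # clamp to avoid log(0)

heartbeat_gaps = [1.0, 1.1, 0.9, 1.0, 1.05]  # observed gaps in seconds
low = phi(1.0, heartbeat_gaps)   # heartbeat on schedule: low suspicion
high = phi(5.0, heartbeat_gaps)  # long silence: suspicion accrues
```

The continuous output is the point: instead of a boolean trust/suspect flip, the application picks its own φ threshold to trade detection speed against false positives.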

The architectural difference between traditional and accrual failure detectors:

Traditional and Accrual Failure Detectors

Credit: Naohiro Hayashibara, Xavier Défago, Rami Yared, and Takuya Katayama

Original title and link: Distributed Systems: The Phi Accrual Failure Detector Paper (NoSQL databases © myNoSQL)

Riak’s Distributed Storage Architecture

Justin Sheehy[1] talking about Riak’s internals and how they’ve built Riak as a distributed, robust, scalable, and extensible NoSQL database.

Justin Sheehy - Riak’s distributed storage architecture

  1. Justin Sheehy: CTO Basho, @justinsheehy

Original title and link: Riak’s Distributed Storage Architecture (NoSQL databases © myNoSQL)

Redis Cluster Explained

For Redis users, it shouldn’t be a surprise that Salvatore Sanfilippo (@antirez) has been working on Redis clustering lately. In the videos embedded below, Salvatore explains the details of Redis clustering.

Distributed Graph Databases and Usecases

Darren Wood[1] presentation containing a mix of details on distributed graph databases and graph databases use cases:

According to Darren, the characteristics of distributed graph databases are:

  • High performance distributed persistence
  • Ability to deal with remote data reads (fast)
  • Intelligent local cache of subgraphs
  • Distributed navigation processing
  • Distributed, multi-source concurrent ingest
  • Write modes supporting both strict and eventual consistency

You might want to check Marko Rodriguez’s Graph Databases: more than an introduction for a more complete list of possible graph database use cases.

  1. Darren Wood, Chief Architect InfiniteGraph

Original title and link: Distributed Graph Databases and Usecases (NoSQL databases © myNoSQL)

MongoDB: Handling Master Crashes in Master/Slave Setups

Very informative discussion about handling master server crashes in MongoDB master/slave setups:

Alvin Richards: in the case of an unclean shutdown, you must always

  • remove the lock file
  • start mongod with --repair
  • after the repair is complete, then you can restart mongod

The time a repair takes is a function of the amount of data and indexes and the I/O capability of your system. Right now you need access to double the amount of storage while the repair takes place. We will be optimizing this going forward. You can use the --repairpath option to indicate an alternate volume if you need to provision space just for the repair operation.
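Assuming the default mongod.lock file in the data directory, the sequence Alvin describes corresponds to commands along these lines (paths here are hypothetical; substitute your own dbpath and spare volume). A small helper that spells out the three steps:

```python
import os

def repair_plan(dbpath, repair_path=None):
    """Return the shell commands for recovering from an unclean
    shutdown, in the order they must run (illustrative sketch)."""
    steps = [
        # 1. remove the stale lock file left by the unclean shutdown
        "rm {}".format(os.path.join(dbpath, "mongod.lock")),
        # 2. run the repair; --repairpath points at a spare volume,
        #    since repair needs roughly double the storage
        "mongod --dbpath {} --repair".format(dbpath)
        + (" --repairpath {}".format(repair_path) if repair_path else ""),
        # 3. only after the repair completes, restart mongod normally
        "mongod --dbpath {}".format(dbpath),
    ]
    return steps

steps = repair_plan("/data/db", repair_path="/mnt/spare")
```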

Original title and link: MongoDB: Handling Master Crashes in Master/Slave Setups (NoSQL databases © myNoSQL)


CouchDB: Design for Replication

It is not only about scaling CouchDB, or technological support through BigCouch, or server topology and replication rules, but also about data modeling:

Whilst CouchDB includes replication and a conflict-flagging mechanism, this is not the whole story for building an application which replicates in a way which users expect.

Original title and link: CouchDB: Design for Replication (NoSQL databases © myNoSQL)

What is Riak?

From Basho’s blog:

Riak is:

  • A Database
  • A Data Store
  • A key/value store
  • Used by Fortune 100 Companies
  • Used by startups
  • A “NoSQL” database
  • Schemaless and data-type agnostic
  • Written (primarily) in Erlang
  • As distributed as you want and need it to be
  • Scalable
  • Pronounced “REE-ack”
  • Not the best fit for every project and application
  • And much, much more…

So far I’ve heard only about Riak and Mozilla, Riak at, and this atypical Riak usage for church kiosks, but no mentions of Fortune 100 company names. Does anyone know who they are referring to?

Update: please check the comment thread for more details. It looks like the Fortune 100 company Basho is referring to is Comcast.

Original title and link for this post: What is Riak? (published on the NoSQL blog: myNoSQL)