


Membase: All content tagged as Membase in NoSQL databases and polyglot persistence

Polyglot persistence at Pinterest: Redis, Membase, MySQL

Pinterest architecture

I’ve created the diagram above based on this very brief answer on Quora:

We use python + heavily-modified Django at the application layer.  Tornado and (very selectively) node.js as web-servers.  Memcached and membase / redis for object- and logical-caching, respectively.  RabbitMQ as a message queue.  Nginx, HAproxy and Varnish for static-delivery and load-balancing.  Persistent data storage using MySQL.  MrJob on EMR for map-reduce.

Data from October 2011 showed Pinterest having over 3 million users generating 400+ million pageviews. There are plenty of questions still to be answered, though:

  1. what is node.js used for? what is RabbitMQ used for?

    Note: the whole section in the diagram about node.js and RabbitMQ is speculative.

  2. is Amazon Elastic MapReduce used for clickstream analysis only (log based analysis) or more than that?

  3. how is data loaded in the Amazon cloud?

    Note: if Amazon Elastic MapReduce is used only for analyzing logs, these are probably uploaded regularly on Amazon S3.

  4. why the need for both Redis and Membase?

Original title and link: Polyglot persistence at Pinterest: Redis, Membase, MySQL (NoSQL database©myNoSQL)

The Couchbase Genealogy

Looks like Matthew Aslett (the451group) had his own version of the Couchbase genealogy:

Couchbase genealogy

Credit: Matt Aslett.

Original title and link: The Couchbase Genealogy (NoSQL database©myNoSQL)

History of Couch Projects

Just in case you thought someone made up the whole thing about the status of CouchDB being confusing:

History of Couch Projects

Found in Koji Kawamura’s Introduction of CouchDB JP slides.

On the other hand I’m still trying to figure out if things in CouchDB land were more confusing than the various Hadoop versions out there. If you compare the two genealogy trees you’ll notice a reversed pattern.

Original title and link: History of Couch Projects (NoSQL database©myNoSQL)

Couchbase Server 1.8 Released, Rebranding and Some Improvements in Cluster Rebalancing

Couchbase Server 1.8 replaces Membase Server 1.7 as our “flagship” database offering. In addition to the obvious rebranding, we’ve made substantial improvements in the cluster rebalancing process and fixed a number of nagging issues in 1.7.

In case you feel lost about which Couchbase products are which, read my 5-bullet-point explanation.

Original title and link: Couchbase Server 1.8 Released, Rebranding and Some Improvements in Cluster Rebalancing (NoSQL database©myNoSQL)


Couchbase: Clarifying Confusions in 5 Bullet Points

Here are the 5 bullet points that would have helped Couchbase clear up all the confusion about Couchbase, Membase, and CouchDB:

  1. We are working on Couchbase server 2.0. This is our next major release and the only product we will be focusing on next. It represents the continuation of our current Membase server product.
  2. Until Couchbase server 2.0 is out, we might release one or two updates to our Membase server that address the most important issues.
  3. We will provide a migration path for users of Membase server to Couchbase server 2.0.
  4. We will no longer support our distribution of CouchDB known as Couchbase Single Server. Damien Katz, creator of CouchDB, has decided to step away from the Apache CouchDB project and focus on Couchbase development.
  5. Due to the major changes in Couchbase server 2.0, we will not offer a migration path for the users of Couchbase Single Server to Couchbase server 2.0.

Original title and link: Couchbase: Clarifying Confusions in 5 Bullet Points (NoSQL database©myNoSQL)

Unintentional Market Confusion... Membase, CouchDB, or Couchbase

Not everything went as we hoped or expected, however. Unfortunately, we confused the heck out of many of our potential users. In addition to Membase Server and our new mobile products we also offered Couchbase Single Server which was a packaged “distribution” of Apache CouchDB. On top of that we began releasing developer previews of Couchbase Server 2.0, which incorporated CouchDB technology into Membase Server – but this product was not compatible with Couchbase Single Server (or CouchDB). If you are confused just reading this you get the point – and so do we.


Original title and link: Unintentional Market Confusion… Membase, CouchDB, or Couchbase (NoSQL database©myNoSQL)


Migrating a Membase Cluster

Shawn Chiao documents the migration of an 8-node Membase cluster storing 240 million key-value pairs, 160GB in total, in two posts (part 1 and part 2):

After up all night babysitting the rebalance process, I am happy to report that it was a rather uneventful night of maintenance.  The rebalance itself took 8-9 hours to complete, and then took another hour for all the replicas to get saved to the disk also.  Theoretically, I didn’t need to take the site down while the rebalance was happening, but I took the game down just to be safe and not compromise the game experience.

The question is: if the application was stopped anyway, wasn’t there another migration approach that would have shortened the migration window?

What I’m thinking is that, if there are no new writes to the system, one could:

  1. add the new nodes as “slaves” for existing nodes (also change the replication factor)
  2. once these have caught up, change the master to one of the new nodes
  3. kill old nodes

This would basically avoid reshuffling the data across the cluster.

Another thing that causes this warm-up to take a long time is the fact that membase uses sqlite3 engine for persisting data to the disk.  Sqlite3 uses btree to store its data, and when items are deleted, the underlying btree pages are merely marked as “free”.  Later on when new items are stored, their content can be spread over different pages, causing fragmentation.  So if the membase cluster is seeing a lot of delete or expiration, which ours does, the warm-up time will slowly increase overtime.  This fragmentation issue will be addressed in the next major release Couchbase 2.0, since it will be replacing sqlite3 with CouchDB.  But in the mean time, this is a real problem that we will need to deal with in production.


  1. is Membase using 1 sqlite3 engine per node or per bucket?
  2. isn’t sqlite3 single threaded thus making all writes and reads sequential?
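The free-page behavior described in the quote can be observed in SQLite itself. The following is a minimal sketch using Python's built-in sqlite3 module; the `kv` table, the row count, and the payload size are assumptions made up for illustration and say nothing about Membase's actual on-disk schema. It shows that deleted rows leave their pages on SQLite's free list rather than shrinking the file, and that the space is only returned after a `VACUUM`.

```python
import os
import sqlite3
import tempfile


def measure_free_pages(rows=2000, payload_size=512):
    """Insert rows, delete them, and report page statistics before and after VACUUM."""
    # The table name and layout are illustrative, not Membase's actual schema.
    path = os.path.join(tempfile.mkdtemp(), "kv.db")
    conn = sqlite3.connect(path)
    try:
        # Has to be set before the first table is created to take effect.
        conn.execute("PRAGMA auto_vacuum = NONE")
        conn.execute("CREATE TABLE kv (k TEXT PRIMARY KEY, v BLOB)")
        conn.executemany(
            "INSERT INTO kv VALUES (?, ?)",
            ((f"key-{i}", b"x" * payload_size) for i in range(rows)),
        )
        conn.commit()

        def pragma(name):
            return conn.execute(f"PRAGMA {name}").fetchone()[0]

        stats = {"pages_before_delete": pragma("page_count")}

        # Deleting rows marks their pages as free; the file keeps its size.
        conn.execute("DELETE FROM kv")
        conn.commit()
        stats["pages_after_delete"] = pragma("page_count")
        stats["free_after_delete"] = pragma("freelist_count")

        # VACUUM rebuilds the database file and drops the free pages.
        conn.execute("VACUUM")
        stats["pages_after_vacuum"] = pragma("page_count")
        stats["free_after_vacuum"] = pragma("freelist_count")
        return stats
    finally:
        conn.close()


if __name__ == "__main__":
    for name, value in measure_free_pages().items():
        print(f"{name}: {value}")
```

Deleting everything at once is the simplest way to make the effect visible. A workload with scattered deletes and expirations, like the one in the quote, would leave free pages interleaved with live ones, which is where the fragmentation comes from.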

Original title and link: Migrating a Membase Cluster (NoSQL database©myNoSQL)

Membase Cluster on EC2 or Amazon ElastiCache?

While there are some advantages to using a Membase cluster on EC2 instead of an ephemeral Memcached-based service like Amazon ElastiCache, one question remains: self-managed or managed? Answering it is essential to understanding the final OPEX.

Advantages of using a Membase cluster instead of ElastiCache:

  • persistent vs ephemeral data
  • backup & restore
  • SASL authentication
  • using reserved instances
  • cluster elasticity with automatic rebalancing and no need for cache warming

When calculating the OPEX for each of these solutions, one would need to account for:

  • licensing fees [1]
  • monitoring, maintenance, repairs
  • salary and wages

In terms of service fees here is a quick comparison:

Membase Amazon EC2 vs Amazon ElastiCache

  [1] Membase has both Community and Enterprise editions.

Original title and link: Membase Cluster on EC2 or Amazon ElastiCache? (NoSQL database©myNoSQL)


How to Cache PHP Sessions in Membase

Why Membase is the next step after Memcached:

Memcache is great, but once you start running low on memory (as you cache more info) lesser-used items in the cache will be destroyed to free up more space for new items. This can result in users getting logged out.  Also, if one of the servers in the pool fails or gets rebooted, all the data it was holding is lost, and then the cache must get “warmed up” again.

Membase is memcache with data persistence. The improvement of having data persistence is that if you need to bring down a server, you don’t have to worry about all that dainty, floaty data in memory that is gonna get burned. Since membase has replication built-in, you can feel free to restart a troublesome server with fear of your database getting pounded as the caches need to refill, or that a set of unlucky users will get logged out.  I’ll let you read about all the many other advantages of membase here.  It’s much more than I’ve mentioned here.
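The "users getting logged out" problem from the first quote can be illustrated with a toy cache. This is only a sketch of least-recently-used eviction under a fixed capacity; the `TinyLRUCache` class and the key names are hypothetical, and real memcached manages memory in a more involved way than a single ordered dictionary.

```python
from collections import OrderedDict


class TinyLRUCache:
    """Toy in-memory cache that evicts the least recently used key when full."""

    def __init__(self, capacity):
        self.capacity = capacity
        self._items = OrderedDict()

    def set(self, key, value):
        if key in self._items:
            self._items.move_to_end(key)
        self._items[key] = value
        while len(self._items) > self.capacity:
            self._items.popitem(last=False)  # drop the least recently used entry

    def get(self, key):
        if key not in self._items:
            return None
        self._items.move_to_end(key)  # a read counts as a use
        return self._items[key]


cache = TinyLRUCache(capacity=2)
cache.set("session:alice", {"user_id": 1})
cache.set("session:bob", {"user_id": 2})
cache.get("session:alice")           # alice is now the most recently used
cache.set("page:home", "<html>...")  # cache is full, so bob's session is evicted

print(cache.get("session:bob"))    # None: bob is effectively logged out
print(cache.get("session:alice"))  # {'user_id': 1}
```

With a persistent store behind the cache, the evicted session could be read back from disk instead of being lost, which is the improvement the second quote is describing.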

Original title and link: How to Cache PHP Sessions in Membase (NoSQL database©myNoSQL)


From Memcached to Membase Memcached Buckets


But this post isn’t about switching from a volatile cache to a persistent solution. It is about removing the dumb part from the memcached setup.

So I thought I’d read about the advantages of virtual nodes/buckets and elastic clusters, cold vs warm caches, cluster recoverability, the widely used memcached protocol and the possibility to use extensions in future versions, etc. Instead I learned about Moxi-based cluster configuration discoverability and how stupid the memcached PHP libraries are.
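For readers who have not met virtual buckets before, here is a rough sketch of the idea. Everything in it is an assumption made for illustration: the partition count, the CRC32-modulo hash, the node names, and the `add_server` placement policy are simplifications and not Membase's actual algorithm. The point is only that a key always hashes to the same virtual bucket, so growing the cluster means reassigning some buckets in a lookup table rather than rehashing every key.

```python
import zlib
from collections import Counter

NUM_VBUCKETS = 1024  # assumed fixed partition count for this sketch


def vbucket_for(key):
    """Map a key to a virtual bucket; this never changes when servers change."""
    return zlib.crc32(key.encode("utf-8")) % NUM_VBUCKETS


def build_map(servers):
    """Assign vBuckets to servers round-robin (hypothetical initial placement)."""
    return [servers[vb % len(servers)] for vb in range(NUM_VBUCKETS)]


def add_server(vbucket_map, new_server):
    """Move just enough vBuckets to the new server to even out the load."""
    servers = sorted(set(vbucket_map)) + [new_server]
    target = len(vbucket_map) // len(servers)
    counts = Counter(vbucket_map)
    new_map = list(vbucket_map)
    moved = 0
    for vb, owner in enumerate(vbucket_map):
        if moved == target:
            break
        if counts[owner] > target:
            new_map[vb] = new_server
            counts[owner] -= 1
            moved += 1
    return new_map


def server_for(key, vbucket_map):
    """Look up the server that currently owns the key's vBucket."""
    return vbucket_map[vbucket_for(key)]


old_map = build_map(["node-a", "node-b"])
new_map = add_server(old_map, "node-c")
moved_vbuckets = sum(1 for a, b in zip(old_map, new_map) if a != b)

print(f"{moved_vbuckets} of {NUM_VBUCKETS} vBuckets changed owner")
print(server_for("user:42", old_map), "->", server_for("user:42", new_map))
```

Clients only need the updated table to find their data, which is why configuration discoverability matters in this kind of setup.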

But I really enjoyed Matt Ingenthron’s quote:

at Membase Inc they view Memcached as a rabbit. It is fast, but it is pretty dumb and procreates quickly. Before you know it, it will be running wild all over your system.

Original title and link: From Memcached to Membase Memcached Buckets (NoSQL database©myNoSQL)


Use Membase and You'll Never Want to Mess With Memcached Servers Again

All I can say is WOW. I’ll never use stand alone memcached server(s) again.

  • crazy easy to install and make a cluster.

  • 0 changes to your app code. Operates seamlessly with memcached protocol. If you want to take advantage of advanced features, you need to modify app code.

  • you can dynamically add and remove nodes without losing all your keys/data.

  • 2 bucket types:

    1. Membase: supports data persistence (writes them ionicely to disk) and replication (one node dies, you dont lose your key/value pairs). It sends data to disk as fast as it can (while giving priority to getting data back from disk). This is done asynchronously (with an option for synchronous), so clients shouldn’t be able to perceive a difference between Membase and memcached data buckets.
    2. Memcached: no persistence or replication. all in memory. I would highly recomend going membase bucket unless you have some I/O concerns (like you get charged for I/O in the cloud).
  • Awesome admin web UI.

  • lots of documentation

  • helpful community
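The "0 changes to your app code" point rests on the memcached protocol being very simple. As a rough illustration, this sketch builds and parses memcached text-protocol messages by hand. The helper names are made up for this example, `roundtrip` assumes a memcached-compatible server is reachable at whatever host and port you pass in, and it reads each reply with a single `recv`, which a real client library would not rely on.

```python
import socket


def build_set(key, value, flags=0, exptime=0):
    """Build a memcached text-protocol 'set' request."""
    data = value.encode("utf-8")
    header = f"set {key} {flags} {exptime} {len(data)}\r\n".encode("ascii")
    return header + data + b"\r\n"


def build_get(key):
    """Build a memcached text-protocol 'get' request for a single key."""
    return f"get {key}\r\n".encode("ascii")


def parse_get_response(raw):
    """Return the value from a single-key 'get' response, or None on a miss."""
    if raw.startswith(b"END"):
        return None
    header, rest = raw.split(b"\r\n", 1)
    _, _key, _flags, length = header.split(b" ")
    return rest[: int(length)].decode("utf-8")


def roundtrip(host, port, key, value):
    """Store and fetch one key; host and port are placeholders for your own server."""
    with socket.create_connection((host, port), timeout=2) as sock:
        sock.sendall(build_set(key, value))
        sock.recv(1024)  # a successful store is answered with b"STORED\r\n"
        sock.sendall(build_get(key))
        return parse_get_response(sock.recv(4096))


print(build_set("greeting", "hello"))
print(parse_get_response(b"VALUE greeting 0 5\r\nhello\r\nEND\r\n"))
```

Because these bytes look the same whether the other end is a standalone memcached server or something else speaking the same protocol, existing client code does not need to know which one it is talking to.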

The only concern I can think of about replacing memcached with Membase is the maturity of the cluster solution. But on this front, things will only get better, probably before memcached gets an auto-scaling solution.

Original title and link: Use Membase and You’ll Never Want to Mess With Memcached Servers Again (NoSQL database©myNoSQL)


Zynga, Data Centers, Polyglot Persistence, and Big Data

Cadir Lee (CTO Zynga) quoted in a VentureBeat post:

It’s not the amount of hardware that matters. It’s the architecture of the application. You have to work at making your app architecture so that it takes advantage of Amazon. You have to have complete fluidity with the storage tier, the web tier. We are running our own data centers. We are looking more at doing our own data centers with more of a private cloud.

Couple of thoughts:

  1. Zynga is going in the opposite direction from Netflix. While Netflix is focusing (by using Amazon for most of their infrastructure), Zynga is diversifying (building their own data centers).
  2. Zynga’s applications are great examples of where fully distributed NoSQL databases fit. Availability is key.
  3. My answer to the question: “how many Zyngas are out there” would be: “enough to ensure some good business for the most reliable and scalable distributed databases”
  4. Zynga has contributed to and is an investor in Membase, the company that merged with CouchOne to form Couchbase. But Zynga was using a custom version of Membase.
  5. Zynga also operates a large MySQL cluster.
  6. Zynga processes over 15 terabytes of game data every day (according to their SEC filing). That’s Hadoop’s sweet spot.

PS: I’d love to talk to someone from Zynga about their data storage approach. If you have any connections I’d really appreciate an introduction.

Original title and link: Zynga, Data Centers, Polyglot Persistence, and Big Data (NoSQL database©myNoSQL)