membase: All content tagged as membase in NoSQL databases and polyglot persistence
I’ve created the diagram above based on this very brief answer on Quora:
We use python + heavily-modified Django at the application layer. Tornado and (very selectively) node.js as web-servers. Memcached and membase / redis for object- and logical-caching, respectively. RabbitMQ as a message queue. Nginx, HAproxy and Varnish for static-delivery and load-balancing. Persistent data storage using MySQL. MrJob on EMR for map-reduce.
Data from October 2011 showed Pinterest having over 3 million users generating 400+ million pageviews. There are plenty of questions to be answered, though:
- What is node.js used for? What is RabbitMQ used for?
  Note: the whole section in the diagram about node.js and RabbitMQ is speculative.
- Is Amazon Elastic MapReduce used for clickstream analysis only (log-based analysis) or for more than that?
- How is data loaded into the Amazon cloud?
  Note: if Amazon Elastic MapReduce is used only for analyzing logs, these are probably uploaded regularly to Amazon S3.
- Why the need for both Redis and Membase?
Original title and link: Polyglot persistence at Pinterest: Redis, Membase, MySQL ( ©myNoSQL)
Just in case you thought someone made up the whole thing about the status of CouchDB being confusing:
On the other hand I’m still trying to figure out if things in CouchDB land were more confusing than the various Hadoop versions out there. If you compare the two genealogy trees you’ll notice a reversed pattern.
Original title and link: History of Couch Projects ( ©myNoSQL)
Here are the 5 bullet points that would have helped Couchbase clear up all the confusion about Couchbase, Membase, and CouchDB:
- We are working on Couchbase server 2.0. This is our next major release and the only product we will be focusing on next. It represents the continuation of our current Membase server product.
- Until Couchbase server 2.0 is out, we might release one or two updates to our Membase server that address the most important issues.
- We will provide a migration path for users of Membase server to Couchbase server 2.0.
- We will no longer support our distribution of CouchDB known as Couchbase Single Server. Damien Katz, creator of CouchDB, has decided to step away from the Apache CouchDB project and focus on Couchbase development.
- Due to the major changes in Couchbase server 2.0, we will not offer a migration path for the users of Couchbase Single Server to Couchbase server 2.0.
Original title and link: Couchbase: Clarifying Confusions in 5 Bullet Points ( ©myNoSQL)
After up all night babysitting the rebalance process, I am happy to report that it was a rather uneventful night of maintenance. The rebalance itself took 8-9 hours to complete, and then took another hour for all the replicas to get saved to the disk also. Theoretically, I didn’t need to take the site down while the rebalance was happening, but I took the game down just to be safe and not compromise the game experience.
The question is: if the application was stopped anyway, wasn’t there another migration approach that would have shortened the time window for completing the migration?
What I’m thinking is that if there are no new writes to the system, then one could:
- add the new nodes as “slaves” for existing nodes (also change the replication factor)
- once these have caught up, change the master to one of the new nodes
- kill old nodes
This would basically avoid reshuffling the data across the cluster.
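To make the three steps concrete, here is a toy in-memory model in Python. It is only a sketch of the idea, not Membase commands: `Node` and `migrate_by_promotion` are hypothetical names, the one-to-one pairing of old and new nodes is my assumption, and nothing here says Membase actually supports this kind of promotion.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    data: dict = field(default_factory=dict)  # key -> value stored on this node


def migrate_by_promotion(masters, new_names):
    """Replace each old master with a new node without moving keys between slots."""
    if len(masters) != len(new_names):
        raise ValueError("this sketch pairs old and new nodes one to one")

    promoted = []
    for old, new_name in zip(masters, new_names):
        # Step 1: the new node joins as a replica ("slave") of one old node.
        replica = Node(new_name)
        # With the application stopped there are no new writes, so a single
        # full copy is a complete catch-up.
        replica.data.update(old.data)

        # Step 2: promote the replica only once it holds everything.
        if replica.data != old.data:
            raise RuntimeError(f"{new_name} has not caught up with {old.name}")
        promoted.append(replica)

        # Step 3: retire the old node.
        old.data.clear()
    return promoted


old_cluster = [
    Node("old-a", {"user:1": "x", "user:4": "y"}),
    Node("old-b", {"user:2": "z"}),
]
new_cluster = migrate_by_promotion(old_cluster, ["new-a", "new-b"])
```

Each new node ends up with exactly the key set of the old node it replaced, which is the point: data is copied once per node pair rather than redistributed across the whole cluster.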
Another thing that causes this warm-up to take a long time is the fact that membase uses sqlite3 engine for persisting data to the disk. Sqlite3 uses btree to store its data, and when items are deleted, the underlying btree pages are merely marked as “free”. Later on when new items are stored, their content can be spread over different pages, causing fragmentation. So if the membase cluster is seeing a lot of delete or expiration, which ours does, the warm-up time will slowly increase overtime. This fragmentation issue will be addressed in the next major release Couchbase 2.0, since it will be replacing sqlite3 with CouchDB. But in the mean time, this is a real problem that we will need to deal with in production.
- Is Membase using one sqlite3 engine per node or per bucket?
- Isn’t sqlite3 single-threaded, thus making all writes and reads sequential?
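The “pages are merely marked as free” part of the quote is easy to observe with plain SQLite. The sketch below uses Python’s standard `sqlite3` module and a throwaway database; the `items` table, the row counts, and the `measure_free_pages` helper are made up for illustration and say nothing about how Membase lays out its own files.

```python
import os
import sqlite3
import tempfile


def measure_free_pages(rows=5000, payload_size=500):
    """Insert rows, delete most of them, then VACUUM; return page statistics."""
    with tempfile.TemporaryDirectory() as tmp:
        # isolation_level=None puts the connection in autocommit mode, so
        # VACUUM is never issued inside an implicit transaction.
        conn = sqlite3.connect(os.path.join(tmp, "demo.db"), isolation_level=None)
        try:
            # Make sure freed pages are not reclaimed automatically.
            conn.execute("PRAGMA auto_vacuum = NONE")
            conn.execute("CREATE TABLE items (k INTEGER PRIMARY KEY, v BLOB)")

            conn.execute("BEGIN")
            conn.executemany(
                "INSERT INTO items (k, v) VALUES (?, ?)",
                ((i, b"x" * payload_size) for i in range(rows)),
            )
            conn.execute("COMMIT")

            def pragma(name):
                return conn.execute(f"PRAGMA {name}").fetchone()[0]

            stats = {"pages_full": pragma("page_count")}

            # Simulate heavy deletes/expirations: drop the first 90% of keys.
            conn.execute("DELETE FROM items WHERE k < ?", (int(rows * 0.9),))
            stats["pages_after_delete"] = pragma("page_count")
            stats["free_after_delete"] = pragma("freelist_count")

            # VACUUM rebuilds the file and drops the free pages.
            conn.execute("VACUUM")
            stats["pages_after_vacuum"] = pragma("page_count")
            stats["free_after_vacuum"] = pragma("freelist_count")
            return stats
        finally:
            conn.close()


if __name__ == "__main__":
    print(measure_free_pages())
```

After the delete, `freelist_count` is non-zero: the pages are still in the file, just marked as reusable. Only `VACUUM` rebuilds the file and gets rid of them, which is not something you want to run casually on a busy production node.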
Original title and link: Migrating a Membase Cluster ( ©myNoSQL)
Cadir Lee (CTO of Zynga), quoted in a VentureBeat post:
It’s not the amount of hardware that matters. It’s the architecture of the application. You have to work at making your app architecture so that it takes advantage of Amazon. You have to have complete fluidity with the storage tier, the web tier. We are running our own data centers. We are looking more at doing our own data centers with more of a private cloud.
A couple of thoughts:
- Zynga is going in the opposite direction from Netflix. While Netflix is focusing (by using Amazon for most of its infrastructure), Zynga is diversifying (by building its own data centers).
- Zynga’s applications are great examples of where fully distributed NoSQL databases fit. Availability is key.
- My answer to the question: “how many Zyngas are out there” would be: “enough to ensure some good business for the most reliable and scalable distributed databases”
- Zynga has contributed to and is an investor in Membase, the company that merged with CouchOne to form Couchbase. But Zynga was using a custom version of Membase.
- Zynga also operates a large MySQL cluster.
- Zynga processes over 15 terabytes of game data every day (according to their SEC filing). That’s Hadoop’s sweet spot.
PS: I’d love to talk to someone from Zynga about their data storage approach. If you have any connections I’d really appreciate an introduction.
Original title and link: Zynga, Data Centers, Polyglot Persistence, and Big Data ( ©myNoSQL)