NoSQL Benchmarks NoSQL use cases NoSQL Videos NoSQL Hybrid Solutions NoSQL Presentations Big Data Hadoop MapReduce Pig Hive Flume Oozie Sqoop HDFS ZooKeeper Cascading Cascalog BigTable Cassandra HBase Hypertable Couchbase CouchDB MongoDB OrientDB RavenDB Jackrabbit Terrastore Amazon DynamoDB Redis Riak Project Voldemort Tokyo Cabinet Kyoto Cabinet memcached Amazon SimpleDB Datomic MemcacheDB M/DB GT.M Amazon Dynamo Dynomite Mnesia Yahoo! PNUTS/Sherpa Neo4j InfoGrid Sones GraphDB InfiniteGraph AllegroGraph MarkLogic Clustrix CouchDB Case Studies MongoDB Case Studies NoSQL at Adobe NoSQL at Facebook NoSQL at Twitter



MegaStore: All content tagged as MegaStore in NoSQL databases and polyglot persistence

How Does Google MegaStore Compare Against HDFS/HBase?

Alex Feinberg answering the question in the title:

This is like saying “how does a General Motors bus compare against a Ford engine”. MegaStore is built on of Google’s BigTable/GFS. HBase/HDFS are BigTable/HDFS work-alikes.

BigTable and HBase give up availability (in the CAP Theorem sense) in favour of consistency: when a tablet master node (HRegionServer in HBase) goes down, the portion of the keyspace the failed node is responsible for becomes (briefly) unavailable until another node takes over the portion of the key space. This is efficient, as the data/write-ahead-log is stored GFS (or HDFS): in a way serializing writes to GFS/HDFS (a file system with relaxed consistency semantics) through a single node ensures serializable consistency.

Make sure you read it all.

Original title and link: How Does Google MegaStore Compare Against HDFS/HBase? (NoSQL database©myNoSQL)


Paper: Google Fusion Tables: Data Management, Integration and Collaboration in the Cloud

This paper from Google talks extensively about the usage of BigTable and Megastore, the data model, query processing, and transaction handling in the implementation of Google Fusion Tables.

Google Fusion Tables is a cloud-based service for data management and integration. Fusion Tables enables users to upload tabular data files (spreadsheets, CSV, KML), currently of up to 100MB. The system provides several ways of visualizing the data (e.g., charts, maps, and timelines) and the ability to filter and aggregate the data. It supports the integration of data from multiple sources by performing joins across tables that may belong to different users. […] This paper describes the inner workings of Fusion Tables, including the storage of data in the system and the tight integration with the Google Maps infrastructure.

Download the paper or read it after the break.

Multi-Document Transactions in RavenDB vs Other NoSQL Databases

“We tried using NoSQL, but we are moving to Relational Databases because they are easier…”

This is how Oren Eini starts his post about RavenDB support for multi-document transactions and the lack of it from MongoDB:

  1. For a single server, we support atomic multi document writes natively. (note that this isn’t the case for Mongo even for a single server).
  2. For multiple servers, we strongly recommend that your sharding strategy will localize documents, meaning that the actual update is only happening on a single server.
  3. For multi server, multi document atomic updates, we rely on distributed transactions.

In the NoSQL space, there are a couple of other solutions that support transactions:

If you look at these from the perspective of distributed systems, the only distributed ones that support transactions are Megastore and RavenDB. There’s also VoltDB which is all transactions. Are there any I’ve left out?

Original title and link: Multi-Document Transactions in RavenDB vs Other NoSQL Databases (NoSQL database©myNoSQL)

Google BigTable, MapReduce, MegaStore vs. Hadoop, MongoDB

Dhanji R. Prasanna leaving Google:

Here is something you’ve may have heard but never quite believed before: Google’s vaunted scalable software infrastructure is obsolete. Don’t get me wrong, their hardware and datacenters are the best in the world, and as far as I know, nobody is close to matching it. But the software stack on top of it is 10 years old, aging and designed for building search engines and crawlers. And it is well and truly obsolete.

Protocol Buffers, BigTable and MapReduce are ancient, creaking dinosaurs compared to MessagePack, JSON, and Hadoop. And new projects like GWT, Closure and MegaStore are sluggish, overengineered Leviathans compared to fast, elegant tools like jQuery and mongoDB. Designed by engineers in a vacuum, rather than by developers who have need of tools.

Maybe it is just the disappointment of someone whose main project was killed

. Or maybe it is true. Or maybe it is just another magic triangle:

Agility Scalability Coolness factor Triangle

Edward Ribeiro mentioned a post from another ex-Googler which points out similar issues with Google’s philosophy.

Original title and link: Google BigTable, MapReduce, MegaStore vs. Hadoop, MongoDB (NoSQL databases © myNoSQL)


MongoDB and Google Megastore

TheRegister quoting Dwight Merriman, 10gen founder, in a post titled “MongoDB daddy: My baby beats Google BigTable”:

We read [Google’s Megastore research paper] and we were almost laughing at the similarities

I hope both the title and the quote are not really Dwight’s.

Original title and link: MongoDB and Google Megastore (NoSQL databases © myNoSQL)


Amazon SimpleDB, Google Megastore & CAP

Nati Shalom (Gigaspaces) pulls out a couple of references from James Hamilton’s posts[1] on Amazon SimpleDB and Google Megastore consistency model concluding:

It is interesting to see that the reality is that even Google and Amazon - which I would consider the extreme cases for big data - realized the limitation behind eventual consistency and came up with models that can deal with scaling without forcing a compromise on consistency as I also noted in one of my recent NoCAP series

But he lefts out small details like these:

Update rates within a entity group are seriously limited by:

  • When there is log contention, one wins and the rest fail and must be retried
  • Paxos only accepts a very limited update rate (order 10^2 updates per second)


Cross entity group updates are supported by:

  • two-phase commit with the fragility that it brings
  • queueing ans asynchronously applying the changes

Original title and link: Amazon SimpleDB, Google Megastore & CAP (NoSQL databases © myNoSQL)


Google Megastore Paper Summarized

In case you didn’t read the Google Megastore paper[1], James Hamilton has published his notes on the paper:

Overall, an excellent paper with lots of detail on a nicely executed storage system. Supporting consistent read and full ACID update semantics is impressive although the limitation of not being able to update an entity group at more than a “few per second” is limiting.

Original title and link: Google Megastore Paper Summarized (NoSQL databases © myNoSQL)


Google Megastore: Scalable, Highly Available Storage for Interactive Services

A new paper from Google:

Megastore blends the scalability of a NoSQL datastore with the convenience of a traditional RDBMS in a novel way, and provides both strong consistency guarantees and high availability.
We provide fully serializable ACID semantics within fine-grained partitions of data. This partitioning allows us to synchronously replicate each write across a wide area network with reasonable latency and support seamless failover between datacenters.
This paper describes Megastore’s semantics and replication algorithm.

Megastore seems to be the solution behind the Google App Engine high replication datastore.

Emphases are mine.

Original title and link: Google Megastore: Scalable, Highly Available Storage for Interactive Services (NoSQL databases © myNoSQL)