MegaStore: All content tagged as MegaStore in NoSQL databases and polyglot persistence
Thursday, 6 October 2011
How Does Google MegaStore Compare Against HDFS/HBase?
Alex Feinberg answering the question in the title:
This is like saying “how does a General Motors bus compare against a Ford engine”. MegaStore is built on top of Google’s BigTable/GFS. HBase/HDFS are BigTable/HDFS work-alikes.
BigTable and HBase give up availability (in the CAP Theorem sense) in favour of consistency: when a tablet master node (HRegionServer in HBase) goes down, the portion of the keyspace the failed node is responsible for becomes (briefly) unavailable until another node takes over that portion of the key space. This is efficient, as the data/write-ahead-log is stored in GFS (or HDFS): in a way, serializing writes to GFS/HDFS (a file system with relaxed consistency semantics) through a single node ensures serializable consistency.
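The trade-off described in the quote can be sketched with a toy model. This is only an illustration under simplifying assumptions, not how BigTable or HBase are implemented: `SharedLog`, `RangeServer`, and `Cluster` are hypothetical names, the shared log is an in-memory list standing in for a write-ahead log on GFS/HDFS, and there is just one key range.

```python
class Unavailable(Exception):
    """Raised while no server owns the key range."""


class SharedLog:
    """Stand-in for a write-ahead log kept on GFS/HDFS (append-only)."""

    def __init__(self):
        self.entries = []

    def append(self, key, value):
        self.entries.append((key, value))


class RangeServer:
    """Hypothetical single owner of one key range (tablet/region)."""

    def __init__(self, name, log):
        self.name = name
        self.log = log
        # Recover state by replaying the shared log in order;
        # later entries for the same key overwrite earlier ones.
        self.data = dict(log.entries)

    def put(self, key, value):
        self.log.append(key, value)  # make the write durable first
        self.data[key] = value       # then make it visible


class Cluster:
    """One key range; all reads and writes go through its single owner."""

    def __init__(self):
        self.log = SharedLog()
        self.owner = RangeServer("server-a", self.log)

    def put(self, key, value):
        if self.owner is None:
            raise Unavailable("key range has no owner yet")
        self.owner.put(key, value)

    def get(self, key):
        if self.owner is None:
            raise Unavailable("key range has no owner yet")
        return self.owner.data[key]

    def fail_owner(self):
        self.owner = None

    def reassign(self, name):
        # The new owner rebuilds its state from the shared log.
        self.owner = RangeServer(name, self.log)


cluster = Cluster()
cluster.put("row1", "v1")
cluster.fail_owner()
try:
    cluster.put("row1", "v2")   # rejected: the range is briefly unavailable
    rejected = False
except Unavailable:
    rejected = True
cluster.reassign("server-b")    # consistency is preserved after takeover
recovered = cluster.get("row1")
```

Because every write to the range passes through one owner, the log order is the serialization order; the price is the window between `fail_owner` and `reassign` during which requests fail.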
Make sure you read it all.
Original title and link: How Does Google MegaStore Compare Against HDFS/HBase? (©myNoSQL)
via: http://www.quora.com/How-does-Google-MegaStore-compare-against-HDFS-HBase
Tuesday, 19 July 2011
Paper: Google Fusion Tables: Data Management, Integration and Collaboration in the Cloud
This paper from Google talks extensively about the usage of BigTable and Megastore, the data model, query processing, and transaction handling in the implementation of Google Fusion Tables.
Google Fusion Tables is a cloud-based service for data management and integration. Fusion Tables enables users to upload tabular data files (spreadsheets, CSV, KML), currently of up to 100MB. The system provides several ways of visualizing the data (e.g., charts, maps, and timelines) and the ability to filter and aggregate the data. It supports the integration of data from multiple sources by performing joins across tables that may belong to different users. […] This paper describes the inner workings of Fusion Tables, including the storage of data in the system and the tight integration with the Google Maps infrastructure.
Download the paper or read it after the break.
Monday, 20 June 2011
Multi-Document Transactions in RavenDB vs Other NoSQL Databases
“We tried using NoSQL, but we are moving to Relational Databases because they are easier…”
This is how Oren Eini starts his post about RavenDB’s support for multi-document transactions and the lack of it in MongoDB:
- For a single server, we support atomic multi document writes natively. (note that this isn’t the case for Mongo even for a single server).
- For multiple servers, we strongly recommend that your sharding strategy will localize documents, meaning that the actual update is only happening on a single server.
- For multi server, multi document atomic updates, we rely on distributed transactions.
In the NoSQL space, there are a couple of other solutions that support transactions:
- Google Megastore
- Redis has two mechanisms that come close to transactions: MULTI/EXEC/DISCARD and pipelining (the latter is exemplified in this Redis-based triplestore database implementation)
- many of the graph databases (Neo4j, HyperGraphDB, InfoGrid)
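For readers unfamiliar with the Redis mechanism mentioned in the list above, here is a minimal in-memory model of the MULTI/EXEC/DISCARD queueing behaviour. It is a sketch, not the Redis client API: `ToyRedis` and its methods are hypothetical, only `SET` is modelled, and a single shared queue stands in for what Redis tracks per connection.

```python
class ToyRedis:
    """Toy model of MULTI/EXEC/DISCARD queueing (not a real Redis client)."""

    def __init__(self):
        self.store = {}
        self.queue = None  # None means "not inside MULTI"

    def multi(self):
        self.queue = []
        return "OK"

    def set(self, key, value):
        if self.queue is not None:
            # Inside MULTI: the command is only queued, not applied.
            self.queue.append((key, value))
            return "QUEUED"
        self.store[key] = value
        return "OK"

    def execute(self):
        if self.queue is None:
            raise RuntimeError("EXEC without MULTI")
        queued, self.queue = self.queue, None
        # Apply every queued command back to back, with nothing interleaved.
        for key, value in queued:
            self.store[key] = value
        return ["OK"] * len(queued)

    def discard(self):
        if self.queue is None:
            raise RuntimeError("DISCARD without MULTI")
        self.queue = None  # drop the queued commands
        return "OK"


r = ToyRedis()
r.multi()
r.set("doc:1", "a")
r.set("doc:2", "b")
visible_before_exec = dict(r.store)   # still empty: nothing applied yet
r.execute()
visible_after_exec = dict(r.store)    # both writes applied together

r.multi()
r.set("doc:1", "changed")
r.discard()                           # doc:1 keeps its old value
```

Note that this only models the all-at-once application of queued commands; Redis does not roll back commands that fail at execution time, which is one reason the mechanism only "comes close" to transactions.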
If you look at these from the perspective of distributed systems, the only distributed ones that support transactions are Megastore and RavenDB. There’s also VoltDB which is all transactions. Are there any I’ve left out?
Monday, 6 June 2011
Google BigTable, MapReduce, MegaStore vs. Hadoop, MongoDB
Dhanji R. Prasanna leaving Google:
Here is something you may have heard but never quite believed before: Google’s vaunted scalable software infrastructure is obsolete. Don’t get me wrong, their hardware and datacenters are the best in the world, and as far as I know, nobody is close to matching it. But the software stack on top of it is 10 years old, aging and designed for building search engines and crawlers. And it is well and truly obsolete.
Protocol Buffers, BigTable and MapReduce are ancient, creaking dinosaurs compared to MessagePack, JSON, and Hadoop. And new projects like GWT, Closure and MegaStore are sluggish, overengineered Leviathans compared to fast, elegant tools like jQuery and mongoDB. Designed by engineers in a vacuum, rather than by developers who have need of tools.
Or maybe it is true. Or maybe it is just another magic triangle.
Edward Ribeiro mentioned a post from another ex-Googler which points out similar issues with Google’s philosophy.
Thursday, 26 May 2011
MongoDB and Google Megastore
The Register, quoting Dwight Merriman, 10gen founder, in a post titled “MongoDB daddy: My baby beats Google BigTable”:
We read [Google’s Megastore research paper] and we were almost laughing at the similarities
I hope both the title and the quote are not really Dwight’s.
via: http://www.theregister.co.uk/2011/05/25/the_once_and_future_mongodb/
Monday, 21 February 2011
Amazon SimpleDB, Google Megastore & CAP
Nati Shalom (Gigaspaces) pulls out a couple of references from James Hamilton’s posts[1] on the Amazon SimpleDB and Google Megastore consistency models, concluding:
It is interesting to see that the reality is that even Google and Amazon - which I would consider the extreme cases for big data - realized the limitation behind eventual consistency and came up with models that can deal with scaling without forcing a compromise on consistency as I also noted in one of my recent NoCAP series
But he leaves out small details like these:
Update rates within an entity group are seriously limited by:
- When there is log contention, one wins and the rest fail and must be retried
- Paxos only accepts a very limited update rate (order 10^2 updates per second)
and
Cross entity group updates are supported by:
- two-phase commit with the fragility that it brings
- queueing and asynchronously applying the changes
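The first limitation quoted above (under log contention, one writer wins and the rest must retry) can be illustrated with a small sketch. It rests on stated assumptions: `EntityGroupLog`, `Conflict`, and `commit` are hypothetical names, and a simple position check stands in for the Paxos round that decides which writer gets a log position. It does not model replication or the cross entity group mechanisms.

```python
class Conflict(Exception):
    """Another writer already claimed this log position."""


class EntityGroupLog:
    """Toy stand-in for the log of a single entity group."""

    def __init__(self):
        self.entries = []

    def next_position(self):
        return len(self.entries)

    def append_at(self, position, mutation):
        # Only a writer targeting the next free slot can win it.
        if position != len(self.entries):
            raise Conflict(f"position {position} already taken")
        self.entries.append(mutation)


def commit(log, mutation, read_position=None, max_attempts=3):
    """Append `mutation`, retrying on contention; returns attempts used."""
    position = log.next_position() if read_position is None else read_position
    for attempt in range(1, max_attempts + 1):
        try:
            log.append_at(position, mutation)
            return attempt
        except Conflict:
            position = log.next_position()  # re-read the log, then retry
    raise RuntimeError("gave up after repeated log contention")


log = EntityGroupLog()
stale = log.next_position()                         # A and B both read position 0
attempts_a = commit(log, "A", read_position=stale)  # A wins position 0
attempts_b = commit(log, "B", read_position=stale)  # B loses, retries at position 1
```

Every losing writer pays for at least one extra round, so the more writers contend for the same entity group, the more work is wasted on retries; this is consistent with the low per-group update rate noted in the quote.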
Monday, 10 January 2011
Google Megastore Paper Summarized
In case you didn’t read the Google Megastore paper[1], James Hamilton has published his notes on the paper:
Overall, an excellent paper with lots of detail on a nicely executed storage system. Supporting consistent read and full ACID update semantics is impressive although the limitation of not being able to update an entity group at more than a “few per second” is limiting.
via: http://perspectives.mvdirona.com/2011/01/09/GoogleMegastoreTheDataEngineBehindGAE.aspx
Friday, 7 January 2011
Google Megastore: Scalable, Highly Available Storage for Interactive Services
A new paper from Google:
Megastore blends the scalability of a NoSQL datastore with the convenience of a traditional RDBMS in a novel way, and provides both strong consistency guarantees and high availability.
We provide fully serializable ACID semantics within fine-grained partitions of data. This partitioning allows us to synchronously replicate each write across a wide area network with reasonable latency and support seamless failover between datacenters.
This paper describes Megastore’s semantics and replication algorithm.
Megastore seems to be the solution behind the Google App Engine high replication datastore.
Emphases are mine.
via: http://www.cidrdb.org/cidr2011/Papers/CIDR11_Paper32.pdf
