


Google: All content tagged as Google in NoSQL databases and polyglot persistence

The NoSQL Family Tree


Even if it includes just a handful of NoSQL databases, it’s still a nice visualization.

Original title and link: The NoSQL Family Tree (NoSQL database©myNoSQL)


F1 and Spanner: A Distributed SQL Database That Scales

A very good summary of the goals, interactions and collaboration between F1 and Spanner by Srihari Srinivasan:

With both the F1 and Spanner papers out it’s now possible to understand their interplay a bit holistically.



Google moves from MySQL to MariaDB

Jack Clark for The Register, quoting Google senior systems engineer Jeremy Cole’s talk at XLDB:

“We’re running primarily on [MySQL] 5.1 which is a little outdated, and so we’re moving to MariaDB 10.0 at the moment,”

I’m wondering how much of this decision is technical and how much is political. While Jack Clark points to the previous “disagreements” between Google and Oracle, when I say political decisions I mean more than this: access to the various bits of the code (e.g. tests, security issues), control over the future of the product, etc.



Google sniffs at MySQL fork MariaDB: Yum. Have an engineer

The Chocolate Factory has sent an engineer to the MariaDB Foundation, which looks after the fork’s codebase, community and ecosystem, and has MySQL daddy Monty Widenius himself as its lead developer.

That’s no validation, just keeping an eye on an alternative to a piece of technology you are heavily using [1].

  1. Google, Facebook, and probably quite a few others have entire internal teams dedicated to MySQL.  



Introducing Google Cloud Datastore

Urs Hölzle in a post summarizing some of the announcements at Google I/O:

Google Cloud Datastore is a fully managed and schemaless solution for storing non-relational data. Based on the popular App Engine High Replication Datastore, Cloud Datastore is a standalone service that features automatic scalability and high availability while still providing powerful capabilities such as ACID transactions, SQL-like queries, indexes and more.

I’m heading over to the project’s site to read more.



Amazon Preparing 'Disruptive' Big Data AWS Service?

Interesting speculation by The Register:

AWS already has the AWS Data Pipeline, which helps administrators schedule and shuttle data among various services, AWS Redshift for data warehousing which lets people store large quantities of data in the cloud and run queries on it, its NoSQL SSD-backed DynamoDB, and its Relational Database Service (RDS). So where does MADS fit?

The Reg’s take is that MADS will allow Amazon to build services that can net together the above components and help automate the passing of data among them. It may also become a standalone product in its own right, based on its similarities to the TransLattice and Google Spanner tech.

I almost never bet, but I’d say this could be Amazon’s Spanner.



Overview of Dremel-Like Solutions: Moving Beyond Hadoop for Big Data Needs

Until I learn more about the recently announced Cloudera Impala and Druid from Metamarkets, this article by Jaikumar Vijayan should offer, with some inherent mistakes [1], a good overview of the solutions aiming to offer alternatives to the batch-processing nature of Hadoop:

  • Google Dremel (BigQuery)
  • Cloudera Impala
  • Metamarkets Druid
  • Nodeable StreamReduce
  • SAP HANA integrated with Hadoop, etc.

  1. Just an example: “If you can stand latencies of a few seconds, Hadoop is fine. But Hadoop MapReduce is never going to be useful for sub-second latencies”. Then “The technology [nb Google Dremel] can run queries over trillion-row data tables in seconds…”

    Maybe just one more: consider the title “Moving beyond Hadoop” and then the quote from Google’s Ju-kay Kwek: “Google uses Dremel in conjunction with MapReduce. […] Hadoop and Dremel are distributed computing technologies, but each was built to address very different problems.”



Google BigQuery Adds Support for JSON Import and Hierarchical Data

Besides performance and quota changes, Google BigQuery adds support for importing JSON data and nested/repeated fields:

If you’re using App Engine Datastore or other NoSQL databases, it’s likely you’re taking advantage of nested and repeated data in your data model. For example, a customer data entity might have multiple accounts, each storing a list of invoices. Now, instead of having to flatten that data, you can keep your data in a hierarchical format when you import to BigQuery.
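To make the difference concrete, here is a minimal Python sketch of the customer example from the quote. The field names (`customer_id`, `accounts`, `invoices`, and so on) and the helper functions are made up for illustration and are not taken from the BigQuery documentation. The only thing it assumes about BigQuery is that JSON imports take newline-delimited JSON, one object per line. It shows the flattening you previously had to do, next to the nested line you can now keep as-is.

```python
import json

# Hypothetical customer entity; all field names are illustrative only.
customer = {
    "customer_id": "c-1",
    "name": "Acme",
    "accounts": [
        {
            "account_id": "a-1",
            "invoices": [
                {"invoice_id": "i-1", "amount": 120.0},
                {"invoice_id": "i-2", "amount": 80.0},
            ],
        },
        {
            "account_id": "a-2",
            "invoices": [{"invoice_id": "i-3", "amount": 50.0}],
        },
    ],
}


def flatten(record):
    """Yield one flat row per invoice, repeating customer and account fields."""
    for account in record["accounts"]:
        for invoice in account["invoices"]:
            yield {
                "customer_id": record["customer_id"],
                "name": record["name"],
                "account_id": account["account_id"],
                "invoice_id": invoice["invoice_id"],
                "amount": invoice["amount"],
            }


def to_ndjson(records):
    """Serialize records as newline-delimited JSON, one nested object per line."""
    return "\n".join(json.dumps(r, sort_keys=True) for r in records)


flat_rows = list(flatten(customer))   # the old way: denormalized rows
nested_line = to_ndjson([customer])   # the new way: hierarchy preserved
```

The flattened form repeats the customer and account fields on every invoice row, while the nested form keeps the whole entity on a single line.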



Todd Hoff on Google Spanner's

Todd Hoff of

What struck me most in the paper was a deeply buried section essentially describing Google’s motivation for shifting away from NoSQL and to NewSQL. The money quote:

We believe it is better to have application programmers deal with performance problems due to overuse of transactions as bottlenecks arise, rather than always coding around the lack of transactions.

That’s one piece of the Spanner paper that’s catching everyone’s attention. I’m wondering how much of this reference to transactions refers to:

  1. multi-operation transactions
  2. synchronous replication
  3. strong data consistency



Building Spanner Presentation

Alex Lloyd’s talk from Berlin Buzzwords 2012 about Google’s Spanner:

Cloudant's Mike Miller on Google Spanner

Cloudant’s Mike Miller sharing his thoughts about Google’s Spanner paper:

Spanner’s key innovation is around time. It includes a novel system using GPS and Atomic Clocks to distribute a globally synchronized “proper time.” The previous dogma in distributed systems was that synchronizing time within and between datacenters is insurmountably hard and uncertain. Ergo, serialization of requests is impossible at global scale. Google’s key innovation is to accept uncertainty, keep it small (via atomic clocks and GPS), quantify the uncertainty and operate around it. In retrospect this is obvious, but it doesn’t make it any less brilliant.
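The "quantify the uncertainty and operate around it" part can be illustrated with a toy Python sketch. This is not Google's implementation. The class and function names (`UncertainClock`, `commit`) and the 5 ms error bound are assumptions made for the example, and the clock simply wraps `time.monotonic` with a fixed error margin. The idea, as described in the Spanner paper, is that the clock returns an interval rather than a point in time, and a commit waits until its chosen timestamp is definitely in the past.

```python
import time
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    earliest: float
    latest: float


class UncertainClock:
    """Toy clock that reports an interval assumed to contain the true time."""

    def __init__(self, epsilon, source=time.monotonic):
        self.epsilon = epsilon  # assumed bound on clock error, in seconds
        self.source = source

    def now(self):
        t = self.source()
        return Interval(t - self.epsilon, t + self.epsilon)

    def after(self, timestamp):
        """True once the timestamp is guaranteed to be in the past."""
        return self.now().earliest > timestamp


def commit(clock, sleep=time.sleep):
    """Pick a commit timestamp, then wait out the uncertainty before returning."""
    timestamp = clock.now().latest
    while not clock.after(timestamp):
        sleep(clock.epsilon / 4)
    return timestamp


clock = UncertainClock(epsilon=0.005)  # hypothetical 5 ms uncertainty
first = commit(clock)
second = commit(clock)
```

Each commit pays roughly twice the uncertainty in waiting time, which is why keeping that uncertainty small matters so much: in exchange, a later commit always gets a larger timestamp than an earlier one.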



Google Cloud Platform Is the Biggest Deal in IT Since Amazon Launched EC2

Remember what I wrote in the state of the Hadoop market about having a second option for on-demand, cloud-based Hadoop services? Benjamin Black compares Google Cloud Platform with Amazon’s services:

  • Compute Engine is a lot like EC2 & EBS
  • Cloud Storage is a lot like S3
  • Cloud SQL is a lot like RDS
  • Analytics can be used like CloudWatch (and I know of people putting billions of their own data points in Analytics)
  • BigQuery has no AWS equivalent, but maybe you could build it with EMR?
  • PageSpeed has no AWS equivalent

Hadoop and MapR are already listed as possible use cases for Google Cloud Platform.

I don’t think I could write a better conclusion than Black did in his post:

This is big, planetary scale infrastructure. This is cloud legitimized and super-sized. In the words of the prophet: Shit just got real.
