NoSQL Benchmarks NoSQL use cases NoSQL Videos NoSQL Hybrid Solutions NoSQL Presentations Big Data Hadoop MapReduce Pig Hive Flume Oozie Sqoop HDFS ZooKeeper Cascading Cascalog BigTable Cassandra HBase Hypertable Couchbase CouchDB MongoDB OrientDB RavenDB Jackrabbit Terrastore Amazon DynamoDB Redis Riak Project Voldemort Tokyo Cabinet Kyoto Cabinet memcached Amazon SimpleDB Datomic MemcacheDB M/DB GT.M Amazon Dynamo Dynomite Mnesia Yahoo! PNUTS/Sherpa Neo4j InfoGrid Sones GraphDB InfiniteGraph AllegroGraph MarkLogic Clustrix CouchDB Case Studies MongoDB Case Studies NoSQL at Adobe NoSQL at Facebook NoSQL at Twitter



Why Netflix Picked Amazon SimpleDB, Hadoop/HBase, and Cassandra

Yury Izrailevsky[1]:

The reason why we use multiple NoSQL solutions is because each one is best suited for a specific set of use cases. For example, HBase is naturally integrated with the Hadoop platform, whereas Cassandra is best for cross-regional deployments and scaling with no single points of failure. Adopting the non-relational model in general is not easy, and Netflix has been paying a steep pioneer tax while integrating these rapidly evolving and still maturing NoSQL products. There is a learning curve and an operational overhead. Still, the scalability, availability and performance advantages of the NoSQL persistence model are evident and are paying for themselves already, and will be central to our long-term cloud strategy.

Summarizing the pros for each of the 3 solutions:

  • Amazon SimpleDB Pros

    • highly durable, writes spanning multiple availability zones
    • handy query and data formats
    • batch operations
    • consistent reads
    • hosted solution
  • HBase Pros

    • dynamic partitioning model
    • built-in support for compression
    • range queries
    • support for distributed counters
    • strong consistency
    • interoperability with Hadoop
  • Cassandra Pros

    • no dedicated name nodes
    • no practical architectural limitations on data sizes, row/column counts, etc.
    • flexible data model
    • no underlying storage format requirements like HDFS
    • uniquely flexible consistency and replication models
    • cross-datacenter and cross-regional replication

I hope the next post will be about the “small” issues Netflix ran into when adopting each of these systems. In the past they’ve shared some of the challenges of an Oracle - Amazon SimpleDB hybrid solution.

  1. Yury Izrailevsky: Netflix Director of Cloud and Systems Infrastructure  

Original title and link: Why Netflix Picked Amazon SimpleDB, Hadoop/HBase, and Cassandra (NoSQL databases © myNoSQL)