Presentation: Cassandra in Production @ Digg - Arin Sarkissian

It looks like the Digg guys are the most public about their usage of Cassandra. Arin’s presentation below is a bit less technical than the ☞ article published a while back, but also has some nice additions.

My notes:

  • what it is like to use an alpha-stage project when you have no idea how others are using it
  • the problem with sharding is that there’s no standard way of doing it
  • if you start giving up features of your RDBMS, why not also look at alternatives?
  • why Cassandra:
      • easy administration (nb: at least the promise of it)
      • no SPOF (single point of failure)
      • more flexible than key-value stores
  • loading data: MySQL -> Hadoop -> Cassandra

    This sounds like a complex process. Arin mentions the use of Scribe at Digg, and I wonder whether using Scribe to get data directly into Cassandra wouldn’t have been easier. Anyway, it’s difficult to say without knowing the details.

  • 12 servers initially, later scaled back to 8; 3TB of data
  • Performance: < 1ms writes, ~4-5ms reads (nb: these are the numbers from the slides, but I find them odd)
  • I would have preferred to implement this service layer in Java as managing resources/pooling would have been better
  • No, we don’t hate SQL
  • open-sourced Python library for Cassandra: lazyboy ☞
Digg new architecture
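
To make the sharding point from the notes concrete, here is a minimal sketch of two common ad hoc schemes for routing a record to a database. Everything in it is an assumption for illustration: the shard names, the `shard_by_hash` and `shard_by_range` helpers and the range size are hypothetical and do not come from Arin's presentation or from Digg's setup.

```python
import hashlib

# Hypothetical shard names, for illustration only.
SHARDS = ["db0", "db1", "db2", "db3"]


def shard_by_hash(key, shards=SHARDS):
    """Modulo scheme: pick a shard from a stable hash of the key."""
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return shards[int(digest, 16) % len(shards)]


def shard_by_range(user_id, shards=SHARDS, ids_per_shard=1_000_000):
    """Range scheme: consecutive blocks of IDs live on the same shard."""
    # IDs beyond the last range are clamped onto the last shard.
    index = min(user_id // ids_per_shard, len(shards) - 1)
    return shards[index]


if __name__ == "__main__":
    # The two schemes may place the same user on different shards.
    print(shard_by_hash("user:1500000"))
    print(shard_by_range(1_500_000))
```

Both functions are reasonable, yet they disagree on where a given record lives, and each needs its own approach to rebalancing when a shard is added. That is roughly what "no standard way of doing it" means in practice: every team ends up writing and maintaining its own routing layer.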