


Cassandra 101 for System Administrators with Nathan Milford - Powered by NoSQL

Today was supposed to bring another educational video from the Cassandra NYC 2011 series, but I thought the lessons learned from operating Cassandra at Outbrain, where it helps serve over 30 billion impressions monthly, would be just as educational.

Before you watch the video and check the slides, here are some interesting bits:

  • Outbrain operates a 14-node Cassandra cluster deployed across two data centers
  • each node stores around 70-80 GB, for a total of approx. 550 GB (unreplicated)
  • Outbrain’s stack is based on Tomcat - Memcached - CacheWarmer - Cassandra - Hadoop/Hive. Data is loaded into Hadoop using Flume, analyzed there, and the results are pushed into Cassandra
  • this sounds comforting for the ops people:

    In my experience, once the cluster has been set up there is not much else to do other than occasional tuning as you learn how your data behaves. Talking about tuning, the following must be monitored:
      - heap size and usage
      - garbage collections
      - IO wait
      - writes through RowMutationStage (active and pending)
      - CompactionStage
      - compaction count
      - cache hit rate
      - ReadStage (active and pending)
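
To make the checklist above a bit more concrete, here is a minimal sketch of how a few of those metrics could be turned into alerts for a single node snapshot. It assumes the raw numbers (heap usage, active/pending tasks per stage, cache hits and requests) have already been collected, for example via JMX or `nodetool tpstats`; the `StageStats` type, the `check_node` function, and every threshold value are hypothetical choices for illustration, not anything Outbrain or Cassandra prescribes.

```python
from dataclasses import dataclass


@dataclass
class StageStats:
    """Active/pending task counts for one Cassandra thread pool stage."""
    name: str
    active: int
    pending: int


def cache_hit_rate(hits: int, requests: int) -> float:
    """Fraction of requests served from cache (0.0 when there were none)."""
    return hits / requests if requests else 0.0


def check_node(heap_used_mb, heap_max_mb, stages, cache_hits, cache_requests,
               heap_limit=0.75, pending_limit=100, hit_rate_floor=0.8):
    """Return human-readable warnings for one node snapshot.

    All thresholds are hypothetical defaults, chosen only for illustration.
    """
    warnings = []
    # Heap usage as a fraction of the configured maximum heap.
    heap_ratio = heap_used_mb / heap_max_mb
    if heap_ratio > heap_limit:
        warnings.append(f"heap usage {heap_ratio:.0%} exceeds {heap_limit:.0%}")
    # A growing pending queue means a stage cannot keep up with its load.
    for stage in stages:
        if stage.pending > pending_limit:
            warnings.append(f"{stage.name} has {stage.pending} pending tasks")
    # Only judge the hit rate if the cache actually saw traffic.
    rate = cache_hit_rate(cache_hits, cache_requests)
    if cache_requests and rate < hit_rate_floor:
        warnings.append(f"cache hit rate {rate:.0%} below {hit_rate_floor:.0%}")
    return warnings


if __name__ == "__main__":
    stages = [
        StageStats("RowMutationStage", active=4, pending=250),
        StageStats("ReadStage", active=2, pending=3),
        StageStats("CompactionStage", active=1, pending=5),
    ]
    for warning in check_node(7200, 8000, stages,
                              cache_hits=700, cache_requests=1000):
        print(warning)
```

With the sample snapshot, the sketch flags the heap at 90%, the RowMutationStage backlog, and a 70% cache hit rate. In practice you would feed it a time series rather than a single snapshot, since the quote stresses learning how your data behaves over time.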

To watch more videos from this event, follow the Cassandra NYC 2011 tag.

Original title and link: Cassandra 101 for System Administrators with Nathan Milford - Powered by NoSQL (NoSQL database©myNoSQL)