While today was supposed to bring a new educational video from the Cassandra NYC 2011 video series, I thought that the lessons learned from operating Cassandra at Outbrain to serve over 30 billion impressions monthly would be quite educational.
Before watching the video and checking the slides, here are some interesting bits:
- Outbrain operates a 14-node Cassandra cluster deployed in two data centers
- each node stores around 70-80G, for a total of approx. 550G (unreplicated)
- Outbrain’s stack is based on Tomcat - Memcached - CacheWarmer - Cassandra - Hadoop/Hive. Data gets into Hadoop using Flume; it is then analyzed and the results are pushed into Cassandra
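As a rough sanity check on the storage figures above, here is a small sketch that compares the total on-disk data with the unreplicated total. The talk summary does not state a replication factor, so the ratio below is only an inference, and it ignores on-disk overhead such as space used during compaction. The function name `copies_range` is made up for this illustration.

```python
def copies_range(nodes, per_node_gb, unreplicated_gb):
    """Return (low, high) ratio of total on-disk data to unreplicated data.

    per_node_gb is a (min, max) tuple of the per-node storage estimate.
    """
    low_total = nodes * per_node_gb[0]
    high_total = nodes * per_node_gb[1]
    return low_total / unreplicated_gb, high_total / unreplicated_gb


# Figures from the bullets above: 14 nodes, 70-80G each, approx. 550G unreplicated.
low, high = copies_range(14, (70, 80), 550)
print(f"on-disk / unreplicated: {low:.2f} to {high:.2f}")
```

A ratio close to 2 would be consistent with each piece of data being stored roughly twice across the cluster, but that is a guess and not something the slides confirm here.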
This sounds comforting for the ops people:
In my experience, once the cluster has been set up, there is not much else to do other than occasional tuning as you learn how your data behaves.
Talking about tuning, the following must be monitored:
- heap size and usage
- garbage collections
- IO wait
- writes through RowMutationStage (active and pending)
- CompactionStage
- Compaction count
- Cache hit rate
- ReadStage (active and pending)
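To make the monitoring checklist above a bit more concrete, here is a minimal sketch of a threshold check over one sample of those metrics. Everything in it is an assumption for illustration: the metric names, the threshold values, and the `find_alerts` helper are hypothetical, and how you actually collect the numbers (JMX, nodetool, OS tools) is left out.

```python
# Hypothetical upper limits; tune them as you learn how your data behaves.
THRESHOLDS = {
    "heap_used_fraction": 0.75,   # heap usage as a fraction of max heap
    "io_wait_percent": 20.0,      # CPU time spent waiting on IO
    "mutation_pending": 100,      # pending tasks in RowMutationStage
    "read_pending": 100,          # pending tasks in ReadStage
    "compaction_pending": 20,     # pending tasks in CompactionStage
}
MIN_CACHE_HIT_RATE = 0.80         # hypothetical lower limit


def find_alerts(sample, thresholds=THRESHOLDS, min_cache_hit_rate=MIN_CACHE_HIT_RATE):
    """Return a list of human-readable alerts for one metrics sample (a dict)."""
    alerts = []
    for name, limit in thresholds.items():
        value = sample.get(name)
        # Metrics missing from the sample are skipped, not treated as zero.
        if value is not None and value > limit:
            alerts.append(f"{name}={value} exceeds {limit}")
    hit_rate = sample.get("cache_hit_rate")
    if hit_rate is not None and hit_rate < min_cache_hit_rate:
        alerts.append(f"cache_hit_rate={hit_rate} below {min_cache_hit_rate}")
    return alerts


# Made-up sample values, only to show the call.
sample = {"heap_used_fraction": 0.6, "read_pending": 250, "cache_hit_rate": 0.5}
for alert in find_alerts(sample):
    print(alert)
```

The cache hit rate is handled separately because it is the one metric in the list where a low value, not a high one, is the warning sign.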
To watch more videos from this event, follow the Cassandra NYC 2011 tag.
Original title and link: Cassandra 101 for System Administrators with Nathan Milford - Powered by NoSQL ( ©myNoSQL)