
Medialets: All content tagged as Medialets in NoSQL databases and polyglot persistence

Cassandra as the Central Nervous System of Your Distributed Systems with Joe Stein - Powered by NoSQL

In the 4th week of DataStax's Cassandra NYC 2011 video series, we have Joe Stein from Medialets talking about the architecture.

Before diving into the video, here are some interesting data points:

  • Medialets serves rich media ads
    • they handle 3-4TB of daily data
    • microsecond-level response times
  • Cassandra is used for time series and aggregate metrics
  • all MapReduce jobs are written in Python. This reminded me of the recent post about the performance impact of operations in the Hadoop Map phase
  • Medialets architecture:

    [Figure: Medialets architecture]

  • Major components of the Medialets architecture:

    • Kafka
    • MySQL
    • Cassandra: 6-node cluster, 100k requests, single data center
    • Hadoop
    • ZooKeeper: coordinates all the services on the platform
  • some of the data in MySQL is replicated in Cassandra (and coordinated with ZooKeeper)
  • data is fed back to MySQL
  • Kafka for collecting analytics data:
    • aggregates go into Cassandra
    • events go into Hadoop
  • GROUP BY with Cassandra
  • for real-time systems aggregations must be done upfront
  • the way data is segmented is critical
  • aggregation leads to data explosion
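
The last four points can be illustrated with a small sketch. It is not the Medialets implementation: the dimension names (`app`, `ad`, `country`), the hourly bucket, and the helper functions are all hypothetical, and an in-memory `Counter` stands in for the aggregate store. The idea is that every GROUP BY you want to answer in real time is computed upfront, as each event arrives, by incrementing one counter per combination of dimensions.

```python
from collections import Counter
from itertools import combinations

# Hypothetical event dimensions; the talk does not name the real ones.
DIMENSIONS = ("app", "ad", "country")


def hour_bucket(ts):
    """Truncate a Unix timestamp (in seconds) to the start of its hour."""
    return ts - ts % 3600


def aggregate_keys(event):
    """Return one key per subset of dimensions, i.e. per GROUP BY to be served."""
    bucket = hour_bucket(event["ts"])
    keys = []
    for size in range(len(DIMENSIONS) + 1):
        for dims in combinations(DIMENSIONS, size):
            segment = tuple((d, event[d]) for d in dims)
            keys.append((bucket, segment))
    return keys


def ingest(events, counters=None):
    """Increment every precomputed aggregate as each event arrives."""
    counters = Counter() if counters is None else counters
    for event in events:
        for key in aggregate_keys(event):
            counters[key] += 1
    return counters


events = [
    {"ts": 1323302400, "app": "news", "ad": "a1", "country": "US"},
    {"ts": 1323302460, "app": "news", "ad": "a2", "country": "US"},
    {"ts": 1323306000, "app": "game", "ad": "a1", "country": "DE"},
]
counters = ingest(events)

# "GROUP BY app" for one hour is now a direct key lookup, not a scan.
hour = hour_bucket(1323302400)
print(counters[(hour, (("app", "news"),))])  # 2

# Each event touches 2**len(DIMENSIONS) aggregate keys.
print(len(aggregate_keys(events[0])))  # 8
```

With three dimensions a single event already updates eight counters, and the count doubles with every dimension added. That is the data explosion mentioned above, and it is why choosing which segments to precompute matters so much.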