
How Digg is Built? Using a Bunch of NoSQL technologies

The picture should speak for Digg’s polyglot persistence approach:

[Image: Digg Data Storage Architecture]

But here is also a description of the data stores in use:

Digg stores data in multiple types of systems, depending on the type of data and the access patterns, and also for historical reasons in some cases :)

  • Cassandra: The primary store for “Object-like” access patterns for such things as Items (stories), Users, Diggs and the indexes that surround them. Since the Cassandra 0.6 version we use does not support secondary indexes, these are computed by application logic and stored here. […]

  • HDFS: Logs from site and API events, user activity. Data source and destination for batch jobs run with Map-Reduce and Hive in Hadoop. Big Data and Big Compute!

  • MySQL: This is mainly the current store for the story promotion algorithm and calculations, because it requires lots of JOIN-heavy operations, which are not a natural fit for the other data stores at this time. However… HBase looks interesting.

  • Redis: The primary store for the personalized news data because it needs to be different for every user and quick to access and update. We use Redis to provide the Digg Streaming API and also for the real time view and click counts since it provides super low latency as a memory-based data storage system.

  • Scribe: The log-collecting service. Although this is a primary store, the logs are rotated out of this system regularly and summaries are written to HDFS.
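
To make the Cassandra point above concrete: when the store has no secondary indexes, the application has to write and clean up the index entries itself on every update. Here is a minimal sketch of that pattern. It is not Digg's code and it does not talk to Cassandra; `ItemStore`, `put_item`, `delete_item`, and `items_for_user` are hypothetical names, and plain in-memory dictionaries stand in for the primary rows and the index rows.

```python
from collections import defaultdict


class ItemStore:
    """Toy in-memory stand-in for a key-value store without secondary indexes.

    Hypothetical illustration only: a real store would issue writes to two
    column families instead of updating two dictionaries.
    """

    def __init__(self):
        # Primary rows: item_id -> {"user_id": ..., "title": ...}
        self.items = {}
        # Hand-maintained index rows: user_id -> set of item ids
        self.items_by_user = defaultdict(set)

    def put_item(self, item_id, user_id, title):
        old = self.items.get(item_id)
        if old is not None and old["user_id"] != user_id:
            # The indexed value changed, so drop the stale index entry first.
            self.items_by_user[old["user_id"]].discard(item_id)
        self.items[item_id] = {"user_id": user_id, "title": title}
        self.items_by_user[user_id].add(item_id)

    def delete_item(self, item_id):
        old = self.items.pop(item_id, None)
        if old is not None:
            self.items_by_user[old["user_id"]].discard(item_id)

    def items_for_user(self, user_id):
        # Read through the index instead of scanning every primary row.
        return sorted(self.items_by_user.get(user_id, ()))


store = ItemStore()
store.put_item("story-1", "alice", "First story")
store.put_item("story-2", "alice", "Second story")
store.put_item("story-2", "bob", "Second story")  # owner changes, index follows
print(store.items_for_user("alice"))
print(store.items_for_user("bob"))
```

The catch is that the two writes are not atomic in a real distributed store, so the application also has to tolerate (or repair) index entries that briefly disagree with the primary rows.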

I know this will sound strange, but isn’t that a bit too much in there?

@antirez

Original title and link: How Digg is Built? Using a Bunch of NoSQL technologies (NoSQL databases © myNoSQL)

via: http://about.digg.com/blog/how-digg-is-built