NoSQL Benchmarks NoSQL use cases NoSQL Videos NoSQL Hybrid Solutions NoSQL Presentations Big Data Hadoop MapReduce Pig Hive Flume Oozie Sqoop HDFS ZooKeeper Cascading Cascalog BigTable Cassandra HBase Hypertable Couchbase CouchDB MongoDB OrientDB RavenDB Jackrabbit Terrastore Amazon DynamoDB Redis Riak Project Voldemort Tokyo Cabinet Kyoto Cabinet memcached Amazon SimpleDB Datomic MemcacheDB M/DB GT.M Amazon Dynamo Dynomite Mnesia Yahoo! PNUTS/Sherpa Neo4j InfoGrid Sones GraphDB InfiniteGraph AllegroGraph MarkLogic Clustrix CouchDB Case Studies MongoDB Case Studies NoSQL at Adobe NoSQL at Facebook NoSQL at Twitter



Hadoop at Twitter: An Interview with Kevin Weil, Twitter Analytics Lead

Kevin Weil[1] in an interview about Twitter’s usage of Hadoop:

Hadoop is our data warehouse; every piece of data we store is archived in HDFS. We use HBase for data that sees updates frequently, or data we occasionally need low-latency access to. Every node in our cluster runs HBase. We use Java MapReduce for simple jobs, or jobs which have tight performance requirements. We use Pig for most of our analysis jobs, because its flexibility helps us iterate rapidly to arrive at the right way of looking at the data.

Our Hadoop use is also evolving: initially it was primarily used as an analysis tool to help us better understand the Twitter ecosystem, and that’s not going to change. But it’s increasingly used to build parts of products you use on the site every day such as People Search, the data for which is built with Hadoop. There are many more products like this in development.

Undeniably, Twitter is (deep) into NoSQL.

  1. Kevin Weil: Twitter Analytics Lead, @kevinweil  ()

Original title and link: Hadoop at Twitter: An Interview with Kevin Weil, Twitter Analytics Lead (NoSQL databases © myNoSQL)