in an interview about Twitter’s usage of Hadoop:
Hadoop is our data warehouse; every piece of data we store is archived in HDFS. We use HBase for data that sees updates frequently, or data we occasionally need low-latency access to. Every node in our cluster runs HBase. We use Java MapReduce for simple jobs, or jobs which have tight performance requirements. We use Pig for most of our analysis jobs, because its flexibility helps us iterate rapidly to arrive at the right way of looking at the data.
Our Hadoop use is also evolving: initially it was primarily used as an analysis tool to help us better understand the Twitter ecosystem, and that’s not going to change. But it’s increasingly used to build parts of products you use on the site every day such as People Search, the data for which is built with Hadoop. There are many more products like this in development.
Undeniably, Twitter is (deep) into NoSQL.
Original title and link: Hadoop at Twitter: An Interview with Kevin Weil, Twitter Analytics Lead (NoSQL databases © myNoSQL)