Learning NoSQL from Twitter’s Experience
Leaving aside the tons of NoSQL Twitter applications (and if that is not enough, here are more NoSQL-based Twitter apps, and even more), Twitter seems to be having a lot of fun (read: work and innovation) in the NoSQL space.
It all started with the problem of handling big data in real time. Nick Kallen’s (@nk) slides below explain the problems Twitter faced and how it tackled them:
Then it was time to consider Cassandra at Twitter:
We have a lot of data, the growth factor in that data is huge and the rate of growth is accelerating. We have a system in place based on shared mysql + memcache but its quickly becoming prohibitively costly (in terms of manpower) to operate. We need a system that can grow in a more automated fashion and be highly available
and to scale Twitter with Cassandra (presentation by Ryan King, @rk):
But storing data is not enough, and Twitter had to put its NoSQL data to work. For that, Twitter uses Hadoop, Pig and HBase, since “Cassandra is OLTP and HBase is OLAP”. The slides from Kevin Weil (@kevinweil), presented at nosql:eu, and from Dmitriy Ryaboy (@squarecog) give a lot of detail about how HBase, Hadoop and Pig are used:
That’s a ton to learn from NoSQL at Twitter!