RWW has a nice post diving into the data flow and the tools used by BackType, a company with only 3 engineers, to deal and analyze large amounts of data.
They’ve invented their own language, Cascalog, to make analysis easy, and their own database, ElephantDB, to simplify delivering the results of their analysis to users. They’ve even written a system to update traditional batch processing of massive data sets with new information in near real-time.
25 terabytes of compressed binary data, over 100 billion individual records
all services and data storage are on Amazon S3 and EC2
60 up to 150 EC2 instances servicing an average of 400 requests/s
Clojure and Python as platform languages
Hadoop, Cascading and Cascalog are central pieces of BackType’s platform
This blog is called myNoSQL and it is written by me, Alex Popescu, a software architect with a passion for open source and communities.
It records my readings, learnings, and opinions on NoSQL databases, polyglot persistence, and distributed systems -- subjects that I'm passionate about.
The opinions expressed here are my own, and no other party necessarily agrees with them.
If you feel I'm biased, I probably am.