ALL COVERED TOPICS

NoSQL Benchmarks NoSQL use cases NoSQL Videos NoSQL Hybrid Solutions NoSQL Presentations Big Data Hadoop MapReduce Pig Hive Flume Oozie Sqoop HDFS ZooKeeper Cascading Cascalog BigTable Cassandra HBase Hypertable Couchbase CouchDB MongoDB OrientDB RavenDB Jackrabbit Terrastore Amazon DynamoDB Redis Riak Project Voldemort Tokyo Cabinet Kyoto Cabinet memcached Amazon SimpleDB Datomic MemcacheDB M/DB GT.M Amazon Dynamo Dynomite Mnesia Yahoo! PNUTS/Sherpa Neo4j InfoGrid Sones GraphDB InfiniteGraph AllegroGraph MarkLogic Clustrix CouchDB Case Studies MongoDB Case Studies NoSQL at Adobe NoSQL at Facebook NoSQL at Twitter

NAVIGATE MAIN CATEGORIES

Close

Gearman: All content tagged as Gearman in NoSQL databases and polyglot persistence

Big Data Analysis at BackType

RWW has a nice post diving into the data flow and the tools used by BackType, a company with only 3 engineers, to deal and analyze large amounts of data.

They’ve invented their own language, Cascalog, to make analysis easy, and their own database, ElephantDB, to simplify delivering the results of their analysis to users. They’ve even written a system to update traditional batch processing of massive data sets with new information in near real-time.

Some highlights:

  • 25 terabytes of compressed binary data, over 100 billion individual records
  • all services and data storage are on Amazon S3 and EC2
  • 60 up to 150 EC2 instances servicing an average of 400 requests/s
  • Clojure and Python as platform languages
  • Hadoop, Cascading and Cascalog are central pieces of BackType’s platform
  • Cascalog, a Clojure-based query language for Hadoop, was created and open sourced by BackType’s engineer Nathan Marz
  • ElephantDB, the storage solution, is a read-only cluster built on top of BerkleyDB files
  • Crawlers place data in Gearman queues for processing and storing

BackType data flow is presented in the following diagram:

BackType data flow

Included below is an interview with Nathan about Cascalog:

@pharkmillups .

Original title and link: Big Data Analysis at BackType (NoSQL databases © myNoSQL)

via: http://www.readwriteweb.com/hack/2011/01/secrets-of-backtypes-data-engineers.php