ElephantDB: All content tagged as ElephantDB in NoSQL databases and polyglot persistence

ElephantDB and Storm Join the Twitter Flock

That is to say, BackType, the creator of Cascalog, ElephantDB, and Storm, has been acquired by Twitter (which, in case you didn’t know, names most of its open source libraries and storage solutions after birds).

The announcement is here. Looking forward to seeing Storm open sourced.

Original title and link: ElephantDB and Storm Join the Twitter Flock (NoSQL databases © myNoSQL)

BackType’s ElephantDB

I didn’t know that BackType’s ElephantDB is open source and available on GitHub, just like Cascalog, their Clojure-based query language for Hadoop.

Original title and link: BackType’s ElephantDB (NoSQL databases © myNoSQL)

Big Data Analysis at BackType

RWW has a nice post diving into the data flow and the tools used by BackType, a company with only 3 engineers, to handle and analyze large amounts of data.

They’ve invented their own language, Cascalog, to make analysis easy, and their own database, ElephantDB, to simplify delivering the results of their analysis to users. They’ve even written a system to update traditional batch processing of massive data sets with new information in near real-time.

Some highlights:

  • 25 terabytes of compressed binary data, over 100 billion individual records
  • all services and data storage are on Amazon S3 and EC2
  • between 60 and 150 EC2 instances serving an average of 400 requests/s
  • Clojure and Python as platform languages
  • Hadoop, Cascading and Cascalog are central pieces of BackType’s platform
  • Cascalog, a Clojure-based query language for Hadoop, was created and open sourced by BackType engineer Nathan Marz
  • ElephantDB, the storage solution, is a read-only cluster built on top of Berkeley DB files
  • Crawlers place data in Gearman queues for processing and storing
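The read-only serving idea from the list above can be illustrated with a small sketch. This is not ElephantDB's actual API or storage format: in-memory dicts stand in for the on-disk database files, and the names `shard_for`, `build_shards`, and `ReadOnlyStore`, as well as the MD5-based partitioning, are assumptions made purely for illustration. It only shows the general pattern of building shards in a batch step and then serving lookups with no write path.

```python
import hashlib
from types import MappingProxyType


def shard_for(key: str, num_shards: int) -> int:
    """Map a key to a shard index using a stable hash (hypothetical scheme)."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


def build_shards(records: dict, num_shards: int) -> list:
    """Batch step: partition the full result set into per-shard dicts."""
    shards = [{} for _ in range(num_shards)]
    for key, value in records.items():
        shards[shard_for(key, num_shards)][key] = value
    return shards


class ReadOnlyStore:
    """Serving step: answers lookups from prebuilt shards; no write path."""

    def __init__(self, shards):
        # Wrap each shard in a read-only view so serving code cannot mutate it.
        self._shards = [MappingProxyType(dict(s)) for s in shards]

    def get(self, key, default=None):
        # Route the lookup to the shard that the batch step used for this key.
        shard = self._shards[shard_for(key, len(self._shards))]
        return shard.get(key, default)
```

A store built this way is replaced wholesale when the next batch run finishes, for example `ReadOnlyStore(build_shards(batch_output, num_shards=4))`, rather than being updated record by record.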

BackType’s data flow is presented in the following diagram:

[Figure: BackType data flow]

Included below is an interview with Nathan about Cascalog:

@pharkmillups .

Original title and link: Big Data Analysis at BackType (NoSQL databases © myNoSQL)