ALL COVERED TOPICS

NoSQL Benchmarks NoSQL use cases NoSQL Videos NoSQL Hybrid Solutions NoSQL Presentations Big Data Hadoop MapReduce Pig Hive Flume Oozie Sqoop HDFS ZooKeeper Cascading Cascalog BigTable Cassandra HBase Hypertable Couchbase CouchDB MongoDB OrientDB RavenDB Jackrabbit Terrastore Amazon DynamoDB Redis Riak Project Voldemort Tokyo Cabinet Kyoto Cabinet memcached Amazon SimpleDB Datomic MemcacheDB M/DB GT.M Amazon Dynamo Dynomite Mnesia Yahoo! PNUTS/Sherpa Neo4j InfoGrid Sones GraphDB InfiniteGraph AllegroGraph MarkLogic Clustrix CouchDB Case Studies MongoDB Case Studies NoSQL at Adobe NoSQL at Facebook NoSQL at Twitter

NAVIGATE MAIN CATEGORIES

Close

Big Graph-Processing Library From Twitter: Cassovary

Cassovary is designed from the ground up to efficiently handle graphs with billions of edges. It comes with some common node and graph data structures and traversal algorithms. A typical usage is to do large-scale graph mining and analysis.

If you are reading this you’ve most probably heard of Pregel—if you didn’t then you should check out the Pregel: a system for large-scale graph processing paper and then how Pregel and MapReduce compare—and also the 6 Pregel inspired frameworks.

The Cassovary project page introduces it as:

Cassovary is a simple “big graph” processing library for the JVM. Most JVM-hosted graph libraries are flexible but not space efficient. Cassovary is designed from the ground up to first be able to efficiently handle graphs with billions of nodes and edges. A typical example usage is to do large scale graph mining and analysis of a big network. Cassovary is written in Scala and can be used with any JVM-hosted language. It comes with some common data structures and algorithms.

I’m not sure yet if:

  1. Cassovary works with any graphy data source or requires FlockDB—which is more of a persisted graph than a graph database
  2. Cassovary is inspired by Pregel in any ways or if it’s addressing a limited problem space (similarly to FlockDB)

Update: Pankaj Gupta helped clarify the first question (and probably part of the second too):

At Twitter we use flockdb as our real-time graphdb, and export daily for use in cassovary, but any store could be used.

Original title and link: Big Graph-Processing Library From Twitter: Cassovary (NoSQL database©myNoSQL)

via: http://engineering.twitter.com/2012/03/cassovary-big-graph-processing-library.html