Cassovary is designed from the ground up to efficiently handle graphs with billions of edges. It comes with some common node and graph data structures and traversal algorithms. A typical usage is to do large-scale graph mining and analysis.
If you are reading this, you’ve most probably heard of Pregel. If you haven’t, you should check out the paper “Pregel: A System for Large-Scale Graph Processing”, then read how Pregel and MapReduce compare, and also look at the 6 Pregel-inspired frameworks.
The Cassovary project page introduces it as:
Cassovary is a simple “big graph” processing library for the JVM. Most JVM-hosted graph libraries are flexible but not space efficient. Cassovary is designed from the ground up to first be able to efficiently handle graphs with billions of nodes and edges. A typical example usage is to do large scale graph mining and analysis of a big network. Cassovary is written in Scala and can be used with any JVM-hosted language. It comes with some common data structures and algorithms.
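The quote contrasts Cassovary with JVM graph libraries that are “flexible but not space efficient”, but it doesn’t say how Cassovary stores its graphs. For illustration, here is a minimal sketch of compressed sparse row (CSR) storage, one common way to get compact adjacency data on the JVM, plus a breadth-first traversal over it. Everything here (`CsrGraph`, `fromEdges`, `bfs`) is hypothetical and is not Cassovary’s API. The point is that all edges live in one flat `Int` array with one offset per node, instead of one object per node or edge.

```scala
import scala.collection.mutable

/** Hypothetical compressed sparse row (CSR) directed graph, NOT Cassovary's API.
  * All out-edges share one flat Int array; node n's neighbours occupy
  * targets(offsets(n)) until targets(offsets(n + 1)). */
final class CsrGraph private (offsets: Array[Int], targets: Array[Int]) {
  def nodeCount: Int = offsets.length - 1

  def outDegree(node: Int): Int = offsets(node + 1) - offsets(node)

  /** Neighbours of `node`, read directly from the shared edge array. */
  def neighbors(node: Int): IndexedSeq[Int] =
    (offsets(node) until offsets(node + 1)).map(i => targets(i))

  /** Breadth-first traversal from `start`; returns nodes in visit order. */
  def bfs(start: Int): Vector[Int] = {
    val visited = new Array[Boolean](nodeCount)
    val queue = mutable.Queue(start)
    val order = Vector.newBuilder[Int]
    visited(start) = true
    while (queue.nonEmpty) {
      val node = queue.dequeue()
      order += node
      // Scan this node's slice of the edge array without allocating.
      var i = offsets(node)
      while (i < offsets(node + 1)) {
        val next = targets(i)
        if (!visited(next)) {
          visited(next) = true
          queue.enqueue(next)
        }
        i += 1
      }
    }
    order.result()
  }
}

object CsrGraph {
  /** Builds a graph over nodes 0 until n from (source, target) pairs. */
  def fromEdges(n: Int, edges: Seq[(Int, Int)]): CsrGraph = {
    val offsets = new Array[Int](n + 1)
    // Count out-degrees, then turn the counts into prefix-sum offsets.
    for ((src, _) <- edges) offsets(src + 1) += 1
    for (i <- 1 to n) offsets(i) += offsets(i - 1)
    // Place each target in its source's slice, tracking a write cursor per node.
    val targets = new Array[Int](edges.length)
    val cursor = offsets.clone()
    for ((src, dst) <- edges) {
      targets(cursor(src)) = dst
      cursor(src) += 1
    }
    new CsrGraph(offsets, targets)
  }
}
```

For example, `CsrGraph.fromEdges(4, Seq((0, 1), (0, 2), (1, 3))).bfs(0)` visits nodes 0, 1, 2, 3 in that order. The trade-off is the one the quote hints at: CSR costs roughly one `Int` per edge plus one per node, but it is effectively immutable, so adding edges means rebuilding the arrays.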
I’m not sure yet whether:
- Cassovary works with any graph data source or requires FlockDB, which is more of a persisted graph than a graph database
- Cassovary is inspired by Pregel in any way, or addresses a more limited problem space (similar to FlockDB)
Update: Pankaj Gupta helped clarify the first question (and probably part of the second too):
At Twitter we use flockdb as our real-time graphdb, and export daily for use in cassovary, but any store could be used.
Original title and link: Big Graph-Processing Library From Twitter: Cassovary (©myNoSQL)