Titan: Data Loading and Transactional Benchmark

The Aurelius team describes an advanced benchmark of Titan, a massive scale property graph allowing real-time traversals and updates. The benchmark was sponsored by Pearson and was developed and run over 5 months:

The 10 terabyte, 121 billion edge graph was loaded into the cluster in 1.48 days at a rate of approximately 1.2 million edges a second with 0 failed transactions. These numbers were possible due to new developments in Titan 0.3.0 whereby graph partitioning is achieved using a domain-based byte order partitioner.
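To make the idea of domain-based byte order partitioning more concrete, here is a minimal Python sketch. It is not Titan's actual key encoding: the 16/48-bit split and the names `make_key`, `domain_of`, and `domain_range` are assumptions made purely for illustration. The point is only that when a domain identifier occupies the leading bytes of a key, an order-preserving partitioner (such as Cassandra's ByteOrderedPartitioner) keeps vertices of the same domain next to each other in key order.

```python
import struct

# Hypothetical layout: high 16 bits = domain id, low 48 bits = per-domain counter.
DOMAIN_BITS = 16
LOCAL_BITS = 48


def make_key(domain_id: int, local_id: int) -> bytes:
    """Pack (domain, local id) into an 8-byte big-endian key."""
    if not 0 <= domain_id < (1 << DOMAIN_BITS):
        raise ValueError("domain_id out of range")
    if not 0 <= local_id < (1 << LOCAL_BITS):
        raise ValueError("local_id out of range")
    return struct.pack(">Q", (domain_id << LOCAL_BITS) | local_id)


def domain_of(key: bytes) -> int:
    """Recover the domain id from the leading bits of a key."""
    (value,) = struct.unpack(">Q", key)
    return value >> LOCAL_BITS


def domain_range(domain_id: int) -> tuple:
    """Return the inclusive (first, last) keys that bound one domain."""
    return make_key(domain_id, 0), make_key(domain_id, (1 << LOCAL_BITS) - 1)


# Byte-wise sorting of big-endian keys groups vertices by domain.
keys = sorted(make_key(d, i) for d, i in [(2, 7), (1, 3), (2, 1), (1, 9), (3, 0)])
domains = [domain_of(k) for k in keys]
print(domains)  # [1, 1, 2, 2, 3]
```

Because each domain maps to one contiguous key range, a traversal that stays inside a domain touches a single slice of the ring under this scheme, which is the kind of locality a domain-based partitioner aims for.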

✚ The answer to why Titan is built on Cassandra can be found in this interview between Aurelius CTO Matthias Broecheler and DataStax co-founder Matt Pfeil:

[…] we don’t have to worry about things like replication, backup, and snapshots because all of that stuff is handled by Cassandra. We really just focus on: “How do you distribute a graph?”, “How do you represent a graph efficiently in a big table model?”, “How do you do things like edge compression and other things that are very graph specific in order to make the database fast?” And, lastly, “How do you build intelligent index structures so that the graph traversals, which are the core of any graph database, are as fast as possible?”
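As a rough illustration of what representing a graph in a big table model can mean, the sketch below stores one row per vertex and one sorted column per outgoing edge, so that all edges with a given label form a contiguous slice of the row. This is a toy model under stated assumptions, not Titan's storage format; the class `BigTableGraph` and its methods are hypothetical names.

```python
from bisect import bisect_left, insort


class BigTableGraph:
    """Toy wide-row store: one row per vertex, one sorted column per edge."""

    def __init__(self):
        # vertex id -> sorted list of (label, neighbor id) columns
        self.rows = {}

    def add_edge(self, src, label, dst):
        """Insert the edge as a column in the source vertex's row."""
        insort(self.rows.setdefault(src, []), (label, dst))
        self.rows.setdefault(dst, [])  # make sure the target vertex has a row

    def neighbors(self, src, label):
        """Read the contiguous slice of columns that carry the given label."""
        row = self.rows.get(src, [])
        start = bisect_left(row, (label,))  # first column with this label
        result = []
        for column_label, dst in row[start:]:
            if column_label != label:
                break  # past the end of the label's slice
            result.append(dst)
        return result


g = BigTableGraph()
g.add_edge(1, "knows", 3)
g.add_edge(1, "knows", 2)
g.add_edge(1, "likes", 4)
print(g.neighbors(1, "knows"))  # [2, 3]
```

Keeping the columns sorted by label is what turns a one-hop traversal into a single row slice rather than a scan over every edge of the vertex.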

Adding Value Through Graph Analysis Using Titan and Faunus

An interesting slide deck by Matthias Broecheler introducing 3 graph-related tools developed by Vadas Gintautas, Marko Rodriguez, Stephen Mallette and Daniel LaRocque:

  1. Titan: a massive scale property graph allowing real-time traversals and updates
  2. Faunus: for batch processing of large graphs using Hadoop
  3. Fulgora: for running global graph algorithms on large, compressed, in-memory graphs

The first couple of slides also show some possible use cases where these tools would prove useful.

A Comparison of 7 Graph Databases

The main page of InfiniteGraph, a graph database commercialized by Objectivity, features an interesting comparison of 7 graph databases (InfiniteGraph, Neo4j, AllegroGraph, Titan, FlockDB, Dex, OrientDB) based on 16 criteria: licensing, source, scalability, graph model, schema model, API, query method, platforms, consistency, concurrency (distributed processing), partitioning, extensibility, visualizing tools, storage back end/persistency, language, backup/restore.

[Image: 7 graph databases]

Unfortunately the image is almost unreadable, but Peter Karussell has extracted the data into a Google Docs spreadsheet.

