


GoldenOrb: All content tagged as GoldenOrb in NoSQL databases and polyglot persistence

Big Graph-Processing Library From Twitter: Cassovary

Cassovary is designed from the ground up to efficiently handle graphs with billions of edges. It comes with some common node and graph data structures and traversal algorithms. A typical usage is to do large-scale graph mining and analysis.

If you are reading this, you’ve most probably heard of Pregel. If you haven’t, check out the paper Pregel: A System for Large-Scale Graph Processing, then how Pregel and MapReduce compare, and also the 6 Pregel-inspired frameworks.

The Cassovary project page introduces it as:

Cassovary is a simple “big graph” processing library for the JVM. Most JVM-hosted graph libraries are flexible but not space efficient. Cassovary is designed from the ground up to first be able to efficiently handle graphs with billions of nodes and edges. A typical example usage is to do large scale graph mining and analysis of a big network. Cassovary is written in Scala and can be used with any JVM-hosted language. It comes with some common data structures and algorithms.
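The post doesn't describe how Cassovary achieves its space efficiency, so the following is only a generic sketch, assuming a compressed sparse row (CSR) layout: every node's out-edges live in one flat integer array, indexed by a second array of offsets, instead of one object per node or edge. The class name `CompactDirectedGraph` is hypothetical and is not Cassovary's API (Cassovary itself is written in Scala); the BFS method stands in for the kind of traversal algorithm the project page mentions.

```python
from array import array
from collections import deque


class CompactDirectedGraph:
    """Out-edges stored in two flat integer arrays (CSR layout) instead of per-node objects."""

    def __init__(self, num_nodes, edges):
        edges = list(edges)  # we iterate twice, so materialize any iterator
        # Count the out-degree of every node.
        degree = [0] * num_nodes
        for src, _ in edges:
            degree[src] += 1
        # targets[offsets[i]:offsets[i + 1]] holds node i's out-neighbours.
        self.offsets = array("q", [0] * (num_nodes + 1))
        for i in range(num_nodes):
            self.offsets[i + 1] = self.offsets[i] + degree[i]
        self.targets = array("q", [0] * len(edges))
        cursor = list(self.offsets[:num_nodes])  # next free slot per node
        for src, dst in edges:
            self.targets[cursor[src]] = dst
            cursor[src] += 1
        self.num_nodes = num_nodes

    def out_neighbors(self, node):
        return self.targets[self.offsets[node]:self.offsets[node + 1]]

    def bfs_distances(self, source):
        """Breadth-first traversal returning the hop distance to every reachable node."""
        dist = {source: 0}
        queue = deque([source])
        while queue:
            node = queue.popleft()
            for nbr in self.out_neighbors(node):
                if nbr not in dist:
                    dist[nbr] = dist[node] + 1
                    queue.append(nbr)
        return dist
```

For example, `CompactDirectedGraph(5, [(0, 1), (0, 2), (1, 3), (2, 3), (3, 4)]).bfs_distances(0)` returns `{0: 0, 1: 1, 2: 1, 3: 2, 4: 3}`. Each edge costs one fixed-size integer in a packed buffer rather than a separate object, which is the kind of saving that matters once a graph has billions of edges.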

I’m not sure yet whether:

  1. Cassovary works with any graph data source or requires FlockDB (which is more of a persisted graph than a graph database)
  2. Cassovary is inspired by Pregel in any way or addresses a more limited problem space (similar to FlockDB)

Update: Pankaj Gupta helped clarify the first question (and probably part of the second too):

At Twitter we use flockdb as our real-time graphdb, and export daily for use in cassovary, but any store could be used.

Original title and link: Big Graph-Processing Library From Twitter: Cassovary (NoSQL database©myNoSQL)


6 Pregel-Inspired Frameworks

A quick overview of 6 Pregel-inspired frameworks (Apache Hama, GoldenOrb, Apache Giraph, Phoebus, Signal/Collect, and HipG):

So, to summarize, what Hama, GoldenOrb and Giraph have in common is: Java platform, Apache License (and incubation), BSP computation. Where they differ: Hama offers BSP primitives rather than a graph-processing API (so it sits at a lower level); GoldenOrb provides Pregel’s API but requires deploying additional software on your existing Hadoop infrastructure; Giraph provides Pregel’s API (and is fairly complete in its current state) and doesn’t require additional infrastructure.
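To make the "lower level" distinction concrete, here is a toy, single-process sketch of programming directly against BSP primitives: the programmer sends messages and calls a barrier explicitly, and nothing graph-specific is involved. The `LocalBSPRuntime` class and its `send`/`sync`/`receive` methods are hypothetical names chosen for illustration, not Hama's Java API, and the peers run sequentially here rather than in parallel.

```python
class LocalBSPRuntime:
    """Toy BSP runtime: messages sent during a superstep are delivered only after sync()."""

    def __init__(self, num_peers):
        self.num_peers = num_peers
        self.outbox = [[] for _ in range(num_peers)]
        self.inbox = [[] for _ in range(num_peers)]

    def send(self, to_peer, message):
        self.outbox[to_peer].append(message)

    def sync(self):
        # Barrier: all peers finished the superstep, so pending messages become visible.
        self.inbox, self.outbox = self.outbox, [[] for _ in range(self.num_peers)]

    def receive(self, peer):
        return self.inbox[peer]


def bsp_global_sum(partitions):
    """Sum values spread over peers (one partition per peer) in two supersteps."""
    if not partitions:
        return 0
    rt = LocalBSPRuntime(len(partitions))
    # Superstep 1: every peer computes a local sum and sends it to peer 0.
    for values in partitions:
        rt.send(0, sum(values))
    rt.sync()
    # Superstep 2: peer 0 aggregates the partial sums it received.
    return sum(rt.receive(0))
```

`bsp_global_sum([[1, 2], [3], [4, 5, 6]])` returns `21`. A Pregel-style API hides this plumbing behind a per-vertex compute function, which is what GoldenOrb and Giraph expose on top of the same BSP model.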



Paper: Graph Based Statistical Analysis of Network Traffic

Published by a group from Los Alamos National Lab (Hristo Djidjev, Gary Sandine, Curtis Storlie, Scott Vander Wiel):

We propose a method for analyzing traffic data in large computer networks such as big enterprise networks or the Internet. Our approach combines graph theoretical representation of the data and graph analysis with novel statistical methods for discovering pattern and time-related anomalies. We model the traffic as a graph and use temporal characteristics of the data in order to decompose it into subgraphs corresponding to individual sessions, whose characteristics are then analyzed using statistical methods. The goal of that analysis is to discover patterns in the network traffic data that might indicate intrusion activity or other malicious behavior.
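The abstract doesn't spell out the decomposition procedure, so the following is a minimal sketch under an assumed rule: flow records are `(src, dst, time)` tuples, and two flows belong to the same session when they share a host that was last active at most `max_gap` time units earlier. The function names, the record format, and the feature set are hypothetical illustrations, not the authors' method.

```python
from collections import defaultdict


def split_into_sessions(flows, max_gap):
    """Group (src, dst, t) flow records into time-connected subgraphs (hypothetical rule)."""
    parent = list(range(len(flows)))  # union-find over flow indices

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving keeps trees shallow
            i = parent[i]
        return i

    last_seen = {}  # host -> (time, index of the most recent flow touching it)
    order = sorted(range(len(flows)), key=lambda i: flows[i][2])
    for i in order:
        src, dst, t = flows[i]
        for host in (src, dst):
            # Join this flow to the host's recent session if the gap is small enough.
            if host in last_seen and t - last_seen[host][0] <= max_gap:
                parent[find(i)] = find(last_seen[host][1])
            last_seen[host] = (t, i)

    groups = defaultdict(list)
    for i in order:
        groups[find(i)].append(flows[i])
    return list(groups.values())


def session_features(session):
    """Simple per-session statistics that a downstream statistical model could consume."""
    hosts = {h for src, dst, _ in session for h in (src, dst)}
    times = [t for _, _, t in session]
    return {"hosts": len(hosts), "flows": len(session), "duration": max(times) - min(times)}
```

With `max_gap=10`, the flows `("a", "b", 0)`, `("b", "c", 5)`, `("x", "y", 6)` and `("c", "d", 100)` split into three sessions: the first two flows share host `b` within the gap, while `("c", "d", 100)` arrives too late to join them.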


GoldenOrb: Ravel Google Pregel Implementation Released

Announced back in March, GoldenOrb, Ravel’s implementation of the Google Pregel paper, has finally been released. If you are not familiar with Google Pregel, check out Pregel: Graph Processing at Large-Scale and Ricky Ho’s comparison of Pregel and MapReduce.

Until Ravel’s GoldenOrb, the only experimental implementation of Pregel was the Erlang-based Phoebus. GoldenOrb is released under the Apache License v2.0 and is available on GitHub.

GoldenOrb is a cloud-based open source project for massive-scale graph analysis, built upon best-of-breed software from the Apache Hadoop project modeled after Google’s Pregel architecture.


Graph Databases: Distributed Traversal Engines

Marko A. Rodriguez:

In the distributed traversal engine model, a traversal is represented as a flow of messages between elements of the graph. Generally, each element (e.g. vertex) is operating independently of the other elements. Each element is seen as its own processor with its own (usually homogenous) program to execute. Elements communicate with each other via message passing. When no more messages have been passed, the traversal is complete and the results of the traversal are typically represented as a distributed data structure over the elements. Graph databases of this nature tend to use the Bulk Synchronous Parallel model of distributed computing. Each step is synchronized in a manner analogous to a clock cycle in hardware. Instances of this model include Agrapa, Pregel, Trinity, GoldenOrb, and others.
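The message-passing model described above can be simulated in a few lines. This sketch, assuming an undirected graph given as a symmetric adjacency dict, computes connected components by label propagation: in each superstep every vertex that received messages runs the same logic, adopts the smallest label it has seen, and forwards it only if its label changed. A vertex with no incoming messages does nothing, in the spirit of the Pregel paper's "vote to halt", and the loop stops once no messages are in flight. The function name and the `max_supersteps` guard are illustrative assumptions, not the API of Pregel, GoldenOrb, or any other system listed.

```python
def pregel_min_label(adjacency, max_supersteps=100):
    """Vertex-centric label propagation: each vertex ends with the smallest id in its component.

    adjacency: dict vertex -> list of neighbours; must be symmetric (undirected graph).
    """
    value = {v: v for v in adjacency}
    # Initial superstep: every vertex announces its own label to its neighbours.
    inbox = {v: [] for v in adjacency}
    for v, nbrs in adjacency.items():
        for n in nbrs:
            inbox[n].append(value[v])

    superstep = 0
    while any(inbox.values()) and superstep < max_supersteps:
        outbox = {v: [] for v in adjacency}
        # Every vertex runs the same program, looking only at its own messages.
        for v, messages in inbox.items():
            if not messages:
                continue  # halted vertex: nothing to do until a message arrives
            smallest = min(messages)
            if smallest < value[v]:
                value[v] = smallest
                for n in adjacency[v]:
                    outbox[n].append(smallest)
        inbox = outbox  # barrier: messages become visible in the next superstep
        superstep += 1
    return value
```

For `{1: [2], 2: [1, 3], 3: [2], 4: [5], 5: [4]}` the result is `{1: 1, 2: 1, 3: 1, 4: 4, 5: 4}`, a "distributed data structure over the elements" in Rodriguez's terms. In a real deployment the vertices are partitioned across workers and the swap from `outbox` to `inbox` is the global synchronization barrier.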

None of these graph databases offers distributed traversal engines.



Ravel Hopes to Open-Source Graph Databases

Ravel, an Austin, Texas-based company, wants to provide a supported, open-source version of Google’s Pregel software called GoldenOrb to handle large-scale graph analytics.

Is it a new graph database or a Pregel implementation? Watch the interview and tell me what you think it is.