


graph: All content tagged as graph in NoSQL databases and polyglot persistence

Hadoop for graphs - GraphLab picks up $6.75m from Madrona and NEA

Robin Wauters for TNW:

Seattle startup GraphLab claims it is building the “fastest machine-learning analytics engine for graph datasets”, based on the popular open-source distributed graph computation framework with the same name, and it has just raised capital to come through on its promise.

Good luck to GraphLab’s team.

✚ Here’s a short list of MapReduce implementations for graphs.

Original title and link: Hadoop for graphs - GraphLab picks up $6.75m from Madrona and NEA (NoSQL database©myNoSQL)


NetflixGraph: In-Memory Directed Graph Data

Another open source project from Netflix: NetflixGraph:

NetflixGraph is a compact in-memory data structure used to represent directed graph data. You can use NetflixGraph to vastly reduce the size of your application’s memory footprint, potentially by an order of magnitude or more. If your application is I/O bound, you may be able to remove that bottleneck by holding your entire dataset in RAM. You’ll likely be very surprised by how little memory is actually required to represent your data.

At first glance it sounds like a sort of Redis for graph data. It is available on GitHub.
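To get a feel for why a compact in-memory layout can shrink a directed graph so much, here is a minimal sketch. It does not use NetflixGraph's API or its actual encoding; it assumes a generic compressed sparse row (CSR) layout, and the `CompactDigraph` class and its method names are made up for illustration.

```python
from array import array


class CompactDigraph:
    """Hypothetical directed graph stored as two flat integer arrays (CSR layout)."""

    def __init__(self, num_nodes, edges):
        # Count the outgoing edges of every node.
        counts = [0] * num_nodes
        for src, _ in edges:
            counts[src] += 1

        # offsets[i] .. offsets[i + 1] delimits node i's slice of `targets`.
        self.offsets = array("I", [0] * (num_nodes + 1))
        for i in range(num_nodes):
            self.offsets[i + 1] = self.offsets[i] + counts[i]

        # Place each edge's destination into its source node's slice.
        self.targets = array("I", [0] * len(edges))
        cursor = list(self.offsets[:num_nodes])
        for src, dst in edges:
            self.targets[cursor[src]] = dst
            cursor[src] += 1

    def successors(self, node):
        """Return the nodes that `node` points to."""
        start, end = self.offsets[node], self.offsets[node + 1]
        return list(self.targets[start:end])

    def payload_bytes(self):
        """Bytes used by the two arrays (excluding Python object overhead)."""
        return (len(self.offsets) * self.offsets.itemsize
                + len(self.targets) * self.targets.itemsize)


edges = [(0, 1), (0, 2), (1, 2), (2, 0)]
graph = CompactDigraph(3, edges)
print(graph.successors(0), graph.payload_bytes())
```

The point of the design is that there is no per-node or per-edge object: the whole graph is two contiguous arrays, so the cost per edge is one small integer rather than a full object with pointers.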



What Is the Most Promising Graph Datastore?

Very interesting answer on Quora from professor Josep Lluis Larriba Pey.

  1. for very large data sizes (TB): InfiniteGraph, DEX
  2. for query speed: DEX
  3. for transaction support: Neo4j



The Social Graph Challenge

Nati Shalom (Gigaspaces) describes an approach to solving a large-scale graph problem:

  • Use Memory as the main storage
    • Random I/O access works much better on memory devices than on disk
  • Execute the code with the data - Using Real Time Map/Reduce
    • To reduce the number of iterations required to execute a particular query we use the executor API. The executor API enables us to push the code to the data. By doing that we can execute fairly complex data processing on the data node at memory speed vs network speed.
  • De-normalize the data
    • To reduce the amount of traversal access and network hops per query on the graph we need to copy elements of the graph into each node. For example the list of Friends and friends of friends (up to a certain degree) could be stored in each node and thus become available to any element of the graph  without the need to consult with other nodes.

A couple of comments:

  • if all you have is memory, then you’ll have to replicate data at least 2 or 3 times. Result: more memory needed.
  • de-normalized data means even more memory

All of this boils down to an idea Nati has been supporting for a while: RAM is the new disk. But I don’t think it applies to BigData.
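The memory trade-off of de-normalization is easy to see on a toy example. The sketch below is not GigaSpaces code; the tiny `friends` graph and the helper names are assumptions made for illustration. It stores each user's friends-of-friends next to the user and counts how many entries have to be kept in memory as a result.

```python
def friends_of_friends(friends, user):
    """Normalized lookup: walk two hops at query time."""
    result = set()
    for friend in friends[user]:
        result.update(friends[friend])
    result.discard(user)        # a user is not their own friend-of-friend
    result -= friends[user]     # direct friends are already stored
    return result


def denormalize(friends):
    """Precompute each user's friends-of-friends so a query needs no extra hops."""
    return {user: friends_of_friends(friends, user) for user in friends}


def stored_entries(index):
    """Number of user ids held across all adjacency sets."""
    return sum(len(ids) for ids in index.values())


# Hypothetical five-user social graph (undirected friendships).
friends = {
    "ann": {"bob", "cat"},
    "bob": {"ann", "dan"},
    "cat": {"ann", "dan"},
    "dan": {"bob", "cat", "eve"},
    "eve": {"dan"},
}

fof = denormalize(friends)
print(stored_entries(friends), stored_entries(fof))
```

Even on this tiny graph the precomputed second-degree lists nearly double the number of stored ids, and every replica of a node carries that extra copy too.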




On Graphs and Graph Databases: Memoirs of a Graph Addict

Marko A. Rodriguez[1] posted another slide deck covering:

  • graph structures
  • graph databases
  • graph applications
  • TinkerPop product suite[2]

What better name could a 129-slide presentation on graphs and graph databases have than Memoirs of a Graph Addict: Despair to Redemption?


  1. Marko A. Rodriguez: Graph Systems Architect at AT&T, @twarko

  2. TinkerPop  


Efficient Large-Scale Graph Analysis with Hadoop

Michael Schatz[1]:

Otherwise, the main technical challenge of this design is the graph structure must be available at each step of the iterative algorithm, but in the design above we only distribute the mutable values (partial PageRank value, partial search path, etc). This challenge is normally resolved by encoding the graph structure and mutable values as tuples that are separately tagged to indicate if the record should be interpreted as a node or a special mutable value. This way the map function can read in a node as input, emit messages for neighboring nodes using the neighboring node ids as the keys, and also reemit the node tuple with the current node id as the key. Then, as usual, the shuffle phase collects key-value pairs with the same key, which effectively collects together a node tuple with all the messages destined for that node for every node in the graph. The reduce function then processes each node tuple with associated messages, computes an updated value, and saves away the updated node with the complete graph structure for the next round of computation.

I guess this is exactly the reason Google came up with Pregel, which, even if somewhat similar to MapReduce, is optimized for graph processing. While we don’t have access (yet?) to Google’s implementation, there’s an attempt to build an open source version: Phoebus, an Erlang-based implementation of Pregel.
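To make the tagged-tuple scheme in Schatz's description concrete, here is a minimal single-process sketch of one PageRank round. It is not Hadoop code and not his implementation: the function names, the `"node"`/`"msg"` tags, the three-node graph, and the damping factor of 0.85 are all assumptions made for illustration.

```python
from collections import defaultdict

DAMPING = 0.85  # assumed conventional PageRank damping factor


def map_node(node_id, rank, neighbors):
    # Re-emit the node tuple so the graph structure survives the round.
    yield node_id, ("node", neighbors)
    # Emit one message per neighbor, keyed by the neighbor's id.
    # A node without neighbors emits no messages (its rank mass is dropped here).
    for neighbor in neighbors:
        yield neighbor, ("msg", rank / len(neighbors))


def shuffle(pairs):
    # Collect all values that share a key, as the shuffle phase would.
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_node(node_id, values, num_nodes):
    # Separate the node tuple from the messages using the tag.
    neighbors, incoming = [], 0.0
    for tag, payload in values:
        if tag == "node":
            neighbors = payload
        else:
            incoming += payload
    rank = (1 - DAMPING) / num_nodes + DAMPING * incoming
    return node_id, rank, neighbors


def pagerank_round(graph):
    """graph maps node id -> (rank, list of neighbor ids)."""
    pairs = [pair
             for node_id, (rank, neighbors) in graph.items()
             for pair in map_node(node_id, rank, neighbors)]
    grouped = shuffle(pairs)
    reduced = (reduce_node(key, values, len(graph))
               for key, values in grouped.items())
    return {node_id: (rank, neighbors) for node_id, rank, neighbors in reduced}


graph = {
    "a": (1 / 3, ["b", "c"]),
    "b": (1 / 3, ["c"]),
    "c": (1 / 3, ["a"]),
}
next_graph = pagerank_round(graph)
print(next_graph)
```

Note how every round has to ship the full adjacency lists through the shuffle just to keep them around for the next iteration, which is the overhead a Pregel-style model avoids by keeping vertices resident between supersteps.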

  1. Michael Schatz is an assistant professor in the Simons Center for Quantitative Biology at Cold Spring Harbor Laboratory.
