


The Social Graph Challenge

Nati Shalom (GigaSpaces) describes an approach to solving a large-scale graph problem:

  • Use memory as the main storage
    • Random I/O access performs much better on memory than on disk
  • Execute the code with the data, using real-time Map/Reduce
    • To reduce the number of iterations required to execute a particular query, we use the executor API, which lets us push the code to the data. That way we can run fairly complex data processing on the data node at memory speed instead of network speed.
  • De-normalize the data
    • To reduce the number of traversals and network hops per query on the graph, we copy elements of the graph into each node. For example, the list of friends and friends of friends (up to a certain degree) can be stored in each node and thus becomes available to any element of the graph without consulting other nodes.
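The de-normalization step can be sketched with a toy in-memory graph (this is an illustration of the idea, not the GigaSpaces executor API): each node caches its second-degree connections when an edge is added, so a friends-of-friends query becomes a local lookup with no traversal or network hops at query time.

```python
# Toy sketch of de-normalizing friends-of-friends into each node.
# Write-time cost (updating neighbours' caches) is traded for
# query-time locality; class and method names are illustrative.

class Node:
    def __init__(self, user_id):
        self.user_id = user_id
        self.friends = set()             # 1st degree
        self.friends_of_friends = set()  # de-normalized 2nd degree

class InMemoryGraph:
    def __init__(self):
        self.nodes = {}

    def add_user(self, user_id):
        self.nodes.setdefault(user_id, Node(user_id))

    def add_friendship(self, a, b):
        self.add_user(a)
        self.add_user(b)
        self.nodes[a].friends.add(b)
        self.nodes[b].friends.add(a)
        # Push the new edge into the 2nd-degree caches of every
        # existing neighbour: more memory, fewer hops per query.
        for n in self.nodes[a].friends - {b}:
            self.nodes[n].friends_of_friends.add(b)
            self.nodes[b].friends_of_friends.add(n)
        for n in self.nodes[b].friends - {a}:
            self.nodes[n].friends_of_friends.add(a)
            self.nodes[a].friends_of_friends.add(n)

    def two_hop(self, user_id):
        # Served entirely from the node's own cache; no other
        # nodes are consulted at query time.
        node = self.nodes[user_id]
        return node.friends_of_friends - node.friends - {user_id}
```

The write path does the extra work so the read path stays local, which is exactly the traversal-avoidance trade-off described above.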

A couple of comments:

  • if all you have is memory, then you’ll have to replicate data at least 2 or 3 times. Result: more memory needed.
  • de-normalized data means even more memory
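These two costs compound. A back-of-the-envelope calculation makes the point; the raw size, de-normalization blow-up, and replication factor below are illustrative assumptions, not figures from the talk:

```python
# Illustrative memory estimate for a memory-only graph store.
raw_graph_gb = 100   # assumed raw graph size
denorm_factor = 3    # assumed blow-up from caching 2nd-degree lists
replication = 3      # keep 3 copies, since RAM is the only storage

total_ram_gb = raw_graph_gb * denorm_factor * replication
print(total_ram_gb)  # 900 GB of RAM for a 100 GB graph
```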

All of this boils down to an idea Nati has been advocating for a while: RAM is the new disk. But I don’t think it applies to BigData.

Below is the complete video:

Original title and link: The Social Graph Challenge (NoSQL databases © myNoSQL)