Nati Shalom (GigaSpaces) describes an approach to solving a large-scale graph problem:
- Use memory as the main storage
  - Random I/O access performs much better in memory than on disk.
- Execute the code with the data, using real-time Map/Reduce
  - To reduce the number of iterations required to execute a particular query, we use the executor API. The executor API enables us to push the code to the data, so we can run fairly complex data processing on the data node at memory speed rather than network speed (see the co-located execution sketch after this list).
- De-normalize the data
  - To reduce the number of traversals and network hops per query on the graph, we copy elements of the graph into each node. For example, the list of friends and friends-of-friends (up to a certain degree) could be stored in each node and thus be available to any element of the graph without the need to consult other nodes (see the second sketch after this list).
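To make the "push the code to the data" point concrete, here is a minimal, self-contained sketch in plain Java. It partitions an in-memory adjacency map and routes a counting task to the partition that owns the vertex, so only the result crosses the network. The class and method names (CoLocatedExecutionSketch, Partition, friendCount) are illustrative assumptions, not the actual GigaSpaces executor API.

```java
import java.util.*;
import java.util.concurrent.*;

/**
 * Sketch of "executing the code with the data": each partition owns a slice of
 * the graph in memory and runs tasks locally, so a query is routed to the
 * partition that holds the data instead of pulling the data over the network.
 */
public class CoLocatedExecutionSketch {

    /** One in-memory partition: adjacency lists plus its own executor. */
    static final class Partition {
        final Map<String, Set<String>> adjacency = new ConcurrentHashMap<>();
        final ExecutorService executor = Executors.newSingleThreadExecutor();
    }

    private final Partition[] partitions;

    CoLocatedExecutionSketch(int partitionCount) {
        partitions = new Partition[partitionCount];
        for (int i = 0; i < partitionCount; i++) partitions[i] = new Partition();
    }

    /** Route a vertex to a partition by hashing its id. */
    private Partition partitionFor(String vertexId) {
        return partitions[Math.floorMod(vertexId.hashCode(), partitions.length)];
    }

    void addEdge(String from, String to) {
        partitionFor(from).adjacency
                .computeIfAbsent(from, k -> ConcurrentHashMap.newKeySet())
                .add(to);
    }

    /** Push the task to the partition that owns the vertex; only the count comes back. */
    Future<Integer> friendCount(String vertexId) {
        Partition p = partitionFor(vertexId);
        return p.executor.submit(() ->
                p.adjacency.getOrDefault(vertexId, Set.of()).size());
    }

    public static void main(String[] args) throws Exception {
        CoLocatedExecutionSketch grid = new CoLocatedExecutionSketch(4);
        grid.addEdge("alice", "bob");
        grid.addEdge("alice", "carol");
        System.out.println("alice's friend count: " + grid.friendCount("alice").get());
        for (Partition p : grid.partitions) p.executor.shutdown();
    }
}
```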
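And a second sketch for the de-normalization point: each node keeps its own copy of its friends and friends-of-friends, so a second-degree query becomes a local lookup instead of a multi-hop traversal. Again, the names and the eager-update strategy are my own illustration of the idea, not code from the talk.

```java
import java.util.*;

/**
 * Sketch of a de-normalized social graph node: friends-of-friends are copied
 * onto each node at write time, so reads need no traversal through other nodes.
 */
public class DenormalizedNodeSketch {

    static final class Person {
        final String id;
        final Set<String> friends = new HashSet<>();
        // De-normalized copy: extra memory, refreshed whenever friendships change.
        final Set<String> friendsOfFriends = new HashSet<>();

        Person(String id) { this.id = id; }
    }

    private final Map<String, Person> people = new HashMap<>();

    Person person(String id) { return people.computeIfAbsent(id, Person::new); }

    /** Add a friendship and eagerly update the de-normalized second-degree sets. */
    void addFriendship(String a, String b) {
        Person pa = person(a), pb = person(b);
        // a's existing friends become friends-of-friends of b, and vice versa.
        pb.friendsOfFriends.addAll(pa.friends);
        pa.friendsOfFriends.addAll(pb.friends);
        for (String f : pa.friends) person(f).friendsOfFriends.add(b);
        for (String f : pb.friends) person(f).friendsOfFriends.add(a);
        pa.friends.add(b);
        pb.friends.add(a);
    }

    public static void main(String[] args) {
        DenormalizedNodeSketch graph = new DenormalizedNodeSketch();
        graph.addFriendship("alice", "bob");
        graph.addFriendship("bob", "carol");
        // Answered from alice's node alone, no hop through bob required.
        System.out.println("alice's friends-of-friends: "
                + graph.person("alice").friendsOfFriends);
    }
}
```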
A couple of comments:
- If all you have is memory, then you'll have to replicate the data at least 2 or 3 times, otherwise a single node failure loses data. Result: more memory needed.
- De-normalized data means even more memory (see the back-of-the-envelope sketch below).
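A quick back-of-the-envelope illustration of how the memory bill grows; the figures are assumptions I picked for the example, not numbers from the talk.

```java
/**
 * Illustrative arithmetic only: replication plus de-normalization multiply the
 * amount of RAM needed to hold a graph entirely in memory.
 */
public class MemoryFootprintSketch {
    public static void main(String[] args) {
        double rawGraphTb = 1.0;            // assumed size of the raw, normalized graph
        int replicationFactor = 3;          // keep 3 in-memory copies for fault tolerance
        double denormalizationFactor = 2.0; // assumed growth from copying friends-of-friends onto each node

        double ramNeededTb = rawGraphTb * replicationFactor * denormalizationFactor;
        System.out.printf("RAM required: %.1f TB for %.1f TB of raw data%n",
                ramNeededTb, rawGraphTb);   // prints 6.0 TB for 1.0 TB
    }
}
```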
All of this boils down to an idea Nati has been advocating for a while: RAM is the new disk. But I don't think it applies to BigData.
Below is the complete video:
Original title and link: The Social Graph Challenge (NoSQL databases © myNoSQL)