graphdb: All content tagged as graphdb in NoSQL databases and polyglot persistence
Papers from VLDB 2012 are starting to surface. Authored by a Chinese team, the “Efficient Subgraph Matching on Billion Node Graphs” paper introduces a new algorithm optimized for large-scale graphs:
We present a novel algorithm that supports efficient subgraph matching for graphs deployed on a distributed memory store. Instead of relying on super-linear indices, we use efficient graph exploration and massive parallel computing for query processing. Our experimental results demonstrate the feasibility of performing subgraph matching on web-scale graph data.
The paper also compares the space and time complexity of other subgraph matching algorithms.
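To give a flavor of the exploration-based approach (as opposed to building super-linear indices), here is a minimal single-machine sketch of subgraph matching by backtracking graph exploration. This is a toy Python illustration of the general technique, not the paper's distributed algorithm:

```python
def subgraph_matches(pattern, data):
    """Enumerate all mappings of pattern vertices onto data vertices such
    that every pattern edge maps to a data edge. Matching proceeds by
    exploring neighborhoods of already-matched vertices (no precomputed
    index). Both graphs are undirected adjacency dicts: {vertex: set_of_neighbors}.
    """
    p_nodes = list(pattern)

    def extend(mapping):
        if len(mapping) == len(p_nodes):
            yield dict(mapping)
            return
        u = p_nodes[len(mapping)]  # next pattern vertex to match
        # Candidates must be neighbors of every already-matched neighbor of u;
        # if u has no matched neighbors yet, every data vertex is a candidate.
        mapped_nbrs = [mapping[v] for v in pattern[u] if v in mapping]
        candidates = (set.intersection(*(data[m] for m in mapped_nbrs))
                      if mapped_nbrs else set(data))
        for cand in candidates:
            if cand in mapping.values():  # enforce an injective mapping
                continue
            mapping[u] = cand
            yield from extend(mapping)
            del mapping[u]  # backtrack

    yield from extend({})

pattern = {0: {1, 2}, 1: {0, 2}, 2: {0, 1}}          # a triangle
data = {"a": {"b", "c"}, "b": {"a", "c"},
        "c": {"a", "b", "d"}, "d": {"c"}}            # a triangle plus a tail
matches = list(subgraph_matches(pattern, data))      # 3! = 6 mappings onto a, b, c
```

The intersection of matched neighbors' adjacency sets is what keeps the exploration guided rather than brute-force; the paper's contribution is doing this at billion-node scale across a distributed memory store.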
The middle part of this Wired article talks a bit about the way Facebook is storing its Open Graph data:
We have an object store, which stores things like users and events and groups and photos, and then we have an edge store that stores the relationship between objects. With Open Graph, we built a layer on top of those systems that allowed developers to define what their objects look like and what their edges look like and then publish those third party objects and edges into the same infrastructure that we used to store all of the first party objects and edges.
Couple of thoughts:
- this data is a good example of a multigraph
- I don’t think Facebook is actually using a graph database to store this data. Considering the size of the data Facebook is handling, that is understandable
- There’s no mention of how the metadata (the descriptions of the objects and edges) is stored. I assume it needs to be connected to historical data so the schema can evolve while older data keeps its original meaning over time.
- The processing happening on this multigraph data sounds like cluster analysis
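The object-store/edge-store split quoted above can be sketched as a toy multigraph store. All names and the layout here are illustrative assumptions, not Facebook's actual schema:

```python
from collections import defaultdict

class OpenGraphStore:
    """Toy sketch of an object store plus an edge store, where multiple
    typed edges may connect the same pair of objects (a multigraph)."""

    def __init__(self):
        self.objects = {}               # object id -> {"type": ..., **properties}
        self.edges = defaultdict(list)  # source id  -> [(edge_type, target id)]

    def put_object(self, obj_id, obj_type, **props):
        self.objects[obj_id] = {"type": obj_type, **props}

    def add_edge(self, src, edge_type, dst):
        self.edges[src].append((edge_type, dst))

    def neighbors(self, src, edge_type=None):
        """Targets of edges leaving src, optionally filtered by edge type."""
        return [dst for etype, dst in self.edges[src]
                if edge_type is None or etype == edge_type]

store = OpenGraphStore()
store.put_object("u1", "user", name="Ada")
store.put_object("p1", "photo", caption="VLDB 2012")
# Two differently-typed edges between the same pair of objects -> multigraph
store.add_edge("u1", "uploaded", "p1")
store.add_edge("u1", "likes", "p1")
```

The layer Facebook describes for third-party objects and edges would then amount to letting developers register their own object and edge types on top of the same two stores.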
Original title and link: Inside Facebook’s Open Graph ( ©myNoSQL)
A slide deck by Pierre De Wilde with a short theoretical intro to property graphs and graph databases and an extensive set of examples of manipulating and traversing graph data with Gremlin. Good reference material.
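A Gremlin traversal chains steps such as `out()` and `values()` over a property graph. Here is a minimal Python sketch of that fluent-traversal idea (the real Gremlin DSL is far richer), using the classic TinkerPop toy names:

```python
class Traversal:
    """Minimal Gremlin-flavored fluent traversal over a property graph.
    Vertices: {id: properties dict}. Edges: [(out_id, label, in_id)]."""

    def __init__(self, vertices, edges, frontier):
        self.vertices, self.edges = vertices, edges
        self.frontier = list(frontier)  # current set of traversed vertex ids

    def out(self, label):
        """Step to vertices reachable via an outgoing edge with this label."""
        nxt = [i for v in self.frontier
               for o, l, i in self.edges if o == v and l == label]
        return Traversal(self.vertices, self.edges, nxt)

    def values(self, key):
        """Extract a property value from each vertex in the frontier."""
        return [self.vertices[v][key] for v in self.frontier]

vertices = {1: {"name": "marko"}, 2: {"name": "vadas"}, 3: {"name": "lop"}}
edges = [(1, "knows", 2), (1, "created", 3)]
g = Traversal(vertices, edges, [1])
g.out("knows").values("name")   # traverses from marko to vadas
```

Each step returns a new traversal, which is what makes arbitrarily long chains like `g.out("knows").out("created").values("name")` compose naturally.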
Last week I spent some time implementing the Blueprints interface on top of Datomic. The RDF and SPARQL feel of Datomic’s data model and query approach makes it a good target for implementing a property graph. I finished the implementation and all unit tests are passing. What makes it really cool is that it is the only distributed “temporal” graph database I’m aware of: it allows running queries against a past version of the graph.
This is the first solution I’m reading about addressing the time dimension in a graph model.
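The idea can be sketched as an append-only log of timestamped edge facts with “as of” reads, in the spirit of Datomic’s time model. This is an illustrative Python sketch, not the actual Blueprints-on-Datomic implementation:

```python
class TemporalGraph:
    """Sketch of a temporal graph: every edge addition or retraction is a
    timestamped fact in an append-only log, and queries can run against
    the graph "as of" any past transaction time."""

    def __init__(self):
        self.log = []  # (t, op, src, label, dst) with op in {"add", "retract"}
        self.t = 0

    def add_edge(self, src, label, dst):
        self.t += 1
        self.log.append((self.t, "add", src, label, dst))
        return self.t

    def retract_edge(self, src, label, dst):
        self.t += 1
        self.log.append((self.t, "retract", src, label, dst))
        return self.t

    def out_edges(self, src, as_of=None):
        """Edges leaving src as of transaction time `as_of` (default: now),
        obtained by replaying the log up to that point."""
        as_of = self.t if as_of is None else as_of
        live = set()
        for t, op, s, label, dst in self.log:
            if t > as_of or s != src:
                continue
            (live.add if op == "add" else live.discard)((label, dst))
        return live

g = TemporalGraph()
t1 = g.add_edge("alice", "knows", "bob")
g.retract_edge("alice", "knows", "bob")
g.out_edges("alice")              # current graph: the edge is gone
g.out_edges("alice", as_of=t1)    # past graph: the edge still exists
```

A real implementation would index the log instead of replaying it, which is essentially what Datomic’s `as-of` database values provide.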
Original title and link: Distributed Temporal Graph Database Using Datomic ( ©myNoSQL)
Couple of things I don’t see mentioned in the RedMonk post:
if and how the data has been normalized based on each connector’s availability
According to the post, the data was collected between Jan. 2011 and Mar. 2012, and I think not all connectors were available for the whole period.
if and how marketing pushes for each connector have been weighed in
Announcing the Hadoop connector at an event with 2,000 attendees or the MongoDB connector at an event with 800 attendees could definitely influence the results (nb: keep in mind that the largest number is less than 7,000, so the 200-500 downloads triggered by such an event have a significant impact)
Redis and VoltDB are mostly OLTP-only databases
Original title and link: NoSQL Databases Adoption in Numbers ( ©myNoSQL)