graphdb: All content tagged as graphdb in NoSQL databases and polyglot persistence
This article demonstrated how to install Neo4j and the basic idea of how to integrate it with a Ruby/Rails application using the different solutions available. Even though the examples given here barely scratched the surface of Neo4j, it should hopefully give you enough knowledge and curiosity to start integrating it on your own projects.
Original title and link: Using Neo4j Graph Database With Ruby ( ©myNoSQL)
The next version of Neo4j will remove the dependency on ZooKeeper for high availability setups. In a post on the Neo4j blog, the team has announced the availability of the first milestone of Neo4j 1.9, which already contains the new implementation of the Neo4j High Availability Cluster:
With Neo4j 1.9 M01, cluster members communicate directly with each other, based on an implementation of the Paxos consensus protocol for master election.
According to the updated documentation, annotated with my own comments (marked nb):
- Write transactions can be performed on any database instance in a cluster. (nb: writes are performed on the master first, but the cluster does the routing automatically)
- If the master fails a new master will be elected automatically. A new master is elected and started within just a few seconds and during this time no writes can take place (the writes will block or in rare cases throw an exception)
- If the master goes down any running write transaction will be rolled back and new transactions will block or fail until a new master has become available.
- The cluster automatically handles instances becoming unavailable (for example due to network issues), and also makes sure to accept them as members in the cluster when they are available again.
- Transactions are atomic, consistent and durable but eventually propagated out to other slaves. (nb: a transaction includes only the write to the master)
- Updates to slaves are eventually consistent by nature but can be configured to be pushed optimistically from the master during commit. (nb: writes to slaves will still not be part of the transaction)
- In case there were changes on the master that didn’t get replicated before it failed, it is possible to reach a situation where two different versions of the data exist if the failed master recovers. This situation is resolved by having the old master dismiss its copy of the data (nb: the documentation says "move away")
- Reads are highly available and the ability to handle read load scales with more database instances in the cluster.
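The behavior described in the list above can be sketched with a toy model. This is only an illustration under loud assumptions: `Instance`, `ToyCluster`, and all method names are hypothetical, the "election" simply picks the live instance with the longest log rather than running Paxos, and nothing here reflects Neo4j's actual API or internals.

```python
from dataclasses import dataclass, field


@dataclass
class Instance:
    """One database instance in a toy cluster (hypothetical model)."""
    name: str
    log: list = field(default_factory=list)  # committed transactions, in order
    alive: bool = True


class ToyCluster:
    """Toy model of master/slave write routing; not Neo4j's implementation."""

    def __init__(self, names):
        self.instances = [Instance(n) for n in names]
        self.master = self.instances[0]

    def elect_master(self):
        # Stand-in for the Paxos-based election (assumption): pick the live
        # instance with the longest log. The real protocol is more involved.
        live = [i for i in self.instances if i.alive]
        if not live:
            raise RuntimeError("no live instances")
        self.master = max(live, key=lambda i: len(i.log))
        return self.master

    def write(self, via, tx):
        # A write can be issued against any live instance, but it is
        # committed on the master first; the cluster does the routing.
        if not via.alive:
            raise RuntimeError(f"{via.name} is down")
        if not self.master.alive:
            raise RuntimeError("no master available")
        self.master.log.append(tx)

    def pull_updates(self, slave):
        # Slaves catch up eventually, outside the write transaction.
        if slave.alive and self.master.alive and slave is not self.master:
            slave.log.extend(self.master.log[len(slave.log):])

    def rejoin(self, instance):
        # A recovered instance takes a fresh copy from the current master.
        # If its own log had diverged, that copy is "moved away" and returned.
        instance.alive = True
        moved_away = None
        if instance.log != self.master.log[:len(instance.log)]:
            moved_away = instance.log
        instance.log = list(self.master.log)
        return moved_away
```

In this model, a transaction committed on the master but not yet pulled by any slave is lost to the cluster when the master dies; when the old master later calls `rejoin`, its diverged log is handed back as the moved-away copy and replaced with the new master's log.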
Original title and link: Next Neo4j Version Implementing HA Without ZooKeeper ( ©myNoSQL)
Papers from VLDB 2012 are starting to surface. Authored by a Chinese team, the “Efficient Subgraph Matching on Billion Node Graphs” paper introduces a new algorithm optimized for large-scale graphs:
We present a novel algorithm that supports efficient subgraph matching for graphs deployed on a distributed memory store. Instead of relying on super-linear indices, we use efficient graph exploration and massive parallel computing for query processing. Our experimental results demonstrate the feasibility of performing subgraph matching on web-scale graph data.
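To make the problem concrete, here is a minimal sketch of subgraph matching by graph exploration. It is plain single-machine backtracking, not the paper's distributed algorithm; the function name and the adjacency-dict representation are assumptions made for illustration.

```python
def subgraph_matches(query, data):
    """Yield mappings {query_node: data_node} that preserve every query edge.

    Both graphs are undirected adjacency dicts: node -> set of neighbours.
    Naive backtracking for illustration only; no indices, no parallelism.
    """
    order = list(query)

    def extend(mapping):
        if len(mapping) == len(order):
            yield dict(mapping)
            return
        q = order[len(mapping)]
        # Data nodes already assigned to neighbours of q in the query graph.
        anchors = [mapping[n] for n in query[q] if n in mapping]
        if anchors:
            # Explore only data nodes adjacent to all of those anchors.
            candidates = set.intersection(*(data[d] for d in anchors))
        else:
            candidates = set(data)
        used = set(mapping.values())
        for d in candidates - used:
            mapping[q] = d
            yield from extend(mapping)
            del mapping[q]

    yield from extend({})
```

Calling `list(subgraph_matches(query, data))` with a triangle as `query` returns one mapping per way of placing the triangle's three nodes onto a triangle in `data`.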
[Figure: comparison of the space and time complexity of other subgraph matching algorithms]
The mid-part of this Wired article talks a bit about the way Facebook is storing its Open Graph data:
We have an object store, which stores things like users and events and groups and photos, and then we have an edge store that stores the relationship between objects. With Open Graph, we built a layer on top of those systems that allowed developers to define what their objects look like and what their edges look like and then publish those third party objects and edges into the same infrastructure that we used to store all of the first party objects and edges.
A couple of thoughts:
- This data is a good example of a multigraph.
- I don’t think Facebook is actually using a graph database for storing the data. Considering the size of the data Facebook is handling, that would be understandable.
- There’s no mention of how the metadata, the description of the objects and edges, is stored. I assume this should somehow be connected to historical data to allow the evolution of the data while maintaining its original meaning over time.
- The processing happening on this multigraph data sounds like cluster analysis.
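The object store plus edge store split from the quote, and the multigraph remark, can be sketched roughly as follows. Everything here is hypothetical: the class names, the in-memory dictionaries, and the example types say nothing about how Facebook actually implements these systems.

```python
from collections import defaultdict


class ObjectStore:
    """Stores typed objects (users, events, photos, ...) by id."""

    def __init__(self):
        self._objects = {}
        self._next_id = 1

    def put(self, obj_type, **props):
        oid = self._next_id
        self._next_id += 1
        self._objects[oid] = {"type": obj_type, "props": props}
        return oid

    def get(self, oid):
        return self._objects[oid]


class EdgeStore:
    """Stores typed edges between object ids.

    Several edges may connect the same pair of objects, which is what
    makes the overall structure a multigraph.
    """

    def __init__(self):
        self._out = defaultdict(list)  # source id -> [(edge_type, target id)]

    def add(self, src, edge_type, dst):
        self._out[src].append((edge_type, dst))

    def targets(self, src, edge_type=None):
        # All targets of src, optionally restricted to one edge type.
        return [d for t, d in self._out.get(src, [])
                if edge_type is None or t == edge_type]
```

A first-party object such as a user and a third-party object such as a song would both go through `ObjectStore.put`, and a user who both listened to and liked the same song would have two separate edges to it in the `EdgeStore`.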
Original title and link: Inside Facebook’s Open Graph ( ©myNoSQL)
A slide deck by Pierre De Wilde with a short theoretical intro to property graphs and graph databases and an extensive set of examples of manipulating and traversing graph data with Gremlin. Good reference material.