


graphdb: All content tagged as graphdb in NoSQL databases and polyglot persistence

A Comparison of 7 Graph Databases

The main page of InfiniteGraph, a graph database commercialized by Objectivity, features an interesting comparison of 7 graph databases (InfiniteGraph, Neo4j, AllegroGraph, Titan, FlockDB, Dex, OrientDB) based on 16 criteria: licensing, source, scalability, graph model, schema model, API, query method, platforms, consistency, concurrency (distributed processing), partitioning, extensibility, visualizing tools, storage back end/persistency, language, backup/restore.

[Image: comparison table of 7 graph databases]

Unfortunately the image is almost unreadable, but Peter Karussell has extracted the data in a GoogleDoc spreadsheet embedded below.

Original title and link: A Comparison of 7 Graph Databases (NoSQL database©myNoSQL)

Using Neo4j Graph Database With Ruby

A two-part article by Thiago Jackiw providing a brief explanation of what graph databases and Neo4j are, and a quick look at 3 Ruby libraries for Neo4j: Neo4j.rb, Neography, and Neoid.

This article demonstrated how to install Neo4j and the basic idea of how to integrate it with a Ruby/Rails application using the different solutions available. Even though the examples given here barely scratched the surface of Neo4j, it should hopefully give you enough knowledge and curiosity to start integrating it on your own projects.

Original title and link: Using Neo4j Graph Database With Ruby (NoSQL database©myNoSQL)

Neo Technology Raises Another $11mil for Neo4j Graph Database

Derrick Harris for GigaOm:

Graph database startup Neo Technology has raised another $11 million, providing more fuel to the fire of specialized databases. Whether they’re graph databases organizing data by relationships, or geospatial databases concerned with where stuff is located, everyone is trying capitalize on myriad new data sources available.

According to my calculations, this brings Neo Technology’s total funding to $24.1 million ($11M now, $10.6M in Sept. 2011, and $2.5M in Oct. 2009).

Original title and link: Neo Technology Raises Another $11mil for Neo4j Graph Database (NoSQL database©myNoSQL)


Next Neo4j Version Implementing HA Without ZooKeeper

The next version of Neo4j will remove the dependency on ZooKeeper for high availability setups. In a post on the Neo4j blog, the team has announced the availability of the first milestone of Neo4j 1.9, which already contains the new implementation of the Neo4j High Availability Cluster:

With Neo4j 1.9 M01, cluster members communicate directly with each other, based on an implementation of the Paxos consensus protocol for master election.

According to the updated documentation annotated with my own comments:

  • Write transactions can be performed on any database instance in a cluster. (nb: writes are performed on the master first, but the cluster does the routing automatically)
  • If the master fails a new master will be elected automatically. A new master is elected and started within just a few seconds and during this time no writes can take place (the writes will block or in rare cases throw an exception)
  • If the master goes down any running write transaction will be rolled back and new transactions will block or fail until a new master has become available.
  • The cluster automatically handles instances becoming unavailable (for example due to network issues), and also makes sure to accept them as members in the cluster when they are available again.
  • Transactions are atomic, consistent and durable but eventually propagated out to other slaves. (nb: a transaction includes only the write to the master)
  • Updates to slaves are eventually consistent by nature but can be configured to be pushed optimistically from master during commit. (nb: writes to slaves will still not be part of the transaction)
  • In case there were changes on the master that didn’t get replicated before it failed, there is a chance of reaching a situation where two different versions exist if the failed master recovers. This situation is resolved by having the old master dismiss its copy of the data (nb: the documentation says “move away”)
  • Reads are highly available and the ability to handle read load scales with more database instances in the cluster.
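
To make the list above a bit more concrete, here is a minimal Python sketch of the described behavior: writes accepted on any instance but committed on the master first, a new master elected after a failure, and a recovering old master dropping its unreplicated changes. This is a toy model, not Neo4j code: `ToyCluster`, `Instance`, and the "longest log wins" election rule are assumptions made for illustration, and the real implementation uses Paxos rather than this shortcut.

```python
from dataclasses import dataclass, field


@dataclass
class Instance:
    """One database instance in a toy cluster (hypothetical, not a Neo4j class)."""
    name: str
    alive: bool = True
    log: list = field(default_factory=list)  # committed writes, in order


class ToyCluster:
    def __init__(self, names):
        self.instances = {n: Instance(n) for n in names}
        self.master = names[0]

    def elect_master(self):
        # Stand-in for a real consensus-based election (assumption): pick the
        # live instance with the longest log; ties go to the highest name.
        live = [i for i in self.instances.values() if i.alive]
        if not live:
            raise RuntimeError("no live instances")
        self.master = max(live, key=lambda i: (len(i.log), i.name)).name

    def write(self, via, value):
        # A write may arrive at any live instance, but commits on the master first.
        if not self.instances[via].alive:
            raise RuntimeError(f"{via} is down")
        master = self.instances[self.master]
        if not master.alive:
            raise RuntimeError("no master available; write must wait or fail")
        master.log.append(value)

    def replicate(self):
        # Slaves catch up later: propagation happens outside the transaction.
        master_log = self.instances[self.master].log
        for inst in self.instances.values():
            if inst.alive and inst.name != self.master:
                inst.log = list(master_log)

    def fail(self, name):
        self.instances[name].alive = False

    def recover(self, name):
        # A returning instance discards its own copy and resyncs from the master.
        inst = self.instances[name]
        inst.alive = True
        inst.log = list(self.instances[self.master].log)
```

In this model, a write committed on the master but not yet replicated when the master dies is simply gone once the old master rejoins, which is the "two different versions" case from the list.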

Original title and link: Next Neo4j Version Implementing HA Without ZooKeeper (NoSQL database©myNoSQL)

What Is the Most Promising Graph Datastore?

Very interesting answer on Quora from professor Josep Lluis Larriba Pey.

  1. for very large data sizes (TB): InfiniteGraph, DEX
  2. for query speed: DEX
  3. for transaction support: Neo4j

Original title and link: What Is the Most Promising Graph Datastore? (NoSQL database©myNoSQL)


What's the Current State of Graph Databases?

Jim Webber[1] in an interview with Srini Penchikala for InfoQ:

The graph databases are odd, because they’ve actually decided to have a much more expressive data model compared to relational databases. So I think they are an oddity compared to the other three types of NoSQL stores, which means that when a developer first comes across them there is an awful lot of head scratching—you can see this haircut was completely caused by Neo4J. So I think compared to the other NoSQL stores, the graph database community is a little bit further behind in terms of adoption and penetration because they are a bit of an odd beast when you look at them first, “What would I use graphs for, they are those things I forgot from university, with that boring old guy doing math on the whiteboard”, on the blackboard even, I’m so old we had chalk, would you believe?

It’s almost always impossible for me to disagree with Jim. Expanding a bit on the quote above, I’d speculate that a bit of head scratching before adopting a new database is good as it means you’ll not see many improper use cases.

  1. Jim Webber: Chief Scientist at Neo Technology 

Original title and link: What’s the Current State of Graph Databases? (NoSQL database©myNoSQL)


Paper: Efficient Subgraph Matching on Billion Node Graphs

Papers from VLDB 2012 are starting to surface. Authored by a Chinese team, the “Efficient Subgraph Matching on Billion Node Graphs” paper introduces a new algorithm optimized for large-scale graphs:

We present a novel algorithm that supports efficient subgraph matching for graphs deployed on a distributed memory store. Instead of relying on super-linear indices, we use efficient graph exploration and massive parallel computing for query processing. Our experimental results demonstrate the feasibility of performing subgraph matching on web-scale graph data.

Comparison of space and time complexity of other subgraph matching algorithms:

[Figure: Subgraph Matching Methods]
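
For readers who haven't met the problem before, here is a small Python sketch of what subgraph matching by graph exploration means. It is not the paper's algorithm (which is distributed and parallel); it is a plain single-machine backtracking search, and the function name and the adjacency-dict representation are my own assumptions. The only idea borrowed from the abstract is that candidates are found by exploring from already-matched neighbours rather than by consulting a precomputed index.

```python
def subgraph_matches(pattern, graph):
    """Yield mappings pattern-node -> graph-node that preserve every pattern edge.

    Both graphs are dicts mapping a node to the set of its neighbours
    (undirected, so each edge appears in both adjacency sets).
    """
    order = list(pattern)

    def extend(mapping):
        if len(mapping) == len(order):
            yield dict(mapping)
            return
        node = order[len(mapping)]
        # Explore from already-matched neighbours when possible,
        # instead of scanning the whole graph.
        matched_nbrs = [mapping[n] for n in pattern[node] if n in mapping]
        if matched_nbrs:
            candidates = set.intersection(*(graph[m] for m in matched_nbrs))
        else:
            candidates = set(graph)
        for cand in candidates:
            if cand in mapping.values():
                continue  # keep the mapping injective
            mapping[node] = cand
            yield from extend(mapping)
            del mapping[node]

    yield from extend({})
```

Calling `list(subgraph_matches(triangle, graph))` with a triangle pattern returns one mapping per way of placing the triangle's three nodes onto a triangle in the data graph.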

Rolling Upgrades in Upcoming Neo4j 1.8

Chris Gioran describes rolling upgrades, a new feature in the upcoming Neo4j 1.8:

So the rolling upgrade, actually, works exactly as you’d expect an upgrade would work. If there are not breaking changes between versions, you normally begin with the slaves, powering down, copying the store, migrating configuration if needed, then bringing that server back up. The new version would take over, communicate with the rest of the cluster and you wouldn’t notice anything.

A rolling upgrade offers that with versions that have incompatible protocols. Each slave, as it is brought up, detects the version running in the cluster and gracefully falls back into a compatibility mode that doesn’t allow it to become master, but allows it to continue to execute transactions.

Another thing I found interesting: upgrading the master machine is treated as the confirmation that the upgrade is complete, at which point all machines switch to the new protocol. Clever.
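
The mechanics can be sketched in a few lines of Python. This is only my reading of the description above, not Neo4j's implementation: `Node`, `upgrade`, and the integer protocol versions are hypothetical names and simplifications.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    version: int
    role: str = "slave"        # "master" or "slave"
    compat_mode: bool = False  # True while speaking the old protocol


def cluster_protocol(nodes):
    # Assumption: the cluster speaks the master's protocol version.
    return next(n.version for n in nodes if n.role == "master")


def upgrade(node, nodes, new_version):
    node.version = new_version
    if node.role == "master":
        # Upgrading the master confirms the upgrade:
        # every machine leaves compatibility mode.
        for n in nodes:
            n.compat_mode = False
    else:
        # An upgraded slave joining an older cluster falls back
        # to compatibility mode.
        node.compat_mode = cluster_protocol(nodes) < new_version


def can_become_master(node):
    # A node in compatibility mode keeps executing transactions
    # but is not eligible for election.
    return not node.compat_mode
```

Upgrading the slaves one by one leaves each of them in compatibility mode; upgrading the master last flips the whole cluster to the new protocol in one step.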

Original title and link: Rolling Upgrades in Upcoming Neo4j 1.8 (NoSQL database©myNoSQL)


Inside Facebook’s Open Graph

The mid-part of this Wired article talks a bit about the way Facebook is storing its Open Graph data:

We have an object store, which stores things like users and events and groups and photos, and then we have an edge store that stores the relationship between objects. With Open Graph, we built a layer on top of those systems that allowed developers to define what their objects look like and what their edges look like and then publish those third party objects and edges into the same infrastructure that we used to store all of the first party objects and edges.

A couple of thoughts:

  1. This data is a good example of a multigraph
  2. I don’t think Facebook is actually using a graph database for storing the data. Considering the size of the data Facebook is handling, that would be understandable
  3. There’s no mention of how the metadata, the description of the objects and edges, is stored. I assume this should somehow be connected to historical data to allow the evolution of the data while maintaining its original meaning over time.
  4. The processing happening on this multigraph data sounds like cluster analysis
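
As an illustration of the object store plus edge store split described in the quote, here is a minimal in-memory Python sketch. Nothing here reflects Facebook's actual systems: the class names, the `song` object type, and the `listened` and `liked` edge types are made up for the example. It does show why the result is a multigraph: the same pair of objects can be connected by several edges of different types.

```python
from collections import defaultdict


class ObjectStore:
    """Maps an object id to its type and properties."""

    def __init__(self):
        self._objects = {}
        self._next_id = 1

    def put(self, obj_type, **props):
        obj_id = self._next_id
        self._next_id += 1
        self._objects[obj_id] = {**props, "type": obj_type}
        return obj_id

    def get(self, obj_id):
        return self._objects[obj_id]


class EdgeStore:
    """Stores typed edges; several edges may connect the same pair of objects."""

    def __init__(self):
        self._out = defaultdict(list)  # source id -> [(edge_type, target id)]

    def add(self, source, edge_type, target):
        self._out[source].append((edge_type, target))

    def targets(self, source, edge_type):
        return [t for (et, t) in self._out[source] if et == edge_type]

    def edges_between(self, source, target):
        return [et for (et, t) in self._out[source] if t == target]
```

A third-party developer defining a new object or edge type then only adds rows to the same two stores that hold the first-party data.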

Original title and link: Inside Facebook’s Open Graph (NoSQL database©myNoSQL)

An Overview of Neo4j.rb 2.0

Andreas Ronge writing about using Neo4j in embedded mode with JRuby:

The advantage of the embedded Neo4j is better performance due to the direct use of the Java API. This means you can write queries in plain Ruby! Another advantage of the embedded Neo4j is that since it’s an embedded database there is one less piece of infrastructure (the database server) to install. The embedded database is running in the same process as your (Rails) application. Since JRuby has real threads there is no need to start up several instances of the database or of the Ruby runtime since JRuby can utilize all available cores on the CPU. There is actually even no need to start the database at all as it will be started automatically when needed. Notice it’s still possible to use the REST protocol or the web admin interface from an embedded Neo4j, see the neo4j-admin gem.

So which should I choose ? Well, if you can’t use JRuby or you don’t need an Active Model compliant Neo4j binding then the Neo4j Server is a good choice, otherwise I would suggest using the embedded Neo4j.rb gem (but I’m a bit biased)

As the earlier post on migrating data from Oracle to MongoDB with JRuby also showed, JRuby proves to be an interesting beast for handling data. I’m more on the side of Python, but Jython is not (yet?) as up-to-date as JRuby.

Original title and link: An Overview of Neo4j.rb 2.0 (NoSQL database©myNoSQL)


PuppetDB: Configuration Management Database for Puppet

PuppetDB is replacing CouchDB for managing Puppet configurations and is a service layer written in Clojure with a PostgreSQL back-end. Not a graph database:

PuppetDB is a key component of the Puppet Data Library, and brings that to bear in its query API. Resources, facts, nodes, and metrics can all be queried over HTTP. For resources and nodes, there is a simple query language which can be used to form arbitrarily complex requests. The public API is the same one that Puppet uses to make storeconfigs queries (using the «||» operator) of PuppetDB, but provides a superset of the functionality provided by storeconfigs.

PuppetDB is faster, smarter, and has more complete data than ever before. […] PuppetDB offers great power over and insight into your infrastructure, and it’s only going to get bigger and better.

Original title and link: PuppetDB: Configuration Management Database for Puppet (NoSQL database©myNoSQL)


Short Intro to Graph Databases, Manipulating and Traversing With Gremlin

A slide deck by Pierre De Wilde with a short theoretical intro to property graphs and graph databases and an extensive set of examples of manipulating and traversing graph data with Gremlin. Good reference material.