


graphdb: All content tagged as graphdb in NoSQL databases and polyglot persistence

Inside Facebook’s Open Graph

The middle section of this Wired article describes the way Facebook is storing its Open Graph data:

We have an object store, which stores things like users and events and groups and photos, and then we have an edge store that stores the relationship between objects. With Open Graph, we built a layer on top of those systems that allowed developers to define what their objects look like and what their edges look like and then publish those third party objects and edges into the same infrastructure that we used to store all of the first party objects and edges.

A couple of thoughts:

  1. This data is a good example of a multigraph.
  2. I don’t think Facebook is actually using a graph database to store this data. Considering the volume of data Facebook handles, that would be understandable.
  3. There’s no mention of how the metadata (the description of the objects and edges) is stored. I assume it must somehow be connected to historical data, to allow the data to evolve while maintaining its original meaning over time.
  4. The processing happening on this multigraph data sounds like cluster analysis.
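The quoted object-store/edge-store architecture is easy to picture as a multigraph: the same pair of objects can be connected by several typed edges. A minimal sketch (not Facebook’s actual code; all names are invented):

```python
# Hypothetical sketch of an "object store" keyed by id and an "edge store"
# that allows multiple typed edges between the same pair of objects --
# which is what makes the Open Graph data a multigraph.
from collections import defaultdict

objects = {}               # id -> properties ("object store")
edges = defaultdict(list)  # (src, dst) -> list of typed edges ("edge store")

def put_object(obj_id, **props):
    objects[obj_id] = props

def put_edge(src, dst, edge_type, **props):
    # Parallel edges of different types between the same objects are allowed.
    edges[(src, dst)].append({"type": edge_type, **props})

put_object("alice", kind="user")
put_object("song42", kind="song")
put_edge("alice", "song42", "listened")
put_edge("alice", "song42", "liked")  # second edge between the same pair

assert len(edges[("alice", "song42")]) == 2
```

Third-party Open Graph objects and edges would, per the quote, flow into the same two stores as first-party data.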

Original title and link: Inside Facebook’s Open Graph (NoSQL database©myNoSQL)

An Overview of Neo4j.rb 2.0

Andreas Ronge writing about using Neo4j in embedded mode with JRuby:

The advantage of the embedded Neo4j is better performance due to the direct use of the Java API. This means you can write queries in plain Ruby! Another advantage of the embedded Neo4j is that since it’s an embedded database there is one less piece of infrastructure (the database server) to install. The embedded database is running in the same process as your (Rails) application. Since JRuby has real threads there is no need to start up several instances of the database or of the Ruby runtime since JRuby can utilize all available cores on the CPU. There is actually even no need to start the database at all as it will be started automatically when needed. Notice it’s still possible to use the REST protocol or the web admin interface from an embedded Neo4j, see the neo4j-admin gem.

So which should I choose ? Well, if you can’t use JRuby or you don’t need an Active Model compliant Neo4j binding then the Neo4j Server is a good choice, otherwise I would suggest using the embedded Neo4j.rb gem (but I’m a bit biased)

As also shown by the earlier [migrating data from Oracle to MongoDB with JRuby], JRuby proves to be an interesting beast for handling data. I’m more on the side of Python, but Jython is not (yet?) as up-to-date as JRuby.

Original title and link: An Overview of Neo4j.rb 2.0 (NoSQL database©myNoSQL)


PuppetDB: Configuration Management Database for Puppet

PuppetDB, a service layer written in Clojure with a PostgreSQL back-end, is replacing CouchDB for managing Puppet configurations. It is not a graph database:

PuppetDB is a key component of the Puppet Data Library, and brings that to bear in its query API. Resources, facts, nodes, and metrics can all be queried over HTTP. For resources and nodes, there is a simple query language which can be used to form arbitrarily complex requests. The public API is the same one that Puppet uses to make storeconfigs queries (using the «||» operator) of PuppetDB, but provides a superset of the functionality provided by storeconfigs.

PuppetDB is faster, smarter, and has more complete data than ever before. […] PuppetDB offers great power over and insight into your infrastructure, and it’s only going to get bigger and better.
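The quote mentions a simple query language for resources and nodes over HTTP. PuppetDB queries are expressed as nested JSON arrays of operators; the exact operator forms and endpoint path below are assumptions for illustration, not verified API details:

```python
# Building (not sending) a PuppetDB-style HTTP query. The nested-array
# query shape and the /resources endpoint are assumptions based on the
# post; check them against the PuppetDB API docs for your version.
import json
from urllib.parse import urlencode

# "all File resources", roughly, as a nested-array query
query = ["and",
         ["=", "type", "File"],
         ["=", "exported", False]]

url = ("http://puppetdb.local:8080/resources?"
       + urlencode({"query": json.dumps(query)}))
```

The interesting design choice is that queries are plain data (JSON), so arbitrarily complex requests can be composed programmatically before being URL-encoded.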

Original title and link: PuppetDB: Configuration Management Database for Puppet (NoSQL database©myNoSQL)


Short Intro to Graph Databases, Manipulating and Traversing With Gremlin

A slide deck by Pierre De Wilde with a short theoretical intro to property graphs and graph databases and an extensive set of examples of manipulating and traversing graph data with Gremlin. Good reference material.
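Gremlin expresses traversals as pipelines over a property graph (vertices and edges carrying key/value properties). A toy imitation of that pipeline style in Python, purely to illustrate the idea; real Gremlin runs against Blueprints-compatible graph databases:

```python
# A tiny property graph plus a Gremlin-flavored traversal. The class and
# function names are invented for illustration.
class Vertex:
    def __init__(self, vid, **props):
        self.vid, self.props, self.out_edges = vid, props, []  # (label, target)

def out(vertices, label):
    """Follow outgoing edges with the given label, like Gremlin's out('label')."""
    return [dst for v in vertices for (lbl, dst) in v.out_edges if lbl == label]

alice, bob, carol = Vertex(1, name="alice"), Vertex(2, name="bob"), Vertex(3, name="carol")
alice.out_edges += [("knows", bob), ("knows", carol)]
bob.out_edges += [("knows", carol)]

# Roughly g.v(1).out('knows').out('knows').name in Gremlin:
names = [v.props["name"] for v in out(out([alice], "knows"), "knows")]
assert names == ["carol"]
```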

Neo4j Data Modeling: What Question Do You Want to Answer?

Mark Needham:

Over the past few weeks I’ve been modelling ThoughtWorks project data in neo4j and I realised that the way that I’ve been doing this is by considering what question I want to answer and then building a graph to answer it.

This same principle should be applied to modeling with any NoSQL database. Thinking in terms of access patterns is one of the major differences between data modeling in the NoSQL space and in the relational world, which, at least in its first phases and in theory, is driven by normalization rules.
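To make the contrast concrete, here is a sketch of "model for the question": instead of normalizing first, store exactly the structure that answers the query you care about, e.g. "who has worked with whom?". The data and names are invented:

```python
# Access-pattern-driven modeling: precompute the person-[worked_with]->person
# edges directly, instead of joining through a normalized projects table.
from itertools import combinations
from collections import defaultdict

projects = {"apollo": ["ann", "bo", "cy"], "zeus": ["bo", "cy"]}

worked_with = defaultdict(set)
for members in projects.values():
    for a, b in combinations(members, 2):
        worked_with[a].add(b)
        worked_with[b].add(a)

# The question "who has bo worked with?" is now a single lookup:
assert worked_with["bo"] == {"ann", "cy"}
```

In a relational schema the same answer would come from a self-join through a membership table; here the graph is shaped by the question itself.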

Original title and link: Neo4j Data Modeling: What Question Do You Want to Answer? (NoSQL database©myNoSQL)


How to Import Large Graphs to Neo4j With Spring Data

In my case, I wanted to create a simple recommendation engine (the domain doesn’t matter so much). To do that, I had to import FAST 20 million nodes of one-to-many, sparse matrix data. This became a bit more complicated (and interesting) task than originally anticipated, so it became a mini-project itself.

Bulk insert is a scenario that every database should have covered.
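Whatever the database, the usual shape of a fast bulk import is the same: group records into fixed-size batches and commit once per batch rather than once per node. A generic sketch (the function names are invented):

```python
# Generic batched-insert helper: one transaction/commit per batch instead
# of one per record, which is what makes a 20-million-node import tractable.
def bulk_insert(records, insert_batch, batch_size=10_000):
    batch = []
    for rec in records:
        batch.append(rec)
        if len(batch) == batch_size:
            insert_batch(batch)  # flush a full batch
            batch = []
    if batch:
        insert_batch(batch)      # flush the remainder

# Demo with an in-memory sink instead of a real database:
batches = []
bulk_insert(range(25), batches.append, batch_size=10)
assert [len(b) for b in batches] == [10, 10, 5]
```

With Neo4j specifically, the batch size trades transaction overhead against memory held per transaction, so it is worth tuning.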

Original title and link: How to Import Large Graphs to Neo4j With Spring Data (NoSQL database©myNoSQL)


Distributed Temporal Graph Database Using Datomic

Davy Suvee describes the solution in the Gremlin group and shares the code on GitHub:

Last week I spend some time on implementing the Blueprints interface on top of Datomic. The RDF and SPARQL feel of the Datomic data model and query approach makes it a good target for implementing a property graph. I finished the implementation and all unit tests are passing. Now, what makes it really cool is that it is the only distributed “temporal” graph database that I’m aware of. It allows to perform queries against a version of the graph in the past.

This is the first solution I’m reading about addressing the time dimension in a graph model.
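The core of the "temporal graph" idea is that every edge records the time it was asserted, so a traversal can run against the graph as of any past moment. A minimal illustration (this is not Datomic's or Blueprints' API, just the concept):

```python
# Each edge carries its transaction time; "as-of" queries filter on it.
edges = []  # (t, src, dst, label)

def add_edge(t, src, dst, label):
    edges.append((t, src, dst, label))

def neighbors_as_of(src, as_of):
    """Targets of edges from `src` visible in the graph as it existed at `as_of`."""
    return [dst for (t, s, dst, _) in edges if s == src and t <= as_of]

add_edge(1, "a", "b", "knows")
add_edge(5, "a", "c", "knows")

assert neighbors_as_of("a", as_of=2) == ["b"]       # a past version of the graph
assert neighbors_as_of("a", as_of=9) == ["b", "c"]  # the current version
```

Datomic gets this essentially for free because it never updates facts in place, which is what makes it a natural back-end for a temporal property graph.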

Original title and link: Distributed Temporal Graph Database Using Datomic (NoSQL database©myNoSQL)

Neo4j REST API Tutorial

A detailed language-agnostic intro to the Neo4j REST API:

In the above examples we have seen how nodes, relationships, and properties can be created, edited, updated, and deleted from the Neo4j HTTP terminal.
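For flavor, here is what those HTTP calls look like as plain JSON payloads, constructed but not sent. The paths follow the legacy Neo4j REST API of this era; treat the exact URLs as assumptions and verify them against your server's documentation:

```python
# Building (not sending) requests against the legacy Neo4j REST API.
# Endpoint paths are assumptions for illustration.
import json

base = "http://localhost:7474/db/data"

create_node = {
    "method": "POST",
    "url": f"{base}/node",
    "body": json.dumps({"name": "Alice"}),  # node properties as JSON
}

create_rel = {
    "method": "POST",
    "url": f"{base}/node/1/relationships",
    "body": json.dumps({"to": f"{base}/node/2", "type": "KNOWS"}),
}
```

Being plain HTTP+JSON is exactly what makes the API language-agnostic: any client that can POST JSON can create nodes, relationships, and properties.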

Original title and link: Neo4j REST API Tutorial (NoSQL database©myNoSQL)


Different Graph Visualization Models: Graphs Beyond the Hairball

Networks are usually drawn using a technique called node-link diagrams. While that works well for small graphs (the technical name for networks), it breaks down beyond a few dozen nodes. […] For a while now, people in visualization have talked about the graph without the graph, i.e., graph visualization without the hairballs. Networks are clearly important and challenging data, and it seems a bit myopic to only look at node-link visualization. Node quilts and the PivotGraph represent promising steps into a very different direction.
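The PivotGraph approach mentioned above avoids the hairball by rolling nodes up by a categorical attribute, so thousands of individual nodes become a small graph of weighted group-to-group edges. A tiny sketch with invented data:

```python
# PivotGraph-style roll-up: collapse nodes by an attribute (here, department)
# and count how many underlying edges connect each pair of groups.
from collections import Counter

nodes = {"a": "eng", "b": "eng", "c": "sales", "d": "sales"}  # node -> dept
edges = [("a", "c"), ("b", "c"), ("a", "b"), ("c", "d")]

rollup = Counter((nodes[s], nodes[t]) for s, t in edges)

assert rollup[("eng", "sales")] == 2  # a-c and b-c collapse into one heavy edge
assert rollup[("eng", "eng")] == 1
```

The aggregated graph stays readable at any scale, at the cost of hiding individual nodes, which is precisely the trade-off the post discusses.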

These are some good answers to how to scale graph visualizations.

Original title and link: Different Graph Visualization Models: Graphs Beyond the Hairball (NoSQL database©myNoSQL)


NoSQL Databases Adoption in Numbers

The source of the data is Jaspersoft's NoSQL connector downloads. RedMonk published a graphic and an analysis, and Klint Finley followed up with job trends:

NoSQL databases adoption

A couple of things I don’t see mentioned in the RedMonk post:

  1. if and how the data has been normalized based on each connector's availability

    According to the post, the data was collected between Jan. 2011 and Mar. 2012, and I don't think all connectors were available from the beginning of that period.

  2. if and how the marketing push behind each connector has been weighed in

    Announcing the Hadoop connector at an event with 2000 attendees, or the MongoDB connector at an event with 800 attendees, could definitely influence the results (nb: keep in mind that the largest number is less than 7000, so 200-500 downloads triggered by such an event have a significant impact)

  3. Redis and VoltDB are mostly OLTP-only databases

Original title and link: NoSQL Databases Adoption in Numbers (NoSQL database©myNoSQL)

Sones GraphDB Adds Data Visualization

An interesting addition for the upcoming sones GraphDB 2.1:

With the ability to run queries and use plug-ins to determine how the output will look like the WebShell is a perfect place to enhance user experience. Since there are several output plug-ins available with version 2.0 already (JSON, XML, Text, HTML,…) we thought it would be a great idea to have a simple visualization implemented just by adding a new output plug-in to GraphDB.

sones GraphDB data visualization

Original title and link: Sones GraphDB Adds Data Visualization (NoSQL database©myNoSQL)


Sones GraphDB Changes License for Libraries

If you check the quick review of existing graph databases and the NoSQL graph databases matrix you’ll notice that most of these came under either an AGPL license or a commercial one.

The game changed radically when Neo4j became available also under a GPL license. And now, Sones has changed the license of their GraphDB connectors to LGPL.

I’m no lawyer, but I think this means you can use Sones GraphDB without having to open source your product, even a commercial one. And because the way you interact with Sones GraphDB is through its connectors, it no longer matters what the license of the core graph database is.

Original title and link: Sones GraphDB Changes License for Libraries (NoSQL database©myNoSQL)