graph database: All content tagged as graph database in NoSQL databases and polyglot persistence

Neo4j and D3.js: Visualizing Connections Over Time

Another great graph data visualization using Neo4j and D3.js from Max De Marzi:

Graph data visualization of connections over time

  • Max De Marzi has lately been my favorite source for graph data visualization posts
  • The diagram looks amazing, but I wonder whether it would scale to larger data sets
  • Even after giving it some thought, I’m still not sure how graph databases can record historical relationships or the evolution of relationships in a graph. If you have any ideas, I’d love to hear them.

Original title and link: Neo4j and D3.js: Visualizing Connections Over Time (NoSQL database©myNoSQL)

Insolvent Sones GraphDB Available for Sale

An article in a German publication mentions (according to Google Translate) that sones GraphDB is up for sale:

The insolvency administrator of sones GmbH, lawyer Dr. Oliver Hartig, said that the graph database of the insolvent sones GmbH will be sold.

Anyone interested?


Beer Recommendations With Graph Databases

Josh Adell explains how to extend a simple recommendation engine to similarity-based collaborative filtering:

Instead of basing recommendations off of one similar rating, I can calculate how similarly you and I rated all the things we have rated, and only get recommendations from you if I have determined we are similar enough in our tastes.

This is much closer to how the recommendation engines developed by sites like Amazon or Netflix work.
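To make the quoted idea concrete, here is a minimal sketch in plain Python rather than in a graph database. Everything in it is an assumption for illustration: the sample ratings, the `similarity` and `recommend` helper names, the choice of cosine similarity over co-rated items, and the 0.9 threshold. It is not Josh Adell’s implementation.

```python
from math import sqrt

# Hypothetical ratings: user -> {beer: rating on a 1-5 scale}
ratings = {
    "me":    {"ipa": 5, "stout": 2, "lager": 4},
    "alice": {"ipa": 5, "stout": 1, "lager": 4, "porter": 5},
    "bob":   {"ipa": 1, "stout": 5, "lager": 2, "sour": 5},
}

def similarity(a, b):
    """Cosine similarity over the items both users rated (0.0 if none)."""
    shared = set(a) & set(b)
    if not shared:
        return 0.0
    dot = sum(a[i] * b[i] for i in shared)
    norm_a = sqrt(sum(a[i] ** 2 for i in shared))
    norm_b = sqrt(sum(b[i] ** 2 for i in shared))
    return dot / (norm_a * norm_b)

def recommend(user, ratings, min_similarity=0.9, min_rating=4):
    """Suggest well-rated items, but only from users who are similar enough."""
    mine = ratings[user]
    picks = set()
    for other, theirs in ratings.items():
        if other == user or similarity(mine, theirs) < min_similarity:
            continue  # skip myself and anyone whose taste is too different
        picks.update(
            item for item, rating in theirs.items()
            if item not in mine and rating >= min_rating
        )
    return sorted(picks)

print(recommend("me", ratings))
```

With this sample data only alice clears the similarity threshold, so her highly rated porter is suggested while bob’s sour is ignored.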



Gremlin vs Cypher

Romiko Derbynew compares Gremlin and Neo4j Cypher:

  • Simple graph traversals are much more efficient when using Gremlin
  • Queries in Gremlin are 30-50% faster for simple traversals
  • Cypher is ideal for complex traversals where backtracking is required
  • Cypher is our choice of query language for reporting
  • Gremlin is our choice of query language for simple traversals where projections are not required
  • Cypher has an intrinsic table projection model, whereas Gremlin’s table projection model relies on As steps, which can be cumbersome when backtracking (e.g. Back(), As() and _CopySplit); in Cypher it is just comma-separated matches
  • Cypher is much better suited for outer joins than Gremlin: achieving similar results in Gremlin requires parallel querying with CopySplit, whereas Cypher uses the Match clause with optional relationships
  • Gremlin is ideal when you need to retrieve very simple data structures
  • Table projection in Gremlin can be very powerful; however, outer joins can be very verbose

So in a nutshell, we like to use Cypher when we need tabular data back from Neo4j, and it is especially useful for outer joins.

Via Patrick Durusau



InfiniteGraph 2.1 Features Gremlin Support and a Plugin Framework

A new version of InfiniteGraph, the graph database from Objectivity, was announced today. This release features:

  • a plugin framework: Two kinds of plugins are supported. A Navigator plugin bundles components that assist in navigation queries, such as result qualifiers, path qualifiers, and guides. A Formatter plugin formats and outputs the results of graph queries.
  • enhanced IG Visualizer: The advanced Visualizer is now tightly integrated with InfiniteGraph’s Plugin Framework, allowing indexing queries for edges. Through the Formatter plugin framework it can export GraphML and JSON (built-in) or other user-defined plugin formats.
  • support for TinkerPop Blueprints and Gremlin: InfiniteGraph provides a clean integration with Blueprints that is well suited for applications that want to traverse and query graph databases using Gremlin

A few more details can be found in the InfiniteGraph 2.1 release notes.

Via Klint Finley


What types of applications might a graph database be well suited for?

I found this list of use cases for graph databases in a follow-up to a Neo4j webinar:

  • Social networks
  • Collaboration programs
  • Configuration Management
  • Geo-Spatial applications
  • Impact Analysis
  • Master Data Management
  • Network Management
  • Product Line Management
  • Recommendation Engines

The more generic answer would be that graph databases can be a great fit for problems handling highly connected data.

The examples above clearly involve highly connected data, but as of now I’m not aware of any social networks, network management systems, or large-scale recommendation engines built on top of one of the existing graph databases.


Calculating a Graph's Degree Distribution Using R MapReduce over Hadoop

Marko Rodriguez is experimenting with R on Hadoop, and one of his exercises is calculating a graph’s degree distribution. I confess I had to use Wikipedia to remind myself of the definition of a node’s degree:

  1. The degree of a node in a network (sometimes referred to incorrectly as the connectivity) is the number of connections or edges the node has to other nodes. The degree distribution P(k) of a network is then defined to be the fraction of nodes in the network with degree k.
  2. The degree distribution is very important in studying both real networks, such as the Internet and social networks, and theoretical networks.
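The definition above is small enough to sketch on a single machine. This is not Marko Rodriguez’s R/MapReduce code, just a plain Python illustration that assumes an undirected graph given as a list of edges; the `degree_distribution` name and the sample edges are made up for the example.

```python
from collections import Counter

def degree_distribution(edges):
    """Return {k: P(k)} for an undirected graph given as (u, v) edge pairs.

    Nodes without any edge do not appear in an edge list, so they are
    not counted here (assumption of this sketch).
    """
    degree = Counter()
    for u, v in edges:
        degree[u] += 1  # each edge adds one to the degree of both endpoints
        degree[v] += 1
    nodes_with_degree = Counter(degree.values())  # k -> number of nodes with degree k
    total_nodes = len(degree)
    return {k: count / total_nodes for k, count in sorted(nodes_with_degree.items())}

# Hypothetical four-node graph: a has degree 3, b and c have 2, d has 1
edges = [("a", "b"), ("a", "c"), ("a", "d"), ("b", "c")]
print(degree_distribution(edges))  # {1: 0.25, 2: 0.5, 3: 0.25}
```

The first loop is the part that maps naturally onto MapReduce: emit each endpoint with a count of 1, then sum per node.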

As a thought experiment, imagine a graph database that actively maintains an internal degree distribution and uses it to suggest a partitioning of the graph, or to partition it outright. Would that work?



Friend Recommendations Using Gremlin With Neography

Max De Marzi:

Gremlin is a domain-specific language for traversing property graphs. Neo4j is one of the databases that can speak the Gremlin language, and as promised I’ll show you how you can use it to implement friend recommendations as well as degrees of separation.
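For readers who want to see the two traversals without a database at hand, here is a rough sketch in plain Python, not Gremlin and not Neography. The in-memory `friends` map and both function names are assumptions for illustration: recommendations are friends of friends ranked by the number of mutual friends, and degrees of separation is the length of the shortest path found by breadth-first search.

```python
from collections import Counter, deque

# Hypothetical undirected friendship graph as adjacency sets
friends = {
    "ann": {"bob", "cat"},
    "bob": {"ann", "dan", "eve"},
    "cat": {"ann", "dan"},
    "dan": {"bob", "cat", "fay"},
    "eve": {"bob"},
    "fay": {"dan"},
}

def recommend_friends(person, friends):
    """Friends of friends who are not yet friends, ranked by mutual friends."""
    mutual = Counter()
    for friend in friends[person]:
        for candidate in friends[friend]:
            if candidate != person and candidate not in friends[person]:
                mutual[candidate] += 1
    # most mutual friends first, ties broken alphabetically
    return sorted(mutual.items(), key=lambda kv: (-kv[1], kv[0]))

def degrees_of_separation(start, goal, friends):
    """Length of the shortest friendship path, or None if unreachable."""
    if start == goal:
        return 0
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        person, distance = queue.popleft()
        for neighbour in friends[person]:
            if neighbour == goal:
                return distance + 1
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append((neighbour, distance + 1))
    return None

print(recommend_friends("ann", friends))             # [('dan', 2), ('eve', 1)]
print(degrees_of_separation("ann", "fay", friends))  # 3
```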



Neuron Based Data Structure – an Implementation

Alexander Bresk:

The neuron-based data structure (called NBDS) follows the idea of keeping a piece of information as an atomic part. The model contains three parts. The first part is the Neuron, which acts like a container for data. The second part is the Axon. An axon connects two neurons and can itself contain information (data about the connection or relation). The last part is the Space. In a Space you put neurons and axons together and run operations on them. You can imagine the Space as a component that brings order into the set of neurons and axons.

You’ll find all these features in any graph database.
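To show how closely the three parts map onto the node, edge, and graph of a property graph, here is a small Python sketch. It is not Alexander Bresk’s implementation; the field names, the `connect` and `neighbours` methods, and the choice to treat an axon as directed are all assumptions.

```python
from dataclasses import dataclass, field

@dataclass
class Neuron:
    """Container for data; plays the role of a node with properties."""
    name: str
    data: dict = field(default_factory=dict)

@dataclass
class Axon:
    """Connects two neurons and carries its own data; an edge with properties."""
    source: str
    target: str
    data: dict = field(default_factory=dict)

@dataclass
class Space:
    """Holds neurons and axons and runs operations on them; the graph itself."""
    neurons: dict = field(default_factory=dict)
    axons: list = field(default_factory=list)

    def add_neuron(self, neuron):
        self.neurons[neuron.name] = neuron

    def connect(self, source, target, **data):
        # Both endpoints must already live in this space
        if source not in self.neurons or target not in self.neurons:
            raise KeyError("both neurons must be added before connecting them")
        self.axons.append(Axon(source, target, data))

    def neighbours(self, name):
        """Names of neurons reachable over one outgoing axon."""
        return [axon.target for axon in self.axons if axon.source == name]

space = Space()
space.add_neuron(Neuron("alice", {"age": 30}))
space.add_neuron(Neuron("bob"))
space.connect("alice", "bob", relation="knows")
print(space.neighbours("alice"))  # ['bob']
```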



Forrester Predictions for 2012: Hadoop, In-Memory Analytics Platforms, Graph Databases

James Kobielus summarizes Forrester’s predictions for 2012:

Enterprise Hadoop deployments will expand at a rapid clip.


In-memory analytics platforms will grow their footprint.

Assuming they are referring to products like SAP HANA, TIBCO Spotfire, etc., my bet is that their adoption will depend heavily on how well they integrate with Big Data toolkits.

Soon I also expect to see some in-memory data-grid products slightly shifting their direction and trying to penetrate the analytics market.

Graph databases will come into vogue: The market for graph databases will boom in 2012 as companies everywhere adopt them for social media analytics, marketing campaign optimization, and customer experience fine-tuning.

I know someone who will be very happy to read this prediction.

While I do agree this will happen, I also think that further advances, both in the technology and in how it is communicated, are needed before we see wide adoption of graph databases.


Persistent Graph Structures With Ruby/Rails

To summarize this long thread, which tries to answer the question in the title: Neo4j + JRuby.


NoSQL Databases Best Practices and Emerging Trends

Jans Aasman (CEO of Franz Inc., the company behind AllegroGraph) interviewed by Srini Penchikala:

InfoQ: What best practices and architecture patterns should the developers and architects consider when using a solution like this one in their software applications?

Jans: If your application requires simple straight joins and your schema hardly changes then any RDBMS will do.

If your application is mostly document based, where a document can be looked at as a pre-joined nested tree (think a Facebook page, think a nested JSON object) and where you don’t want to be limited by an RDB schema then key-value stores and document stores like MongoDB are a good alternative.

If you want what is described in the previous paragraph but you have to perform complex joins or apply graph algorithms then the MongoGraph approach might be a viable solution.

Thinking about the products and projects I’ve worked on, most of them had to deal with all of these aspects, in different areas of the application and with varying importance to the final solution. Mistakenly, though, most of them ended up using only a relational database. With polyglot persistence, this shouldn’t happen anymore. That’s not to say that every project must use all of these technologies just because they are available, but it could use any of them, or all of them combined.

InfoQ: What are the emerging trends in combining the NoSQL data stores?

Jans: From the perspective of a Semantic Web/graph database vendor, what we see is that nearly all graph databases now perform their text indexing with Lucene-based indexing (Solr or Elasticsearch), and I wouldn’t be surprised if most vendors soon allow JSON objects as first-class objects in graph databases. It was surprisingly straightforward to mix the JSON and triple/graph paradigms. We are also experimenting with key-value stores to see how they mix with the triple/graph paradigm.

This topic was also discussed during my NoSQL Applications panel, but due to the panel’s time constraints we couldn’t reach a conclusion. It’s definitely an interesting perspective, though.
