graph database: All content on NoSQL databases and projects about graph database, featuring the best daily NoSQL articles, news, and links on graph database

Exploring Neo4j, the NoSQL Graph Database

by Alex Popescu

Rahul Sharma takes a look at Neo4j and some basic operations with graph databases:

Let us say we want to implement a use-case where there are persons and a person can be connected to other persons. In order to use Neo4J we must think about POJOs in terms of interfaces and corresponding implementations. This is so because the database is a key-value store at the back, so it asks us to store the properties of the POJO in terms of key-value pairs. Moreover there are no foreign keys in Neo4J, objects in the db are connected with other objects using Relationships.
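
To make the quoted description concrete, here is a minimal in-memory sketch in Python. It is not the Neo4j API: it only illustrates the idea of nodes that keep their state as key-value properties and are linked by typed relationships instead of foreign keys. The names `Node`, `Relationship`, `connect`, `neighbours`, and the `KNOWS` relationship type are hypothetical, chosen for illustration.

```python
from dataclasses import dataclass, field


@dataclass(eq=False)
class Node:
    """A graph node whose state is a bag of key-value properties."""
    properties: dict = field(default_factory=dict)
    relationships: list = field(default_factory=list)


@dataclass(eq=False)
class Relationship:
    """A typed, directed link between two nodes (no foreign keys involved)."""
    start: Node
    end: Node
    rel_type: str


def connect(start: Node, end: Node, rel_type: str) -> Relationship:
    """Create a relationship and register it on both endpoints."""
    rel = Relationship(start, end, rel_type)
    start.relationships.append(rel)
    end.relationships.append(rel)
    return rel


def neighbours(node: Node, rel_type: str) -> list:
    """Return nodes reached from `node` over outgoing relationships of one type."""
    return [r.end for r in node.relationships
            if r.rel_type == rel_type and r.start is node]


# Illustrative data: two persons, one directed KNOWS relationship.
alice = Node({"name": "Alice"})
bob = Node({"name": "Bob"})
connect(alice, bob, "KNOWS")
print([n.properties["name"] for n in neighbours(alice, "KNOWS")])
```

Following a relationship here is a direct object reference, which is the property the quote contrasts with foreign-key lookups.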

Interestingly, he mentions getting some errors when trying to push 151K names. Sounds like he could use this Neo4j tip for handling long transactions.

Original title and link for this post: Exploring Neo4j, the NoSQL Graph Database (published on the NoSQL blog: myNoSQL)


InfoGrid: Graph Database Schema

by Alex Popescu

While most graph databases can be seen as collections of vertices and edges[1] carrying bags of properties, InfoGrid thinks that having some sort of schema is good for data integrity, code simplicity, and documentation purposes:

InfoGrid distinguishes between properties that must have a non-null value, and properties that may or may not be null.

[…]

If InfoGrid did not distinguish between required and optional values, application code would be littered with unnecessary tests for null values. (or failing that, unexpected NullPointerExceptions.) We think being specific is better when creating the model; higher-quality and less cluttered application code is the reward.
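
The required-versus-optional distinction can be sketched in a few lines of Python. This is not InfoGrid's API or model format; `PropertyType`, `validate`, and the `PERSON` type are hypothetical names used only to show how a schema check at write time removes the need for null tests in application code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PropertyType:
    """Declares one property of a node type and whether it must be non-null."""
    name: str
    required: bool


def validate(schema: list, values: dict) -> None:
    """Reject a property bag that leaves a required property missing or None."""
    for prop in schema:
        if prop.required and values.get(prop.name) is None:
            raise ValueError(f"required property '{prop.name}' must not be null")


# Hypothetical "Person" type: name is required, nickname is optional.
PERSON = [
    PropertyType("name", required=True),
    PropertyType("nickname", required=False),
]

# An optional property may be null.
validate(PERSON, {"name": "Ada", "nickname": None})

# A missing required property is rejected before it reaches application code.
try:
    validate(PERSON, {"nickname": "ada"})
except ValueError as err:
    print(err)
```

Code that reads a validated `Person` can then use `name` without a null check, while still having to handle a missing `nickname`.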

InfoGrid is not alone, as we’ve seen a similar approach in the Java Content Repository node type definitions and also in OrientDB schema-less, schema-full, and mixed schema.

So what do other graph database creators think about schemas?


  1. Check Marko A. Rodriguez and Peter Neubauer’s paper: Constructions from Dots and Lines

Original title and link for this post: InfoGrid: Graph Database Schema (published on the NoSQL blog: myNoSQL)


InfiniteGraph Use Case: Modeling Stackoverflow

by Alex Popescu

I haven’t heard much about InfiniteGraph since its 1.0 release, except for this post, which uses Stackoverflow data as input to demo some features of graph databases:

The vertices in the graph are represented as the Users, Questions and Answers above while the edges are represented as the interactions between them (i.e. a User “Posts” a Question, an Answer is “For” a Question, a User “Comments On” a Question or Answer). Simple enough, and like most other social graphs, users seem to be the focal points with the majority of connected edges. Now all I needed was a sample application that could construct the graph data model from the XML sources and run some queries.
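
The data model described in the quote can be sketched with plain Python structures. This is not InfiniteGraph's API, and the sample vertices and edges are made up; only the vertex kinds and edge labels come from the quoted post.

```python
from collections import Counter

# Vertices keyed by id, each tagged with its kind (sample data is invented).
vertices = {
    "u1": "User", "u2": "User",
    "q1": "Question", "q2": "Question",
    "a1": "Answer",
}

# Edges as (source, label, target) triples, using the labels from the post.
edges = [
    ("u1", "Posts", "q1"),
    ("u2", "Posts", "a1"),
    ("a1", "For", "q1"),
    ("u2", "Comments On", "q1"),
    ("u2", "Posts", "q2"),
    ("u1", "Comments On", "a1"),
    ("u2", "Comments On", "q2"),
]


def degree_by_vertex(edge_list):
    """Count how many edges touch each vertex, in either direction."""
    degree = Counter()
    for source, _label, target in edge_list:
        degree[source] += 1
        degree[target] += 1
    return degree


def answers_for(question_id):
    """Find the answers attached to a question via 'For' edges."""
    return [s for s, label, t in edges if label == "For" and t == question_id]


degree = degree_by_vertex(edges)
busiest = max(degree, key=degree.get)
print(busiest, vertices[busiest])  # the most connected vertex and its kind
print(answers_for("q1"))
```

In this toy data set the most connected vertex is a user, matching the observation that users tend to be the focal points of such graphs.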

Original title and link for this post: InfiniteGraph Use Case: Modeling Stackoverflow (published on the NoSQL blog: myNoSQL)


What is HyperGraphDB?

by Alex Popescu

Recently we’ve seen a lot of activity in the graph database world. Better understanding the space will help us make smarter decisions, so I’ve decided to reach out to the main players in the market and run a series of interviews about their projects and goals. The first in this series is about HyperGraphDB, and Borislav Iordanov, its creator, has been kind enough to answer my questions.

myNoSQL: What is HyperGraphDB?

Borislav Iordanov: HyperGraphDB is a storage framework based on generalized hypergraphs as its underlying data model. The unit of storage is a tuple made up of 0 or more other tuples. Each such tuple is called an atom. One could think of the data model as relational where higher-order, n-ary relationships are allowed or as graph-oriented where edges can point to an arbitrary set of nodes and other edges. Each atom has an arbitrary, strongly-typed value associated with it. The type system managing those values is embedded as a hypergraph and customizable from the ground up. HyperGraphDB itself is an embedded database with an XMPP-based distribution framework and it relies on a key-value store underneath, currently BerkeleyDB. In its present form, it is a full-fledged object-oriented database for Java as well. Storage layout, indexing and caching are designed to support graph traversals and pattern matching.
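
As a rough illustration of the data model Boris describes, here is a small Python sketch of an atom as a value plus a tuple of zero or more target atoms. It is not HyperGraphDB's Java API; the `Atom` class and the sample relations are hypothetical, and typing is reduced to an arbitrary Python value.

```python
from dataclasses import dataclass


@dataclass(frozen=True, eq=False)
class Atom:
    """A value plus an ordered tuple of zero or more target atoms."""
    value: object
    targets: tuple = ()

    @property
    def arity(self) -> int:
        """Number of atoms this atom points to."""
        return len(self.targets)

    @property
    def is_link(self) -> bool:
        """Atoms with no targets act like nodes; the rest act like edges."""
        return self.arity > 0


# Three plain nodes (arity 0).
alice, bob, paper = Atom("Alice"), Atom("Bob"), Atom("Paper")

# An n-ary link: one relation connecting three atoms at once.
coauthored = Atom("coauthored", (alice, bob, paper))

# A higher-order link: a relation whose first target is itself a link.
disputed = Atom("disputed-by", (coauthored, Atom("Carol")))

print(coauthored.arity, disputed.targets[0].is_link)
```

Because links are atoms too, a relation can point at another relation, which is what distinguishes this model from a classical graph whose edges connect exactly two nodes.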

myNoSQL: How would you position HyperGraphDB inside the NoSQL space?

Boris: I think it is quite apart and I don’t see it fit into any particular category. Because of the term “hypergraph”, it’s been categorized as a “graph database”, but strictly speaking it is not. The focus is highly complex data and knowledge representation problems. It originated from an AI project (http://www.opencog.org) and its power is partly in its data model and in its open-architecture framework.

myNoSQL: Would you mind explaining a bit more why you are placing HyperGraphDB closer to object databases than to graph databases?

Boris: Probably because object structures have the same kind of generality: arbitrary nesting, n-ary relations, and if you model a relation as an identifiable object, it’s in effect reified, so you can have higher-order relationships. In addition, OO databases have well-developed type systems, as HyperGraphDB does (but HyperGraphDB’s is more general because you could model functional-style type systems in it, and you could also have types of types of types etc. ad infinitum).

Standard graphs are really just one kind of data structure that is conceptually simple and that happens to be very well studied mathematically, so people use them a lot in modeling. A graph database is probably very good at dealing with graph-oriented problems with large datasets, but for general programming one would want a more versatile data model, and HyperGraphDB offers that, as do OO databases.

Obviously, one could model object structures as well as hypergraphs with classical graphs, but that doesn’t mean much - one could translate a C program into a Turing machine, and this doesn’t make the Turing machine a good choice for the problem the C program is solving.

myNoSQL: What are other solutions in this category/space?

Boris: I don’t know of any. The topic maps formalism (an RDF rival that sadly is not very popular) is very close to the HyperGraphDB data model. RDF itself, named graphs etc. are close. Then graph and OO databases obviously touch on some of the functionality, with OO databases probably being closer. The database behind freebase.com is very similar in architecture, but relations have fixed arity there too.

myNoSQL: Could you identify a couple of unique features that are differentiating HyperGraphDB from the other solutions?

Boris: Probably the two most interesting ones are:

  1. Higher-order, n-ary relations are unique to HyperGraphDB
  2. Open architecture: there’s a very strong “frameworky” aspect to HyperGraphDB; it’s not a black box with a fixed, restrictive data model. The storage layout is open and documented. One can plug in customized indexing, customized type handling, customized back-end storage, customized distribution algorithms etc.

myNoSQL: What’s coming next on HyperGraphDB’s roadmap and why?

Boris: The next release will be 1.1 within the next month or so, containing many bug fixes and polishing of the APIs. In addition, it will contain an MVCC implementation to increase transaction throughput, out-of-the-box replication, and some optimizations for querying and graph traversals.

Following that, we will be focusing on developing a query language geared towards HyperGraphDB’s unique data model and developing more distribution algorithms for truly massive scalability.

People have also asked about full-text search so an integration with Lucene might happen some time within the next couple of months.

Nested graphs, RAM only graphs and a C++ port are also desirable features on our radar, as time and resources allow. We are an open-source, LGPL project and it all depends on how many people are willing to contribute and how much time they are willing to put in, so no definite dates yet.

myNoSQL: Thanks a lot Boris!

What is HyperGraphDB? originally posted on the NoSQL blog: myNoSQL


sones GraphDB available on Microsoft Windows Azure

by Alex Popescu

sones GraphDB available in the Microsoft cloud:

The sones GraphDB is the first graph database which is available on Microsoft Windows Azure. Since the sones GraphDB is written in C# and based upon Microsoft .NET it can run as an Azure Service in its natural environment. No wrapping, no glue-code. It’s the performance and scalability a customer can get from an on-premise hosted solution paired with the elasticity of a cloud platform.

You can read a bit more about it ☞ here.

If you’ve picked another graph database, you can probably set it up with one of the cloud providers offering Infrastructure-as-a-Service.

sones GraphDB available on Microsoft Windows Azure originally posted on the NoSQL blog: myNoSQL


Gephi: Visualization Library for Graph Databases

by Alex Popescu

You probably know by now that I love visualization tools:

Get the version of the Gephi app that can read Neo4j databases: bzr branch http://bazaar.launchpad.net/~bujacik/gephi/support-for-neo4j

Gephi and Neo4j

InfiniteGraph Graph Database Reaches 1.0 Release

by Alex Popescu

First announced a little over a month ago, InfiniteGraph, the graph database from Objectivity, has already reached the 1.0 release. At this time I don’t yet have the details of this release.

InfiniteGraph offers developers a free 2-month version, after which a $999/year license is required. According to this comparison of NoSQL graph databases, I cannot say that’s the most “generous” offer in the graph database market.


Transport Route Planner Using Neo4j

by Alex Popescu

TransportDublin.ie:

It combines Neo4j, Google Maps API v3, and Spring 3.0 MVC-AJAX with jQuery and JavaScript-parsed JSON for the presentation layer.


Neo4j Tips & Tricks: Handling Long Transactions

by Alex Popescu

An answer to the question: is write performance influenced by the size of transactions? (N.B. the “popular” form of the question is: why does my write performance drop off when performing many operations in a single transaction?):

The reason is because Neo4j keeps the transaction’s operations in memory until commit, so your JVM will eventually run out of memory and start paging to disk.

There are two solutions:

  1. split your transactions into groups of 30,000 or so (obviously you give up the ability to do a full rollback)
  2. skip the transaction part and use the BatchInserter, which writes directly to the persistence layer rather than keeping everything in memory.
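
The first solution, splitting the work into groups, can be sketched as follows. This is plain Python and not the Neo4j API: `FakeTransaction` is a hypothetical stand-in that buffers writes in memory until commit, which is the behaviour the quote describes. The batch size of 30,000 comes from the tip and the 151,000 items mirror the import size mentioned earlier on this page.

```python
from itertools import islice


def batches(items, size):
    """Yield successive lists of at most `size` items from any iterable."""
    iterator = iter(items)
    while True:
        chunk = list(islice(iterator, size))
        if not chunk:
            return
        yield chunk


class FakeTransaction:
    """Hypothetical stand-in for a transaction: buffers writes until commit."""

    def __init__(self, store):
        self.store = store
        self.pending = []

    def write(self, item):
        self.pending.append(item)

    def commit(self):
        self.store.extend(self.pending)
        self.pending.clear()


def insert_in_batches(items, store, batch_size=30_000):
    """Commit once per batch so the pending buffer never exceeds batch_size."""
    commits = 0
    for chunk in batches(items, batch_size):
        tx = FakeTransaction(store)
        for item in chunk:
            tx.write(item)
        tx.commit()
        commits += 1
    return commits


store = []
names = (f"name-{i}" for i in range(151_000))
commits = insert_in_batches(names, store)
print(commits, len(store))
```

The trade-off is the one noted in the list above: each batch commits independently, so a failure midway leaves earlier batches in place and a full rollback is no longer possible.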

Comparing Pregel and MapReduce

by Alex Popescu

Following his post on graph processing, Ricky Ho explains the major difference between Pregel and MapReduce applied to graph processing:

Since the Pregel model retains worker state (the same worker is responsible for the same set of nodes) across iterations, the graph can be loaded into memory once and reused across iterations. This reduces I/O overhead, as there is no need to read from and write to disk at each iteration. For fault resilience, there is a periodic checkpoint at which every worker writes its in-memory state to disk.

Also, Pregel (being stateful) sends only locally computed results (not the graph structure) over the network, which minimizes bandwidth consumption.

If you need to summarize that even further, it is basically:

  • reducing I/O as much as possible
  • ensuring data locality

Video: Emil Eifrem about NoSQL and the Benefits of Graph Databases

by Alex Popescu

InfoQ[1] style!


  1. The presentation is great, but as a disclaimer please keep in mind I’m the co-founder of InfoQ.com. I also have a hint: InfoQ just added the possibility to watch presentations in both vertical and horizontal mode. Hope you’ll like it!

On Graph Processing

by Alex Popescu

Ricky Ho explains these two fundamental graph papers:

The execution model is based on the BSP (Bulk Synchronous Parallel) model. In this model, multiple processing units proceed in parallel through a sequence of “supersteps”. Within each superstep, each processing unit first receives all messages delivered to it from the preceding superstep, then manipulates its local data and may queue up messages that it intends to send to other processing units. This happens asynchronously and simultaneously among all processing units. The queued messages are delivered to the destination processing units but won’t be seen until the next superstep. When all processing units have finished message delivery (hence the synchronization point), the next superstep can start, and the cycle repeats until the termination condition is reached.

Pregel execution model
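
The superstep cycle described above can be sketched in a single process. This is not Google's Pregel API, and nothing here runs in parallel: the function `run_supersteps` and the sample graph are hypothetical, and the algorithm chosen (spreading the largest value to every vertex) is just a simple way to show messages queued in one superstep becoming visible only in the next.

```python
def run_supersteps(values, neighbours, max_supersteps=50):
    """Propagate the largest value through a graph, one superstep at a time.

    values:     {vertex: initial number}
    neighbours: {vertex: [vertices it can send messages to]}
    Returns (final values, number of supersteps executed).
    """
    values = dict(values)

    # Superstep 0: every vertex announces its value to its neighbours.
    inbox = {v: [] for v in values}
    for v, value in values.items():
        for n in neighbours[v]:
            inbox[n].append(value)

    steps = 1
    # Terminate when no messages are in flight (or the step limit is hit).
    while any(inbox.values()) and steps < max_supersteps:
        outbox = {v: [] for v in values}  # queued for the NEXT superstep
        for v, messages in inbox.items():
            if not messages:
                continue  # no mail: the vertex stays idle this superstep
            best = max(messages)
            if best > values[v]:
                values[v] = best
                for n in neighbours[v]:
                    outbox[n].append(best)
        # Synchronization point: only now do queued messages become visible.
        inbox = outbox
        steps += 1
    return values, steps


# A small chain a - b - c - d with made-up starting values.
graph = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}
start = {"a": 3, "b": 6, "c": 2, "d": 1}
final, steps = run_supersteps(start, graph)
print(final, steps)
```

Each vertex reads only its own value and its inbox, so the order in which vertices are processed within a superstep does not affect the result, which is what lets a real system run them in parallel.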

Note that Google’s Pregel is, at a very high level, quite similar to Google’s MapReduce.