graph database: All content tagged as graph database in NoSQL databases and polyglot persistence

Paper: Graph Based Statistical Analysis of Network Traffic

Published by a group from Los Alamos National Lab (Hristo Djidjev, Gary Sandine, Curtis Storlie, Scott Vander Wiel):

We propose a method for analyzing traffic data in large computer networks such as big enterprise networks or the Internet. Our approach combines graph theoretical representation of the data and graph analysis with novel statistical methods for discovering pattern and time-related anomalies. We model the traffic as a graph and use temporal characteristics of the data in order to decompose it into subgraphs corresponding to individual sessions, whose characteristics are then analyzed using statistical methods. The goal of that analysis is to discover patterns in the network traffic data that might indicate intrusion activity or other malicious behavior.

The embedded PDF and download link after the break.
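
To make the idea of decomposing traffic into per-session subgraphs concrete, here is a minimal sketch. It is not the paper's method: the gap-based session rule, the 30-second threshold, and the names `split_sessions` and `distinct_destinations` are all assumptions made for illustration.

```python
def split_sessions(events, gap=30.0):
    """Group (timestamp, src, dst) events into per-source sessions.

    Assumption for illustration: a source host starts a new session when
    more than `gap` seconds pass since its previous event.
    Returns a list of (src, edges), where edges is a list of (src, dst, ts).
    """
    sessions = []
    open_session = {}  # src -> (timestamp of last event, edge list of its session)
    for ts, src, dst in sorted(events):
        current = open_session.get(src)
        if current is None or ts - current[0] > gap:
            edges = []  # start a new session subgraph for this source
            sessions.append((src, edges))
        else:
            edges = current[1]  # continue the source's open session
        edges.append((src, dst, ts))
        open_session[src] = (ts, edges)
    return sessions


def distinct_destinations(edges):
    """One simple per-session characteristic: how many hosts were contacted."""
    return len({dst for _, dst, _ in edges})
```

A characteristic such as `distinct_destinations` could then be fed into whatever statistical test is chosen to flag unusual sessions; the choice of test is outside this sketch.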


Time Lines and News Streams: Neo4j Is 377 Times Faster Than MySQL

In my use case neo4j outperformed MySQL by a factor of 377! That is more than two orders of magnitude. As you may know, one part of my PhD thesis is to create a social news stream application around my social networking site metalcon.de. It is very obvious that a graph structure is a natural fit for social news streams: you go to a user, traverse to all his friends or objects of interest, and then traverse one step deeper to the newly created content items. A problem with this kind of application is sorting the content items by time or relevance. But before I discuss those problems I just want to present another comparison between MySQL and neo4j.

This is wrong on so many levels. Scratch that. It’s even worse than an apples-to-oranges comparison.
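
For readers who want to see the two-step traversal from the quoted post spelled out, here is a minimal in-memory sketch. It uses plain Python dictionaries rather than Neo4j or MySQL, and the names `news_stream`, `follows`, and `items` are hypothetical.

```python
import heapq


def news_stream(follows, items, user, limit=10):
    """Return the newest content ids reachable in two steps from `user`.

    follows: dict mapping a user to the friends or objects they follow
    items:   dict mapping a node to a list of (timestamp, content_id)
    Both structures are assumptions made for this sketch.
    """
    candidates = []
    for neighbor in follows.get(user, ()):          # step 1: user -> friends
        candidates.extend(items.get(neighbor, ()))  # step 2: friend -> content
    # Sorting by time is the hard part the quote mentions; here the
    # `limit` newest items are selected by timestamp, newest first.
    newest = heapq.nlargest(limit, candidates)
    return [content_id for _, content_id in newest]
```

The sketch says nothing about relative performance; it only shows the shape of the query being compared.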

Original title and link: Time Lines and News Streams: Neo4j Is 377 Times Faster Than MySQL (NoSQL database©myNoSQL)

via: http://www.rene-pickhardt.de/time-lines-and-news-streams-neo4j-is-377-times-faster-than-mysql/


Neo4j 1.4 “Kiruna Stol” Released With Many Notable Improvements

Releasing often has too many advantages to list them all, but I think the major ones are: capturing the interest of new users (generating buzz), showing a healthy project velocity, and, probably the most important one, delivering the features and improvements users were asking for in a timely manner. Neo4j has learned these lessons[1] and since Neo4j 1.2 the team at Neo Technology has been trying a very frequent release plan which also includes milestone releases. The other day, Neo4j 1.4, a.k.a. Kiruna Stol, was released:

Over the last three months, we’ve released 6 milestones in our 1.4 series. Today we’re releasing the final Neo4j 1.4 General Availability (GA) package. We’ve seen a whole host of new features going into the product during this time, along with numerous performance and stability improvements. We think this is our best release yet, and we hope you like the direction in which the product is heading.

There are some notable new features and improvements in this release:

  1. a new query language called Cypher[2]
  2. automatic indexing
  3. a Lucene upgrade leading to faster indexing
  4. self relationships
  5. REST API improvements: exposing batch execution API, paging mechanism for traversers
  6. webadmin, performance, and new server management scripts

  1. In the NoSQL space, they are not alone. 10gen follows a similarly aggressive release plan for MongoDB. Redis, even though it is supported by a two-person team, has always enjoyed frequent releases. DataStax has also started to push out Cassandra updates more often.

  2. At first glance the query language looks odd, but I haven’t looked beyond some basic examples to understand its syntax and strength. Neo4j also supports Gremlin.

via: http://blog.neo4j.org/2011/07/announcing-neo4j-14-kiruna-stol-ga.html


An Intro to Gremlin the Graph Traversal Language

A nice intro to Gremlin, the Groovy-based graph traversal language supporting Neo4j, OrientDB, DEX, RDF Sail, TinkerGraph, and ReXster:

Next thing you should do is take your favorite graph database and try out Gremlin.


Multi-Document Transactions in RavenDB vs Other NoSQL Databases

“We tried using NoSQL, but we are moving to Relational Databases because they are easier…”

This is how Oren Eini starts his post about RavenDB’s support for multi-document transactions and the lack of it in MongoDB:

  1. For a single server, we support atomic multi document writes natively. (note that this isn’t the case for Mongo even for a single server).
  2. For multiple servers, we strongly recommend that your sharding strategy will localize documents, meaning that the actual update is only happening on a single server.
  3. For multi server, multi document atomic updates, we rely on distributed transactions.
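
To illustrate what an atomic multi-document write on a single server means, here is a minimal in-memory sketch. It is not RavenDB's API: the `DocumentStore` class and its `write_batch` method are hypothetical names, and real databases also have to deal with durability and concurrency, which this sketch ignores.

```python
import copy


class DocumentStore:
    """Toy single-server document store with all-or-nothing batch writes."""

    def __init__(self):
        self._docs = {}

    def get(self, key):
        return self._docs.get(key)

    def write_batch(self, updates):
        """Apply every update or none of them.

        updates: dict mapping a document key to a function that takes a
        copy of the current document (or None) and returns the new one.
        """
        staged = {}
        for key, update in updates.items():
            # Work on copies so a failing update cannot leave partial changes.
            staged[key] = update(copy.deepcopy(self._docs.get(key)))
        # Reached only if no update raised: publish all documents together.
        self._docs.update(staged)
```

If any update function raises, `write_batch` exits before the final `update` call, so none of the staged documents become visible.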

In the NoSQL space, there are a couple of other solutions that support transactions:

If you look at these from the perspective of distributed systems, the only distributed ones that support transactions are Megastore and RavenDB. There’s also VoltDB which is all transactions. Are there any I’ve left out?


Sones GraphDB Changes License for Libraries

If you check the quick review of existing graph databases and the NoSQL graph databases matrix you’ll notice that most of these came under either an AGPL license or a commercial one.

The game changed radically when Neo4j became available also under a GPL license. And now, Sones has changed the license of their GraphDB connectors to LGPL.

I’m no lawyer, but I think this means you can use Sones GraphDB without having to open source your product, even if it is commercial. And because the way you interact with Sones GraphDB is through its connectors, it doesn’t matter anymore what the core graph database license is.


Getting Started Spring Data Graph and Neo4j

Mark Pollack (VMware) and Emil Eifrem (Neo Technology) answer the why and how of using Spring Data and Neo4j.


structr: CMS on top of Neo4j

structr:

structr is a free, open-source CMS under the GPLv3, written in Java, based on the fantastic NoSQL graph database Neo4j.

By design, structr is modular, distributed and easy to use.

structr is not yet stable, so please be patient and look out for bugs and minor (or even major) pitfalls.

If my memory serves me right, Neo4j started as a library used internally for building content management systems.

Patrick Durusau


Graph Database Advantages over Document Databases

Jim Webber (Neo4j):

In these kinds of situations, choosing a non-graph store for storing graphs is a gamble. You may find that you’ve designed your graph topology far too early in the system lifecycle and lose the ability to evolve the structure and perform business intelligence on your data. That’s why Neo4j is cool - it keeps graph and application concerns separate, and allows you to defer data modelling decisions to more responsible points throughout the lifetime of your application.

You can store single-step acyclic relationships in every kind of storage, but that doesn’t make it a graph. Nor a graph database.

via: http://jim.webber.name/2011/04/21/e2f48ace-7dba-4709-8600-f29da3491cb4.aspx


Emil Eifrem about Neo4j 1.3 and the Neo4j GPL Community Edition

Last week, Neo Technology released version 1.3 of their graph database Neo4j. The technical aspects of the release have been covered in this blog post. Briefly:

  • support for large data sets and optimizations at the storage level
  • improved web admin tool
  • API cleanup

But the most exciting aspect of Neo4j 1.3 is the availability of a GPL version of the graph database. Emil Eifrem has covered it here:

Today marks a new major milestone for Neo4j: we’re making the core graph database - Neo4j Community - available under the same proven open source license as MySQL, the GNU General Public License (GPL).

That means that in every scenario where you can use MySQL for free, you can now also use Neo4j Community for free.

I had the chance to talk to Emil and he has been kind enough to answer my questions.

Alex: It took Neo Technology almost 10 years to release Neo4j 1.0. Since then things seem to have moved faster and faster. What changed to lead to these fast-paced release cycles?

Emil: The main reason is that our community has just reached a critical mass. This means that the feedback loop is faster, feature requests are more frequent, bug fixes and patches are better. It’s a faster and more virtuous cycle. On top of that, our customer traction the past year has allowed us to grow the full time in-house development team.

Alex: How would you summarize the release of Neo4j 1.3?

Emil: By far the most important aspect of this release is the license change to the GPL for Neo4j Community. Secondly, I’d put the support for really large stores (100+ billion primitives). And finally, I’d love to give a shout out to the new interactive graph visualization in the web UI.

Neo4j web admin

Alex: 3 products and 3 licenses. Moreover, the Neo4j Community edition comes with a GPLv3 license. As you know, I’ve always said that the graph database market is missing a more open license. So what made you change your mind about the licensing model?

Emil: The GPL is the best license for getting Neo4j in the hands of developers worldwide. It’s a proven model to get databases in the hands of developers while protecting an OEM revenue stream, so we figured why reinvent the wheel? The world deserves a graph database under the GPL.

Alex: Could you please clarify a bit the differences between the 3 products and their licensing models?

Emil: Sure, Neo4j Community is what most people will use. It’s a fully functional, robust and mature graph database. It’s available under the GPL like MySQL, which means that it can be used for free in all “end user” scenarios (for example to back a webapp). For OEM scenarios (i.e. it’s embedded in a product that ships to end users), the enclosing product must be open source.

Neo4j Advanced adds monitoring and management and couples that with commercial support. It’s available under the AGPL or a commercial license.

Neo4j Enterprise adds high availability, i.e. the ability to automatically and transparently replicate the graph across many instances, and enterprise-grade 24/7 commercial support. It’s available under the AGPL or a commercial license.

Alex: In your post you are saying that “the graph database opportunity is at least as big as the MySQL opportunity”. Could you please expand on this?

Emil: Absolutely. First off, information is exploding in both volume and complexity and in many cases relational databases can’t keep up. For example, a lot of big installations have massive problems with low-latency queries due to joins.

Secondly, business requirements are changing. For example, we have high requirements on the freshness of information (“realtime”) where a retail store may want to get a coupon recommendation while the customer is still in the store, not 24 hours later from the big corporate data warehouse.

Some of the largest web properties in the world were hit early by these two forces, and this catalyzed NoSQL. Now ask yourself this: of these two trends (information volume / complexity and realtime business requirements), in which direction is the world moving? I think the answer is clear and over time, most database deployments in the world will face requirements similar to the high-end web properties of today. In order to deliver business value, IT departments must then be equally committed to SQL and NoSQL.

I think that, of the current NoSQL landscape, graph databases have the opportunity to solve the most problems, for most developers, in most situations. A graph database is incredibly horizontally applicable and it’s useful across a wide range of problem spaces. In a world where most applications make use of both SQL and NoSQL, graph databases have the opportunity to be as frequently used as MySQL is today.

That’s why I said that the graph database opportunity is at least as big as the MySQL opportunity.

Alex: Could you enumerate some not so common use cases for Neo4j?

Emil: No! If they’re not so common I probably don’t know them. But here are three relatively unknown use cases for graph databases:

  • Cloud Management: Neo4j is used today to back management and operations on some of the largest private cloud deployments in the world.
  • Network Management: In the telecom and datacom world, management of resources in networks has long been a huge problem. It lends itself incredibly well to graph modeling.
  • Master Data Management (MDM): This is a very enterprise-y use case, but relevant for all big companies in the world. MDM stores the master data for a big company and that data is usually very complex and dynamic and gives huge join-problems if you put it in a relational database. That kind of dataset is a great fit for a graph database.

Alex: I confess that I was expecting to see a more open license available in the graph database market. So I’m happy to see this happening. Also I’m convinced that it is a very smart move for both the future of graph databases and your company. Thanks a lot, Emil.


Graph Databases: Distributed Traversal Engines

Marko A. Rodriguez:

In the distributed traversal engine model, a traversal is represented as a flow of messages between elements of the graph. Generally, each element (e.g. vertex) is operating independently of the other elements. Each element is seen as its own processor with its own (usually homogeneous) program to execute. Elements communicate with each other via message passing. When no more messages have been passed, the traversal is complete and the results of the traversal are typically represented as a distributed data structure over the elements. Graph databases of this nature tend to use the Bulk Synchronous Parallel model of distributed computing. Each step is synchronized in a manner analogous to a clock cycle in hardware. Instances of this model include Agrapa, Pregel, Trinity, GoldenOrb, and others.

None of these graph databases offers distributed traversal engines.
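
The message-passing model described in the quote can be sketched on a single machine. The following is a toy simulation of Bulk Synchronous Parallel supersteps, not the API of Pregel or any of the systems named above; the function `bsp_reachable` and its data layout are assumptions made for illustration.

```python
def bsp_reachable(adjacency, start):
    """Vertex-centric traversal in synchronized supersteps.

    adjacency: dict mapping a vertex to its outgoing neighbors
    Returns a dict mapping each reachable vertex to the superstep
    in which it was first reached.
    """
    state = {}            # per-vertex results, standing in for the distributed structure
    inbox = {start: [0]}  # messages waiting to be processed in the next superstep
    while inbox:          # no messages left means the traversal is complete
        outbox = {}
        for vertex, messages in inbox.items():
            if vertex in state:
                continue  # already visited: this vertex sends nothing further
            state[vertex] = min(messages)
            for neighbor in adjacency.get(vertex, ()):
                outbox.setdefault(neighbor, []).append(state[vertex] + 1)
        inbox = outbox    # barrier: all messages are delivered together
    return state
```

Each pass of the `while` loop plays the role of one clock cycle: every vertex with pending messages runs the same small program, and its outgoing messages only become visible in the following pass.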

via: http://markorodriguez.com/2011/04/19/local-and-distributed-traversal-engines/


Cloud Foundry, NoSQL Databases, and Polyglot Persistence

VMware’s Cloud Foundry has the potential to become the preferred PaaS solution. It bundles together a set of services that took other PaaS providers (Google App Engine, Microsoft Azure) years to offer. And it seems that Cloud Foundry has much less vendor lock-in (or none at all)[1].

From a storage perspective, Cloud Foundry is encouraging polyglot persistence right from the start, offering access to a relational database (MySQL), a super-fast smart key-value store (Redis), and a popular document database (MongoDB). The only bit missing is a graph database[2].

I think the first graph database to get there will see an immediate bump in its adoption.


  1. These comments are based on what I’ve read about VMware Cloud Foundry, as I haven’t yet received my invitation.

  2. I don’t think wide-column databases (Cassandra, HBase) are a fit for PaaS.
