


graph database: All content tagged as graph database in NoSQL databases and polyglot persistence

Graph Databases Power Marvel Universe's Social Network

During the presentation, Olson used the spandex-clad archer Hawkeye as an example. Throughout his career, the character has alternated between being a villain, a hero, and a covert operative. Additionally, other characters have assumed the mantle of Hawkeye at different points in time, while the man under the mask himself, Clint Barton, has adopted other identities as well.

Ignore it if you are not into comics. Or if you are a DC fan.

Original title and link: Graph Databases Power Marvel Universe’s Social Network (NoSQL database©myNoSQL)


Why relationships are cool… Relationship in RDBMS vs graph databases

I have to agree with Patrick Durusau on this:

I have been trying to avoid graph “intro” slides and presentations.

There are only so many times you can stand to hear “…all the world is a graph…” as though that’s news. To anyone.

This presentation by Luca is different from the usual introduction to graphs presentation.

Original title and link: Why relationships are cool… Relationship in RDBMS vs graph databases (NoSQL database©myNoSQL)

Updated conclusions about the graph database benchmark - Neo4j can perform much better

As I expected (and as a lot of people quickly confirmed), the results of the graph database benchmark that showed Neo4j being outperformed by MySQL, Vertica, and VoltDB could have been much better:

Our conclusions from this are that, like any of the complex systems we tested, properly tuning Neo4j can be tricky and getting optimal performance may require some experimentation with parameters. Whether a user of Neo4j can expect to see runtimes on graphs like this measured in milliseconds or seconds depends on workload characteristics (warm / cold cache) and whether setup steps can be amortized across many queries or not.

Looking at the 3 improvements mentioned in the post:

  1. Excluding connection. I think the change in the benchmark is actually about not counting the initialization of the database rather than about timing connections. The performance of establishing connections is still pretty important. (Check Mark Callaghan’s posts about the work at Facebook to improve MySQL’s connection performance.)
  2. Warm cache. A benchmark should measure both cold-cache and warm-cache behavior, as these are two scenarios that any application will face.
  3. Simpler algorithm. This one is quite tricky. While the application should definitely take the approach that fits your database, it’s also a matter of knowledge and complexity. You could argue that the more approaches you can use, the better the results you can get. Or the other way around: the more approaches are possible, the more time you’ll spend figuring out which one to use instead of getting things done (think Python vs. Perl).
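To make the cold vs. warm distinction concrete, here is a minimal sketch in Python. It is not the benchmark's code: `slow_lookup` is a hypothetical stand-in for a query that has to touch disk, the sleep is artificial, and an in-process `lru_cache` plays the role of the database cache.

```python
import time
from functools import lru_cache


def slow_lookup(node_id):
    # Hypothetical stand-in for a query that misses every cache;
    # the sleep simulates the disk access.
    time.sleep(0.01)
    return node_id * 2


# Wrap the slow call in an in-memory cache (assumption: this plays the
# role of the database's own page/object cache).
cached_lookup = lru_cache(maxsize=None)(slow_lookup)


def timed(fn, *args):
    # Return the result together with the elapsed wall-clock seconds.
    start = time.perf_counter()
    result = fn(*args)
    return result, time.perf_counter() - start


cold_result, cold_seconds = timed(cached_lookup, 42)  # cache is empty
warm_result, warm_seconds = timed(cached_lookup, 42)  # same key, now cached

print(f"cold: {cold_seconds:.4f}s  warm: {warm_seconds:.6f}s")
```

Reporting both numbers side by side, rather than only the warm one, is what lets a reader judge which scenario matches their own workload.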

Original title and link: Updated conclusions about the graph database benchmark - Neo4j can perform much better (NoSQL database©myNoSQL)


Benchmarking graph databases... with unexpected results

A team from MIT CSAIL set out to benchmark a graph database and 3 relational databases with different models: row-based (MySQL), in-memory (VoltDB), and column-based (Vertica). The results are interesting, to say the least:

We can see that relational databases outperform Neo4j on PageRank by up to two orders of magnitude. This is because PageRank involves full scanning and joining of the nodes and edges table, something that relational databases are very good at doing. Finding Shortest Paths involves starting from a source node and successively exploring its outgoing edges, a very different access pattern from PageRank. Still, we see from Figure 1(b) that relational databases match or outperform Neo4j in most cases. In fact, Vertica is more than twice faster than Neo4j. The only exception is VoltDB over Twitter dataset.

Being beaten at your own game is not a good thing. I hope this is just a fluke in the benchmark (misconfiguration) or a result particular to those data sets.
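The two access patterns described in the quote can be sketched in a few lines of Python. This is only an illustration under assumptions: `EDGES` is a made-up toy graph stored as an edge table, and neither function is taken from the benchmark itself. PageRank makes one full pass over the edge table per iteration, while the shortest-path search starts from one node and only follows its outgoing edges.

```python
from collections import deque

# Hypothetical toy graph stored the relational way: one row per edge.
EDGES = [("a", "b"), ("a", "c"), ("b", "c"), ("c", "a")]


def pagerank(edges, iterations=20, damping=0.85):
    # Assumes every node has at least one outgoing edge (true for EDGES).
    nodes = sorted({n for edge in edges for n in edge})
    out_degree = {n: 0 for n in nodes}
    for src, _ in edges:
        out_degree[src] += 1
    rank = {n: 1.0 / len(nodes) for n in nodes}
    for _ in range(iterations):
        # Full scan of the edge table, joined against the current ranks
        # and grouped by destination node.
        incoming = {n: 0.0 for n in nodes}
        for src, dst in edges:
            incoming[dst] += rank[src] / out_degree[src]
        rank = {
            n: (1 - damping) / len(nodes) + damping * incoming[n]
            for n in nodes
        }
    return rank


def shortest_path_length(edges, source, target):
    # Breadth-first search: start at the source and successively
    # explore outgoing edges, touching only the reachable part.
    adjacency = {}
    for src, dst in edges:
        adjacency.setdefault(src, []).append(dst)
    seen = {source}
    queue = deque([(source, 0)])
    while queue:
        node, dist = queue.popleft()
        if node == target:
            return dist
        for nxt in adjacency.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return None  # target is not reachable from source


ranks = pagerank(EDGES)
hops = shortest_path_length(EDGES, "b", "a")
```

The first function is essentially a scan, a join, and an aggregate, which is why a relational engine handles it well; the second is the traversal workload a graph database is expected to win.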

Original title and link: Benchmarking graph databases… with unexpected results (NoSQL database©myNoSQL)


Purely awesome - Chess Games and Neo4j

I wasn’t able to follow the post. I got lost in the superb presentation built for it. Chess game replays. Dynamic graphs. Pure awesomeness.

This is by far the most entertaining blog entry presentation I’ve seen since I started reading and writing about NoSQL.


Original title and link: Purely awesome - Chess Games and Neo4j (NoSQL database©myNoSQL)


On the topic of importing data into Neo4j

This post by Rik van Bruggen mentions using the Talend ETL tool, which brought an import job down from more than an hour to a couple of minutes:

This is where it got interesting. The spreadsheet import mechanism worked ok - but it really wasn’t great. It took more than an hour to get the dataset to load - so I had to look for alternatives. Thanks to my French friend and colleague Cédric, I bumped into the Talend ETL (Extract - Transform - Load) tools. I found out that there was a proper neo4j connector that was developed by Zenika, a French integrator that really seems to know their stuff.

There’s also a short video demoing Talend:

✚ I’ve mentioned what I see as the complexity of importing data into graph databases in On Importing Data into Neo4j

Original title and link: On the topic of importing data into Neo4j (NoSQL database©myNoSQL)


On Importing Data into Neo4j

For operations where massive amounts of data flow in or out of a Neo4j database, the interaction with the available APIs should be more considerate than with your usual, ad-hoc, local graph queries.

I’ll tell you the truth: when thinking about importing large amounts of data into a graph database I don’t feel very comfortable. And it’s not about the amount. It’s about the complexity of the data. Nodes. Properties of nodes. Relationships and their properties. And direction.

I hope this series started by Michael Hunger will help me learn more about graph database ETL.
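To show why the shape of the data matters more than its size, here is a rough Python sketch of a two-pass import. It uses no Neo4j API at all: `load_graph`, the row layout, and the in-memory result are hypothetical and only illustrate the ordering problem. Nodes and their properties go in first, and each relationship can only be created once both of its endpoints exist, with its direction preserved.

```python
def load_graph(node_rows, relationship_rows):
    # Pass 1: nodes and their properties, keyed by the source-side id.
    nodes = {}
    for row in node_rows:
        nodes[row["id"]] = {k: v for k, v in row.items() if k != "id"}

    # Pass 2: relationships. Direction is kept as (start, type, end),
    # and every endpoint must already have been loaded in pass 1.
    relationships = []
    for row in relationship_rows:
        start, end = row["start"], row["end"]
        if start not in nodes or end not in nodes:
            raise KeyError(f"unknown node in relationship: {start} -> {end}")
        props = {
            k: v for k, v in row.items() if k not in ("start", "end", "type")
        }
        relationships.append((start, row["type"], end, props))
    return nodes, relationships


# Made-up sample rows, as they might come out of two CSV files.
node_rows = [
    {"id": 1, "name": "Alice"},
    {"id": 2, "name": "Bob"},
]
relationship_rows = [
    {"start": 1, "end": 2, "type": "KNOWS", "since": 2012},
]

nodes, relationships = load_graph(node_rows, relationship_rows)
```

A flat table import has no equivalent of the second pass, which is where most of the discomfort comes from.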

Original title and link: On Importing Data into Neo4j (NoSQL database©myNoSQL)


Neo4j 1.9 General Availability - Auto-clustering, Cypher, and Some comments

The 1.9 release adds primarily three things:

  1. Auto-Clustering, which makes Neo4j Enterprise clustering more robust & easier to administer, with fewer moving parts
  2. Cypher language improvements make the language more functionally powerful and more performant, and
  3. New welcome pages make learning easier for new users

Some comments:

  1. The first is for the enterprise customers and brings in the features that were initially supported through ZooKeeper
  2. Cypher is Neo4j’s fast evolving query language
  3. The site is brilliant.
  4. The release post is terrible with no links to dive into the newly announced features.

Original title and link: Neo4j 1.9 General Availability - Auto-clustering, Cypher, and Some comments (NoSQL database©myNoSQL)


Neo4j Blog: Reloading my Beergraph - using an in-graph-alcohol-percentage-index

Rik Van Bruggen about data modeling in Neo4j:

One of the things that spurred the discussion was - probably not coincidentally - the AlcoholPercentage. Many people were expecting that to be a property of the Beerbrand - but instead in my beergraph, I had “pulled it out”. The main reason at the time was more coincidence than anything else, but when you think of it - it’s actually a fantastic thing to “pull things out” and normalise the data model much further than you probably would in a relational model. By making the alcoholpercentage a node of its own, it allowed me to do more interesting queries and pathfinding operations - which led to interesting beer recommendations. Which is what this is all about, right?

I can see where this is going, but I’m not sure I agree it’s the right approach. Basically, it works in this case because the domain of the field is both discrete and small. Ideally, though, what you’d actually want is an index that could give you nodes that are “close to some value” (e.g., “give me the beers in the 6.9-7.1 range”).
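As a sketch of what such a "close to some value" lookup could look like, here is a small sorted index in Python built on the standard `bisect` module. It is not a Neo4j feature or API: the `RangeIndex` class, the beer names, and the percentages are all made up for illustration.

```python
import bisect


class RangeIndex:
    """Keeps (value, item) pairs sorted by value for range lookups."""

    def __init__(self):
        self._keys = []   # sorted numeric values
        self._items = []  # items, aligned position by position with _keys

    def add(self, value, item):
        # Insert at the position that keeps _keys sorted.
        pos = bisect.bisect_right(self._keys, value)
        self._keys.insert(pos, value)
        self._items.insert(pos, item)

    def between(self, low, high):
        # All items whose value lies in the closed range [low, high].
        lo = bisect.bisect_left(self._keys, low)
        hi = bisect.bisect_right(self._keys, high)
        return self._items[lo:hi]


# Hypothetical beers and alcohol percentages.
index = RangeIndex()
index.add(5.2, "beer_a")
index.add(7.0, "beer_b")
index.add(6.9, "beer_c")
index.add(8.5, "beer_d")

in_range = index.between(6.9, 7.1)
```

With an index like this the percentage can stay a plain property, and the lookup still works when the values are continuous rather than a small discrete set.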

Original title and link: Neo4j Blog: Reloading my Beergraph - using an in-graph-alcohol-percentage-index (NoSQL database©myNoSQL)


Bootstrapping Neo4j With Spring-Data...without XML

The emphasis is on without XML:

With the maturing of Spring-Data I started porting all my personal projects to use Spring Data for bootstrapping.

Quite a few annotations are needed, but I’d take that over XML.

Original title and link: Bootstrapping Neo4j With Spring-Data…without XML (NoSQL database©myNoSQL)


A Quick Guide to Testing Spring Data Neo4j With NoSQLUnit

Alex Soto:

Spring Data Neo4j is the project within Spring Data project which provides an extension to the Spring programming model for writing applications that uses Neo4j as graph database. To write tests using NoSQLUnit for Spring Data Neo4j applications, you do need nothing special apart from considering that Spring Data Neo4j uses a special property called type in graph nodes and relationships which stores the fully qualified classname of that entity.

Is there a BigDataUnit framework? My only requirement is to use XML. Heavily.

Original title and link: A Quick Guide to Testing Spring Data Neo4j With NoSQLUnit (NoSQL database©myNoSQL)


Neo4j-Based Bitcoin Block Chain Visualizer

A pretty interesting use of Neo4j for visualizing the Bitcoin block chain:


Source code available on GitHub.

Original title and link: Neo4j-Based Bitcoin Block Chain Visualizer (NoSQL database©myNoSQL)