


graph database: All content tagged as graph database in NoSQL databases and polyglot persistence

Neo4j Data Modeling: What Question Do You Want to Answer?

Mark Needham:

Over the past few weeks I’ve been modelling ThoughtWorks project data in neo4j and I realised that the way that I’ve been doing this is by considering what question I want to answer and then building a graph to answer it.

This same principle should be applied to modeling with any NoSQL database. Thinking in terms of access patterns is one of the major differences between doing data modeling in the NoSQL space and the relational world, which is driven, at least in the first phases and theoretically, by the normalization rules.

Original title and link: Neo4j Data Modeling: What Question Do You Want to Answer? (NoSQL database©myNoSQL)


How to Import Large Graphs to Neo4j With Spring Data

In my case, I wanted to create a simple recommendation engine (the domain doesn’t matter so much). To do that, I had to import FAST 20 million nodes of one-to-many, sparse matrix data. This became a bit more complicated (and interesting) task than originally anticipated, so it became a mini-project itself.

Bulk insert is a scenario that every database should have covered.



The Database Nirvana

Scroll to minute 16:55 of this video to watch Jim Webber explain the benefits of polyglot persistence and how restarting the winner-takes-all war just sets us back at least 10 years on the road to database Nirvana.

We’ve just come from the place where one-size-fits-all and we don’t want to go back there. There is a huge wonderful ecosystem of stores. Pick the right one. Don’t just assume that the one you find the easiest or the one that shouts the loudest is the one you’re going to use. Pick the one that suits your data model.

It doesn’t matter what flavor of relational or NoSQL database you prefer or have experience with or if a small or large database vendor is paying your bills. You really need to get this right as otherwise we’re just going to destroy a lot of valuable options we’ve added to our toolboxes.


Neo4j REST API Tutorial

A detailed, language-agnostic intro to the Neo4j REST API:

In the above examples we have seen how nodes, relationships, and properties can be created, edited, updated, and deleted from the Neo4j HTTP terminal.
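
As a rough illustration of what those create, update, and delete calls look like over HTTP, here is a minimal Python sketch. It assumes the endpoint layout of the Neo4j 1.x REST API (`/db/data/node`, `/db/data/node/{id}/relationships`, `/db/data/node/{id}/properties/{key}`) and a server on `localhost:7474`; later Neo4j versions replaced this API, so check the paths against your server. The helper names are made up for this example, and the functions only build requests without sending them.

```python
import json
import urllib.request

# Assumed base URL of a local Neo4j 1.x server; adjust to your setup.
BASE = "http://localhost:7474/db/data"


def _request(method, url, payload=None):
    """Build (but do not send) a JSON request."""
    data = None if payload is None else json.dumps(payload).encode("utf-8")
    headers = {"Accept": "application/json"}
    if data is not None:
        headers["Content-Type"] = "application/json"
    return urllib.request.Request(url, data=data, headers=headers, method=method)


def create_node(properties):
    """POST the node's properties to the node collection."""
    return _request("POST", f"{BASE}/node", properties)


def create_relationship(from_id, to_id, rel_type):
    """POST a typed relationship from one node to another."""
    body = {"to": f"{BASE}/node/{to_id}", "type": rel_type}
    return _request("POST", f"{BASE}/node/{from_id}/relationships", body)


def set_property(node_id, key, value):
    """PUT a single property value on an existing node."""
    return _request("PUT", f"{BASE}/node/{node_id}/properties/{key}", value)


def delete_node(node_id):
    """DELETE a node by id."""
    return _request("DELETE", f"{BASE}/node/{node_id}")
```

Passing any of the returned objects to `urllib.request.urlopen` would send it; keeping construction and sending apart makes the calls easy to inspect without a running server.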



Intro to Neo4j Cypher Query Language

A very good slide deck from Max de Marzi introducing Neo4j’s Cypher query language. While you’ll have to go through the 50 slides yourself to get the details, I’ve extracted a couple of interesting bits:

  1. Cypher was created because the Neo4j Java API was too verbose and Gremlin is too prescriptive
  2. SPARQL was designed for a different data model and doesn’t work very well with a graph database
  3. Cypher design decisions:
    • declarative
    • ASCII-art patterns (nb: when I first saw Cypher I hadn’t thought of this, but it is cool)
    • pattern-matching
    • external DSL
    • closures
    • SQL familiarity (nb: as much as it’s possible with a radically different data model and processing model)
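
To make the "ASCII-art patterns" point a bit more concrete, here is a toy Python sketch. It is not Cypher and says nothing about how Neo4j evaluates queries; it only shows how a drawing-like pattern such as `(a)-[:KNOWS]->(b)` can be read declaratively and matched against data. The function name `match_pattern`, the regular expression, and the sample edges are all made up for this example.

```python
import re

# Hypothetical sample graph: (source, relationship type, target) triples.
EDGES = [
    ("alice", "KNOWS", "bob"),
    ("bob", "KNOWS", "carol"),
    ("alice", "WORKS_AT", "acme"),
]

# Recognizes a one-hop pattern such as "(a)-[:KNOWS]->(b)".
PATTERN = re.compile(r"^\((\w+)\)-\[:(\w+)\]->\((\w+)\)$")


def match_pattern(pattern, edges):
    """Return one {variable: node} binding per edge that fits the pattern."""
    parsed = PATTERN.match(pattern.strip())
    if parsed is None:
        raise ValueError(f"unsupported pattern: {pattern!r}")
    src_var, rel_type, dst_var = parsed.groups()
    return [
        {src_var: src, dst_var: dst}
        for src, rel, dst in edges
        if rel == rel_type
    ]


if __name__ == "__main__":
    for binding in match_pattern("(a)-[:KNOWS]->(b)", EDGES):
        print(binding)
```

The caller states what shape to find, not how to walk the graph, which is the declarative, pattern-matching flavor the slides describe.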

AlchemyDB: An Integrated GraphDB + RDBMS + KV Store + Document Store

I recently added a fairly feature rich Graph Database to AlchemyDB (called it LuaGraphDB) and it took roughly 10 days to prototype. I implemented the graph traversal logic in Lua (embedded in AlchemyDB) and used AlchemyDB’s RDBMS to index the data. The API for the GraphDB is modeled after the very advanced GraphDB Neo4j. Another recently added functionality in AlchemyDB, a column type that stores a Lua Table (called it LuaTable), led me to mix Lua-function-call-syntax into every part of SQL I could fit it into (effectively tacking on Document-Store functionality to AlchemyDB). Being able to call lua functions from any place in SQL and being able to call lua functions (that can call into the data-store) directly from the client, made building a GraphDB on top of AlchemyDB possible as a library, i.e. it didn’t require any new core functionality.

Two reasons for posting about it:

  1. relaxing the constraints of the relational model can make an RDBMS partially adapt to other models (nb: this is just reinforcing an old strategy used by)
  2. server-side scripting support, usually dismissed in the RDBMS world for its lack of portability, is an extremely powerful tool. Having processing close to data (i.e. data locality) is a well-known advantage, but as shown in this post and in creating reliable queues with Redis Lua scripting, it can open the door to completely new features.



The Richness of the Graph Model: The Sky Is the Limit

Marko A. Rodriguez in Exploring Wikipedia with Gremlin Graph Traversals:

There are numerous ways in which Wikipedia can be represented as a graph. The articles and the href hyperlinks between them is one way. This type of graph is known as a single-relational graph because all the edges have the same meaning — a hyperlink. A more complex rendering could represent the people discussed in the articles as “people-vertices” who know other “people-vertices” and that live in particular “city-vertices” and work for various “company-vertices” — so forth and so on until what emerges is a multi-relational concept graph. For the purpose of this post, a middle ground representation is used. The vertices are Wikipedia articles and Wikipedia categories. The edges are hyperlinks between articles as well as taxonomical relations amongst the categories.
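
A minimal Python sketch of that middle-ground representation, assuming a simple in-memory structure: vertices are plain strings and every edge carries a label. The class name, the labels `links_to` and `subcategory_of`, and the sample articles and categories are illustrative choices, not taken from the original post (which uses Gremlin).

```python
from collections import defaultdict


class MultiRelationalGraph:
    """Directed graph whose edges carry a label such as 'links_to'."""

    def __init__(self):
        # adjacency[label][source] -> set of target vertices
        self._adjacency = defaultdict(lambda: defaultdict(set))

    def add_edge(self, source, label, target):
        self._adjacency[label][source].add(target)

    def out(self, vertex, label):
        """Vertices reachable from `vertex` over one edge with `label`."""
        return sorted(self._adjacency.get(label, {}).get(vertex, ()))


def build_sample():
    """Hypothetical data: two article hyperlinks and one category relation."""
    graph = MultiRelationalGraph()
    graph.add_edge("Graph theory", "links_to", "Leonhard Euler")
    graph.add_edge("Leonhard Euler", "links_to", "Graph theory")
    graph.add_edge("Category:Graph theory", "subcategory_of", "Category:Mathematics")
    return graph


if __name__ == "__main__":
    sample = build_sample()
    print(sample.out("Graph theory", "links_to"))
    print(sample.out("Category:Graph theory", "subcategory_of"))
```

Dropping the label and keeping only `links_to` edges would give the single-relational graph from the start of the quote; adding more labels moves toward the multi-relational concept graph.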

Imagine the richness of the model you’d achieve if every piece of data and metadata became a vertex or an edge. It’s not just the wealth of data but also the connectivity. Time would be the only missing dimension.


NoSQL Hosting Services

Michael Hausenblas put together a list of hosted NoSQL solutions including Amazon DynamoDB and SimpleDB, Google App Engine, Riak, Cassandra, CouchDB, MongoDB, Neo4j, and OrientDB. If you go through my posts on NoSQL hosting, you’ll find a couple more.



How Can Graphs Apply to IT Operations

Be it IT, devops, or no-ops, operations are a critical part of every fairly sized system with real SLAs, expressed or not. John E. Vincent’s post is an interesting look at what he feels is missing to make system operations more manageable:

What I feel like we’re missing is a way to express those relationships and then trigger on them all the way up and down the chain as needed. We’re starting to get into graph territory here.

We must be able to express and act on changes at the micro level (I changed a config, I must restart nginx) and even at the intranode level (something changed in my app tier, need to tell my load balancer) but now we need a way to handle it at that macro level. Not only do we need a way to handle it but we must also be able to calculate what is impacted by that change.
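
The "calculate what is impacted" part maps naturally onto a graph traversal. Below is a minimal Python sketch, assuming dependencies are recorded as "X depends on Y" edges; the component names and the `impacted_by` function are hypothetical and only loosely follow the examples in the quote.

```python
from collections import defaultdict, deque

# Hypothetical dependencies: each key depends on every item in its list.
DEPENDS_ON = {
    "load_balancer": ["app_tier"],
    "app_tier": ["nginx", "database"],
    "nginx": ["nginx_config"],
}


def impacted_by(changed, depends_on):
    """Return every component that transitively depends on `changed`."""
    # Invert the edges so we can walk from a component to its dependents.
    dependents = defaultdict(set)
    for component, requirements in depends_on.items():
        for requirement in requirements:
            dependents[requirement].add(component)

    seen = set()
    queue = deque([changed])
    while queue:
        current = queue.popleft()
        for dependent in dependents.get(current, ()):
            if dependent not in seen:
                seen.add(dependent)
                queue.append(dependent)
    return seen


if __name__ == "__main__":
    print(sorted(impacted_by("nginx_config", DEPENDS_ON)))
```

A config change then surfaces nginx, the app tier, and the load balancer in one query, which is the "all the way up the chain" behavior the post asks for.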



Graph Databases Updates: DEX Graph Database 4.5 and Neo4j 1.7 Milestone 1

Two new releases in the graph databases space:

DEX Graph Database 4.5

The new DEX Graph Database release comes with pre-packaged graph algorithms—breadth and depth first traversal, shortest path, Gabow connectivity—available for Java, .NET, and C++. You can get the new version from here.
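
For readers who have not met these algorithms, here is a minimal Python sketch of one of them: shortest path via breadth-first traversal on an unweighted graph. This is a generic textbook version over a plain adjacency dictionary, not the DEX API; the `shortest_path` name and the sample graph are made up for this example.

```python
from collections import deque


def shortest_path(graph, start, goal):
    """Breadth-first search; returns a shortest path as a list, or None."""
    if start == goal:
        return [start]
    parents = {start: None}  # also serves as the visited set
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbor in graph.get(node, ()):
            if neighbor in parents:
                continue
            parents[neighbor] = node
            if neighbor == goal:
                # Walk the parent links back to the start, then reverse.
                path = [goal]
                while parents[path[-1]] is not None:
                    path.append(parents[path[-1]])
                return path[::-1]
            queue.append(neighbor)
    return None


if __name__ == "__main__":
    # Hypothetical directed graph as an adjacency dictionary.
    sample = {"a": ["b", "c"], "b": ["d"], "c": ["d"], "d": ["e"]}
    print(shortest_path(sample, "a", "e"))
```

Because breadth-first search visits vertices in order of distance from the start, the first time it reaches the goal it has found a path with the fewest edges.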

Neo4j 1.7 Milestone 1

As per Neo4j 1.7 milestone 1 update, this version features:

  • improved Cypher
  • SSL support
  • improved Neo4j documentation
  • high availability improvements (nb: there are recommended maintenance releases for Neo4j 1.5 and 1.6)
  • upgraded Blueprints and Gremlin support

You can get Neo4j 1.7 from here.


Using Graph Theory to Predict Basketball Teams Rankings

A directed network is simply a connection of nodes (representing teams) and arrows connecting teams called directed edges. Every time a team defeated another, an arrow was drawn from the losing team’s node to the winning team’s node to represent this game.
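
The quoted passage describes how the network is built but not how rankings are derived from it. One common choice, assumed here purely for illustration, is a PageRank-style score in which each loss passes some of the loser's score on to the winner. The function name `rank_teams`, the damping factor, and the sample results are all made up for this sketch.

```python
def rank_teams(games, damping=0.85, iterations=50):
    """games: iterable of (winner, loser) pairs. Returns teams, best first."""
    games = list(games)
    teams = sorted({team for game in games for team in game})
    # Edge direction follows the text: loser -> winner.
    beaten_by = {team: [] for team in teams}
    for winner, loser in games:
        beaten_by[loser].append(winner)

    score = {team: 1.0 / len(teams) for team in teams}
    for _ in range(iterations):
        new_score = {team: (1.0 - damping) / len(teams) for team in teams}
        for loser, winners in beaten_by.items():
            if winners:
                # Split the loser's score among the teams that beat it.
                share = damping * score[loser] / len(winners)
                for winner in winners:
                    new_score[winner] += share
            else:
                # Undefeated team (no outgoing edges): spread its score evenly.
                share = damping * score[loser] / len(teams)
                for team in teams:
                    new_score[team] += share
        score = new_score
    return sorted(teams, key=lambda team: score[team], reverse=True)


if __name__ == "__main__":
    # Hypothetical results: A beat B and C, B beat C.
    print(rank_teams([("A", "B"), ("A", "C"), ("B", "C")]))
```

Unlike a plain win count, this rewards beating strong teams: a win over a team that has itself collected many arrows transfers more score.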

Basketball and beers.



NoSQL Paper: The Trinity Graph Engine

Although my first post about Trinity, the Microsoft Research graph database, dates back to March of last year, I haven’t heard much about it since. Based on my tip, Klint Finley published an interesting speculation about Trinity, Dryad, Probase, and Bing. Since then, though, Microsoft has moved from Dryad to Hadoop, and I’m still not sure about the status of the Trinity project. But I have found a paper about the Trinity graph engine authored by Bin Shao, Haixun Wang, and Yatao Li. You can read it or download it after the break.

We introduce Trinity, a memory-based distributed database and computation platform that supports online query processing and offline analytics on graphs. Trinity leverages graph access patterns in online and offline computation to optimize the use of main memory and communication in order to deliver the best performance. With Trinity, we can perform efficient graph analytics on web-scale, billion-node graphs using dozens of commodity machines, while existing platforms such as MapReduce and Pregel require hundreds of machines. In this paper, we analyze several typical and important graph applications, including search in a social network, calculating PageRank on a web graph, and sub-graph matching on web-scale graphs without using index, to demonstrate the strength of Trinity.