graph database: All content tagged as graph database in NoSQL databases and polyglot persistence

Forrester Predictions for 2012: Hadoop, In-Memory Analytics Platforms, Graph Databases

James Kobielus summarizes Forrester’s predictions for 2012:

Enterprise Hadoop deployments will expand at a rapid clip.


In-memory analytics platforms will grow their footprint.

Assuming they are referring to products like SAP Hana, Tibco Spotfire BI, etc., my bet is that their adoption will depend heavily on their integration with Big Data toolkits.

Soon I also expect to see some in-memory data-grid products slightly shifting their direction and trying to penetrate the analytics market.

Graph databases will come into vogue: The market for graph databases will boom in 2012 as companies everywhere adopt them for social media analytics, marketing campaign optimization, and customer experience fine-tuning.

I know someone who will be very happy to read this prediction.

While I do agree this will happen, I also think that some more technical and communication advances in this space are needed before seeing a wide adoption of graph databases.

Original title and link: Forrester Predictions for 2012: Hadoop, In-Memory Analytics Platforms, Graph Databases (NoSQL database©myNoSQL)

Persistent Graph Structures With Ruby/Rails

Summarizing this long thread trying to answer the question in the title: Neo4j + JRuby.

Original title and link: Persistent Graph Structures With Ruby/Rails (NoSQL database©myNoSQL)

NoSQL Databases Best Practices and Emerging Trends

Jans Aasman (CEO AllegroGraph) interviewed by Srini Penchikala:

InfoQ: What best practices and architecture patterns should the developers and architects consider when using a solution like this one in their software applications?

Jans: If your application requires simple straight joins and your schema hardly changes then any RDBMS will do.

If your application is mostly document based, where a document can be looked at as a pre-joined nested tree (think a Facebook page, think a nested JSON object) and where you don’t want to be limited by an RDB schema then key-value stores and document stores like MongoDB are a good alternative.

If you want what is described in the previous paragraph but you have to perform complex joins or apply graph algorithms then the MongoGraph approach might be a viable solution.

Thinking about the products and projects I’ve been working on, most of them have had to deal with all these aspects in different areas of the applications and with different importance to the final solution. Mistakenly though, in most of the cases they ended up using a relational database only. With polyglot persistence here, this shouldn’t happen anymore. That’s not to say though that every project must use all of these technologies just because they are available. But it could use any of them or all combined.

InfoQ: What are the emerging trends in combining the NoSQL data stores?

Jans: From the perspective of a Semantic Web - Graph database vendor what we see is that nearly all graph databases now perform their text indexing with Lucene based indexing (Solr or Elastic Search) and I wouldn’t be surprised that most vendors soon will allow JSON objects as first class objects for graph databases. It was surprisingly straightforward to mix the JSON and triple/graph paradigm. We are also experimenting with key-value stores to see how that mixes with the triple/graph paradigm.

This topic was also discussed during my NoSQL Applications panel, but due to the panel’s time constraints we couldn’t reach a conclusion. It’s definitely an interesting perspective, though.

Original title and link: NoSQL Databases Best Practices and Emerging Trends (NoSQL database©myNoSQL)


Graph Database Apps Ideas: The InfiniteGraph Contest's Winners

Two of the three winning projects of the InfiniteGraph competition look like really interesting solutions for graph-oriented problems:

  • “InfiniteCommits”, developed by William Cheung, allows users of GitHub to quickly obtain useful information that isn’t currently available through the GitHub web interface.  This application uses Play (a rapid Java and Scala web development framework) to generate a report of the most active files in a GitHub repository, while using InfiniteGraph’s Data Visualizer to see the latest changes and notes associated with each change.

  • “Call Graph Analysis”, developed by Vimal Kumar, analyzes “call graphs” extracted from large code-bases.  Vimal’s application uses InfiniteGraph to explore millions or billions of lines of code, and then quickly find all the connections and relationships between countless functions contained within the code.  This gives developers the ability to quickly understand and visualize the most complex structures and interactions among different software modules, which is incredibly useful not only in their work, but for training new code contributors, or as a visual aid for debugging and testing purposes.

Congrats to the winners and InfiniteGraph for organizing the contest!

It would have been even better if these applications, or at least their InfiniteGraph-related parts, had been open sourced to serve as learning materials for graph database newbies.

Original title and link: Graph Database Apps Ideas: The InfiniteGraph Contest’s Winners (NoSQL database©myNoSQL)


Graph Databases and the World Wide Web

Sir Tim Berners-Lee:

Inventing the World Wide Web involved my growing realization that there was a power in arranging ideas in an unconstrained, web-like way.  And that awareness came to me through precisely that kind of process.

Let’s think how the different data models require us to arrange data:

  1. hierarchical model: free form, single-type of relationship (parent-child)
  2. relational model: strict form, (limited) multiple-types of relationships
  3. document model: free form, dual relationship types: logical and hierarchical
  4. star schema: strict form, (limited) multiple-types of relationships

Now think about graph databases: free form (nodes can have any number of properties), unlimited number of uni/bi-directional relationships. So the question is, why aren’t network/graph databases used more these days?
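To make the comparison concrete, here is a minimal in-memory sketch of the graph model described above: nodes carry any number of properties, and relationships are typed and either one-way or two-way. The `PropertyGraph` class, its method names, and the sample data are all hypothetical and do not mirror any particular product's API.

```python
from collections import defaultdict


class PropertyGraph:
    """Tiny in-memory property graph: free-form nodes, typed directed edges."""

    def __init__(self):
        self.nodes = {}                    # node id -> dict of arbitrary properties
        self.outgoing = defaultdict(list)  # node id -> [(type, target id, properties)]

    def add_node(self, node_id, **properties):
        # No schema: each node may carry a different set of properties.
        self.nodes[node_id] = properties

    def relate(self, source, rel_type, target, bidirectional=False, **properties):
        # A relationship is directed by default; a bidirectional one is
        # stored as two directed edges.
        self.outgoing[source].append((rel_type, target, properties))
        if bidirectional:
            self.outgoing[target].append((rel_type, source, properties))

    def neighbors(self, node_id, rel_type=None):
        # Follow outgoing edges, optionally filtered by relationship type.
        return [target
                for (kind, target, _) in self.outgoing.get(node_id, [])
                if rel_type is None or kind == rel_type]


graph = PropertyGraph()
graph.add_node("alice", name="Alice", city="Berlin")
graph.add_node("bob", name="Bob")
graph.add_node("post-1", title="Hello graphs", words=120)

graph.relate("alice", "KNOWS", "bob", bidirectional=True, since=2010)
graph.relate("alice", "WROTE", "post-1")
graph.relate("bob", "LIKES", "post-1")

print(graph.neighbors("alice"))           # every outgoing relationship
print(graph.neighbors("alice", "KNOWS"))  # only one relationship type
```

Nothing here constrains which relationship types exist or which properties a node has, which is the "free form, unlimited relationships" combination that none of the four models in the list offers at once.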

Original title and link: Graph Databases and the World Wide Web (NoSQL database©myNoSQL)

Two Important Events in the NoSQL World

I’m starting to catch up with the news after my sabbatical month, and it turns out things didn’t stand still during this period. While quite a few important things happened in October, I’d like to bring up two very interesting ones that mark a possible turn in the NoSQL databases world.

  1. The first insolvency/bankruptcy in the market.

    Based on a tweet from Achim Friedland, ex-development lead at sones, the German graph database company sones GmbH, which raised another round of funding back in February, was declared insolvent.

    This is an unfortunate validation of my thoughts about graph databases market penetration. sones GmbH has never been a market leader, but they could have tried to focus on a niche segment of the emerging graph database market, and while that wouldn’t necessarily have turned the company into a huge success, it would probably have given it more time to refine the product and expand.

    Update: Daniel Kirstenpfad (CTO, sones GmbH) reached out to me with some clarifications:

    1. Achim Friedland was at a point in time the development lead of sones and in that position responsible for leading the developer team. He never was CTO of sones.

    2. sones is not insolvent but rather is under preliminary bankruptcy administration with the goal to arrive at a solution for continuation of product and company.

  2. I’m starting to notice a shift in the (marketing) message of a couple of NoSQL companies towards Enterprise NoSQL. I’m not yet sure what enterprise NoSQL means though: targeting enterprise customers, large scale NoSQL deployments, expensive NoSQL product and services packages, etc.

    Whatever this term means, I take it as a sign of: a) the market becoming too busy; b) growing competition for paying customers; c) investors looking for clear validation of their investments.

    What I hope this does not mean is the start of unhealthy, unfriendly, and dirty competition. This market segment has greatly benefitted from a friendly environment in which all contenders have been pushing their products forward while working together to popularize and bring awareness to the polyglot persistence philosophy.

Original title and link: Two Important Events in the NoSQL World (NoSQL database©myNoSQL)

What Are Some Good MapReduce Implementations for Graphs?

In case you were wondering which problems Hadoop and MapReduce are not best at solving, there’s a great Q&A on the subject:

MapReduce is good at distributed computing, but not for graph algorithms. Is there a general-use, highly-distributed open source graph framework? I’m especially interested in hearing about in-practice use cases, and how good/bad they were.

Ankur Dave’s answer is quite comprehensive, listing 5 specialized solutions and 3 generic frameworks:

  • Giraph
  • GraphLab
  • Phoebus
  • Golden Orb
  • Signal/Collect
  • Spark
  • Piccolo
  • HaLoop

I was not aware of all these solutions, so more to read for me.
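Iterative graph algorithms fit plain MapReduce poorly largely because every iteration becomes a separate job. Vertex-centric frameworks in the Pregel style (Giraph is one) instead run a sequence of "supersteps" in which each vertex reads its incoming messages, updates its own state, and messages its neighbors. The following single-process sketch shows only that programming model, using connected components as the example. It is not the API of any framework listed above, and the function name and edge list are made up for illustration.

```python
def connected_components(edges):
    """Label every vertex with the smallest vertex id in its component.

    Simulates vertex-centric supersteps: in each step, a vertex that
    received messages adopts the smallest label it saw and, if its label
    changed, forwards the new label to its neighbors.
    """
    neighbors = {}
    for a, b in edges:
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)

    labels = {v: v for v in neighbors}

    # Superstep 0: every vertex announces its own id to its neighbors.
    inbox = {v: [] for v in neighbors}
    for v, adjacent in neighbors.items():
        for n in adjacent:
            inbox[n].append(labels[v])

    # Keep running supersteps until no messages are in flight.
    while any(inbox.values()):
        outbox = {v: [] for v in neighbors}
        for v, messages in inbox.items():
            if not messages:
                continue  # vertex stays halted in this superstep
            smallest = min(messages)
            if smallest < labels[v]:
                labels[v] = smallest
                for n in neighbors[v]:
                    outbox[n].append(smallest)
        inbox = outbox

    return labels


labels = connected_components([(1, 2), (2, 3), (4, 5)])
print(labels)
```

In a real distributed framework the vertices are partitioned across workers and the inbox/outbox exchange is the network barrier between supersteps; the per-vertex logic stays about this small.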

Original title and link: What Are Some Good MapReduce Implementations for Graphs? (NoSQL database©myNoSQL)

Graph Databases Market Penetration

Emil Eifrem (Neo4j) in an interview for StartUpBeat answering a question about the competition in the graph databases space (my emphasis):

There’s a lot of movement around alternative databases today and a lot of companies in NOSQL like MongoDB, Couchbase and Cassandra. However, when we’re out in the field and talking to customers, our actual competitors are in-house custom-built solutions.

This answer made me think that:

  1. Neo4j is by far the graph databases market leader. And I’m not sure there’s a second place (InfiniteGraph maybe?).
  2. graph databases are still either unknown in many environments or perceived as niche solutions.

If I were a graph database producer, I wouldn’t worry much about my product’s rank in the market. But I’d definitely be concerned about the current market size and graph databases market penetration in general.

Original title and link: Graph Databases Market Penetration (NoSQL database©myNoSQL)

What Is a Graph Database?

The InfiniteGraph guys put together a page providing a short definition of what graph databases are and what advantages they bring to the table:

“A graph database… uses graph structures with nodes, edges, and properties to represent and store information.”

“Compared with relational databases, graph databases are often faster for associative data sets, and map more directly to the structure of object-oriented applications. They can scale more naturally to large data sets as they do not typically require expensive join operations. As they depend less on a rigid schema, they are more suitable to manage ad-hoc and changing data with evolving schemas.”

In terms of graph database applicability, the short answer would be: graph databases are useful for storing, traversing, and processing highly complex relationships. The expanded version:

Graph databases can help improve intelligence, predictive analytics, social network analysis, and decision and process management - which all involve highly complex relationships.

Both object databases and graph databases have been touting a lot of promises, but even if graph database scenarios abound I still think they are seen as the underdogs.

Original title and link: What Is a Graph Database? (NoSQL database©myNoSQL)


InfiniteGraph and RDF Tuples or Why Using a Specialized Solution Is the Way to Go

An excellent explanation for why it makes sense to use a specialized tool for the job:

Yes, InfiniteGraph can be used to analyze triples and RDF. But if that’s all you want to do, then you really should just use a triple store.

Our graph database trades some of the runtime flexibility (but not a lot) for well defined types and performance. RDF is fine for all the examples that have been circulated: if I just want to list all my friends or all the people I know who are married, it’s no big deal because the fanout of a single degree is extremely small. In fact, you can probably even just do it in MySQL for that matter. When we talk about scalability however, it’s not really about how much data we can store, but how quickly we can run across it. Storing RDF makes this effort slower. It’s hard to make RDF perform, because the whole graph is self describing and therefore is computationally expensive to parse… Think of it like representing data in XML versus a defined binary format. XML is lovely to work with, basically human readable, but it is very verbose and inefficient.

The little secret here is that using a generic solution will usually work in the beginning. And if using a specialized solution implies bigger costs or a longer time to market, starting with what you know is just fine. But once your application grows, a specialized solution will not only be better optimized, it will also get you past the initial problems that come with growth.
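The trade-off in the quote, self-describing triples versus well-defined types, can be illustrated with the quote's own example of listing the friends who are married. This is only a toy sketch: the predicate names, the `Person` class, and the data are invented, the linear scan stands in for what a real triple store would do with indexes, and nothing here reflects InfiniteGraph's actual API.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Self-describing form: every fact is a (subject, predicate, object) triple.
triples = [
    ("alice", "type", "Person"),
    ("bob", "type", "Person"),
    ("carol", "type", "Person"),
    ("dana", "type", "Person"),
    ("alice", "knows", "bob"),
    ("alice", "knows", "carol"),
    ("bob", "marriedTo", "dana"),
]


def match(subject=None, predicate=None, obj=None):
    """Return all triples matching the pattern; None acts as a wildcard."""
    return [t for t in triples
            if (subject is None or t[0] == subject)
            and (predicate is None or t[1] == predicate)
            and (obj is None or t[2] == obj)]


def married_friends_triples(person):
    friends = [o for (_, _, o) in match(subject=person, predicate="knows")]
    return [f for f in friends if match(subject=f, predicate="marriedTo")]


# Typed form: the structure is fixed up front, so a query just follows references.
@dataclass
class Person:
    name: str
    spouse: Optional["Person"] = None
    friends: List["Person"] = field(default_factory=list)


def married_friends_typed(person):
    return [f.name for f in person.friends if f.spouse is not None]


dana = Person("dana")
bob = Person("bob", spouse=dana)
carol = Person("carol")
alice = Person("alice", friends=[bob, carol])

print(married_friends_triples("alice"))
print(married_friends_typed(alice))
```

Both versions give the same answer at this size, which is the point made in the quote: with a small fanout the generic representation is fine, and the cost of interpreting every fact at query time only starts to matter as the traversal grows.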

Original title and link: InfiniteGraph and RDF Tuples or Why Using a Specialized Solution Is the Way to Go (NoSQL database©myNoSQL)


Paper: Graph Based Statistical Analysis of Network Traffic

Published by a group from Los Alamos National Lab (Hristo Djidjev, Gary Sandine, Curtis Storlie, Scott Vander Wiel):

We propose a method for analyzing traffic data in large computer networks such as big enterprise networks or the Internet. Our approach combines graph theoretical representation of the data and graph analysis with novel statistical methods for discovering pattern and time-related anomalies. We model the traffic as a graph and use temporal characteristics of the data in order to decompose it into subgraphs corresponding to individual sessions, whose characteristics are then analyzed using statistical methods. The goal of that analysis is to discover patterns in the network traffic data that might indicate intrusion activity or other malicious behavior.

The embedded PDF and download link after the break.

Time Lines and News Streams: Neo4j Is 377 Times Faster Than MySQL

In my use case neo4j outperformed MySQL by a factor of 377! (That is more than two orders of magnitude.) As is known, one part of my PhD thesis is to create a social newsstream application around my social networking site. It is very obvious that a graph structure is very natural for social newsstreams: you go to a user, traverse to all his friends or objects of interest, and then traverse one step deeper to the newly created content items. A problem with this kind of application is sorting the content items by time or relevance. But before I discuss those problems I just want to present another comparison between MySQL and neo4j.

This is wrong on so many levels. Scratch that. It’s even worse than an apples-to-oranges comparison.
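Benchmark aside, the traversal the quote describes is simple to state: from a user, step to the friends or followed objects, then one step deeper to their content items, and order the result by time. A minimal sketch follows, with invented adjacency data and function names; it says nothing about how either Neo4j or MySQL would perform.

```python
import heapq

# Hypothetical adjacency data: who a user follows, and what each account created.
follows = {
    "alice": ["bob", "carol"],
}
created = {
    "bob": [(100, "bob: first post"), (250, "bob: photo")],
    "carol": [(180, "carol: link"), (300, "carol: status")],
}


def news_stream(user, limit=3):
    """Two-hop traversal: user -> followed accounts -> their content items.

    Items are (timestamp, text) tuples, so nlargest returns the newest first.
    """
    items = (item
             for followed in follows.get(user, [])
             for item in created.get(followed, []))
    return heapq.nlargest(limit, items)


for timestamp, text in news_stream("alice"):
    print(timestamp, text)
```

The sorting step the quote calls out as the hard part is visible even here: the traversal is two dictionary lookups per item, while picking the newest entries has to look at every candidate item.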

Original title and link: Time Lines and News Streams: Neo4j Is 377 Times Faster Than MySQL (NoSQL database©myNoSQL)