neo4j: All content tagged as neo4j in NoSQL databases and polyglot persistence
A nice intro to Gremlin, the Groovy-based graph traversal language that supports Neo4j, OrientDB, DEX, RDF Sail, TinkerGraph, and Rexster:
Original title and link: An Intro to Gremlin the Graph Traversal Language ( ©myNoSQL)
Mark Pollack (VMware) and Emil Eifrem (Neo Technology) explaining why and how to use Spring Data with Neo4j.
Lorenzo Alberton with an overview of the NoSQL landscape:
NoSQL databases get a lot of press coverage, but there seems to be a lot of confusion surrounding them, as in which situations they work better than a Relational Database, and how to choose one over another. This talk will give an overview of the NoSQL landscape and a classification for the different architectural categories, clarifying the base concepts and the terminology, and will provide a comparison of the features, the strengths and the drawbacks of the most popular projects (CouchDB, MongoDB, Riak, Redis, Membase, Neo4j, Cassandra, HBase, Hypertable).
structr is a free, open-source CMS under the GPLv3, written in Java, based on the fantastic NoSQL graph database Neo4j.
By design, structr is modular, distributed and easy to use.
structr is not yet stable, so please be patient and look out for bugs and minor (or even major) pitfalls.
If my memory serves me right, Neo4j started as a library used internally for building content management systems.
Last week, Neo Technology released version 1.3 of their graph database Neo4j. The technical aspects of the release have been covered in this blog post. Briefly:
- support for large data sets and optimizations at the storage level
- improved web admin tool
- API cleanup
But the most exciting aspect of Neo4j 1.3 is the availability of a GPL version of the graph database. Emil Eifrem has covered it here:
Today marks a new major milestone for Neo4j: we’re making the core graph database - Neo4j Community - available under the same proven open source license as MySQL, the GNU General Public License (GPL).
That means that in every scenario where you can use MySQL for free, you can now also use Neo4j Community for free.
I had the chance to talk to Emil and he has been kind enough to answer my questions.
Alex: It took Neo Technology almost 10 years to release Neo4j 1.0. Since then things seem to have moved faster and faster. What changed to bring about these fast-paced release cycles?
Emil: The main reason is that our community has just reached a critical mass. This means that the feedback loop is faster, feature requests are more frequent, bug fixes and patches are better. It’s a faster and more virtuous cycle. On top of that, our customer traction over the past year has allowed us to grow the full-time in-house development team.
Alex: How would you summarize the release of Neo4j 1.3?
Emil: By far the most important aspect of this release is the license change to the GPL for Neo4j Community. Secondly, I’d put the support for really large stores (100+ billion primitives). And finally, I’d love to give a shout out to the new interactive graph visualization in the web UI.
Alex: 3 products and 3 licenses. Moreover, the Neo4j Community edition comes with a GPLv3 license. As you know, I’ve always said that the graph database market is missing a more open license. So what made you change your mind about the licensing model?
Emil: The GPL is the best license for getting Neo4j in the hands of developers worldwide. It’s a proven model to get databases in the hands of developers while protecting an OEM revenue stream, so we figured why reinvent the wheel? The world deserves a graph database under the GPL.
Alex: Could you please clarify a bit the differences between the 3 products and their licensing models?
Emil: Sure, Neo4j Community is what most people will use. It’s a fully functional, robust and mature graph database. It’s available under the GPL like MySQL, which means that it can be used for free in all “end user” scenarios (for example to back a webapp). In OEM scenarios (i.e. when it’s embedded in a product that ships to end users), the enclosing product must be open source.
Neo4j Advanced adds monitoring and management and couples that with commercial support. It’s available under the AGPL or a commercial license.
Neo4j Enterprise adds high availability, i.e. the ability to automatically and transparently replicate the graph across many instances, and enterprise-grade 24/7 commercial support. It’s available under the AGPL or a commercial license.
Alex: In your post you say that “the graph database opportunity is at least as big as the MySQL opportunity”. Could you please expand on this?
Emil: Absolutely. First off, information is exploding in both volume and complexity and in many cases relational databases can’t keep up. For example, a lot of big installations have massive problems with low-latency queries due to joins.
Secondly, business requirements are changing. For example, we have high requirements on the freshness of information (“realtime”) where a retail store may want to get a coupon recommendation while the customer is still in the store, not 24 hours later from the big corporate data warehouse.
Some of the largest web properties in the world were hit early by these two forces, and this catalyzed NoSQL. Now ask yourself this: of these two trends (information volume / complexity and realtime business requirements), in which direction is the world moving? I think the answer is clear and over time, most database deployments in the world will face requirements similar to the high-end web properties of today. In order to deliver business value, IT departments must then be equally committed to SQL and NoSQL.
I think that, of the current NoSQL landscape, graph databases have the opportunity to solve the most problems, for most developers, in most situations. A graph database is incredibly horizontally applicable and it’s useful across a wide range of problem spaces. In a world where most applications make use of both SQL and NoSQL, graph databases have the opportunity to be as frequently used as MySQL is today.
That’s why I said that the graph database opportunity is at least as big as the MySQL opportunity.
Alex: Could you enumerate some not so common use cases for Neo4j?
Emil: No! If they’re not so common I probably don’t know them. But here are three relatively unknown use cases for graph databases:
- Cloud Management: Neo4j is used today to back management and operations on some of the largest private cloud deployments in the world.
- Network Management: In the telecom and datacom world, management of resources in networks has long been a huge problem. It lends itself incredibly well to graph modeling.
- Master Data Management (MDM): This is a very enterprise-y use case, but relevant for all big companies in the world. MDM stores the master data for a big company and that data is usually very complex and dynamic and gives huge join-problems if you put it in a relational database. That kind of dataset is a great fit for a graph database.
Alex: I confess that I was expecting to see a more open license available in the graph database market, so I’m happy to see this happening. I’m also convinced that it is a very smart move for both the future of graph databases and your company. Thanks a lot, Emil.
Original title and link: Emil Eifrem about Neo4j 1.3 and the Neo4j GPL Community Edition (NoSQL databases © myNoSQL)
The new architecture of Evident ClearStone APM is using both Cassandra and Neo4j:
Cassandra is implemented as a time-series data store for storing all the real-time data and historical data. Our implementation uses Apache Cassandra 0.7 with the Hector client APIs for Cassandra. With Cassandra 0.7, we can dynamically create and evolve column families for storing all the performance data. The performance data is normalized by metric. We have also partitioned our column families based on the granularity of the data sets.
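The layout described above (one row per metric, separate column families per granularity) can be approximated with a small in-memory sketch. This is not Evident's schema and does not use the Hector or Cassandra APIs; the class name `TimeSeriesStore`, the granularity names, and the bucket sizes are all assumptions made for illustration.

```python
from collections import defaultdict

# Hypothetical granularities (bucket size in seconds). The source only says
# that column families are partitioned by the granularity of the data sets.
GRANULARITIES = {"raw": 1, "minute": 60, "hour": 3600}


class TimeSeriesStore:
    """Toy stand-in for one column family per granularity."""

    def __init__(self):
        # column family -> row key (metric name) -> {bucket timestamp: [values]}
        self.column_families = {name: defaultdict(dict) for name in GRANULARITIES}

    def record(self, metric, timestamp, value):
        """Store a sample under its metric row in every granularity."""
        for name, bucket_size in GRANULARITIES.items():
            bucket = timestamp - (timestamp % bucket_size)
            row = self.column_families[name][metric]
            row.setdefault(bucket, []).append(value)

    def read(self, metric, granularity, start, end):
        """Return (bucket timestamp, average value) pairs in [start, end)."""
        row = self.column_families[granularity].get(metric, {})
        return [
            (ts, sum(values) / len(values))
            for ts, values in sorted(row.items())
            if start <= ts < end
        ]
```

Keeping a row per metric means a chart for one metric reads a single row, and choosing the column family by granularity keeps long historical ranges from scanning raw samples.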
Neo4j is implemented as an inventory database used for maintaining all the managed resources (i.e. processes, hosts, clusters, etc.) of the application environment. It is used to store current state of all the resources, relationships among the resources, and correlated events to these resources. Anytime there are events associated with a resource, we keep a timeline of such events married to a snapshot of the associated resource(s) in the inventory at the time of the event occurrence. We felt the use of a graph database like Neo4j was ideal for storing metadata for the resources and mapping relationships and correlated events.
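To make the inventory idea concrete, here is a minimal in-memory sketch of resources, typed relationships, and an event timeline in which each event carries a snapshot of the resource state at the time it occurred. It does not use the Neo4j API, and every name in it (`Inventory`, `Resource`, `Event`, `RUNS_ON`) is hypothetical rather than taken from ClearStone.

```python
import copy
from dataclasses import dataclass, field


@dataclass
class Resource:
    resource_id: str
    kind: str  # e.g. "process", "host", "cluster"
    state: dict = field(default_factory=dict)


@dataclass
class Event:
    timestamp: int
    description: str
    snapshot: dict  # copy of the resource state when the event occurred


class Inventory:
    def __init__(self):
        self.resources = {}      # resource id -> Resource
        self.relationships = []  # (source id, relationship type, target id)
        self.timelines = {}      # resource id -> list of Event, in arrival order

    def add_resource(self, resource_id, kind, **state):
        self.resources[resource_id] = Resource(resource_id, kind, dict(state))
        self.timelines[resource_id] = []

    def relate(self, source_id, rel_type, target_id):
        self.relationships.append((source_id, rel_type, target_id))

    def record_event(self, resource_id, timestamp, description):
        # Deep-copy the state so later changes do not alter the snapshot.
        resource = self.resources[resource_id]
        event = Event(timestamp, description, copy.deepcopy(resource.state))
        self.timelines[resource_id].append(event)
        return event

    def neighbours(self, resource_id, rel_type):
        return [target for source, rel, target in self.relationships
                if source == resource_id and rel == rel_type]
```

For example, after `inv.relate("proc-1", "RUNS_ON", "host-1")`, calling `inv.neighbours("proc-1", "RUNS_ON")` lists the host, and an event recorded on `host-1` keeps the state the host had at that moment even if the state is updated afterwards.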
Bill Nigh writes about some of the challenges of moving from an RDBMS to NoSQL technologies:
- lack of queries and triggers for data changes
- too much schema-less freedom in the Neo4j graph database, or the challenge of doing data modeling
So Evident is not only offering tools for monitoring NoSQL solutions, but they are using them internally for their product. Practice what you preach.
Original title and link: Cassandra and Neo4j Used by Evident ClearStone APM (NoSQL databases © myNoSQL)
Jim Webber has published a series of posts — here, here, and here — discussing generic graph sharding solutions, the route Neo4j is taking for addressing this problem, and, in the last post, a simple strategy — suggested by Mark Harwood — for deciding a graph database scaling approach:
Dataset size: Many tens of Gigabytes
Strategy: Fill a single machine with RAM
Dataset size: Many hundreds of Gigabytes
Strategy: Cache sharding
Dataset size: Terabytes and above
Strategy: Domain-specific sharding
The cache sharding approach — described here — suggests replacing the problem of graph sharding with “the simpler problem of consistent routing”. But I’m not sure how this solution works effectively:
- if there is a way to provide consistent routing, isn’t that equivalent to having a solution for sharding the graph?
- considering graph databases are chatty — in the sense that most of the time there is a complex traversal happening — how smart should the client and the router be to work effectively and efficiently?
For now I feel that being able to solve the consistent routing in a graph implies having a strategy for sharding a graph. The reverse applies too: having a sharding strategy would provide a routing solution. Thus scaling graph databases remains a subject open for research.
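For readers who want something concrete, consistent routing is often built on a hash ring: every request carries a routing key, and the same key always lands on the same server, so that server's cache stays warm for that part of the graph. The sketch below is one generic way to do that and is not Neo4j's mechanism; `ConsistentRouter`, the server names, and the idea of routing by a user ID are assumptions for illustration.

```python
import bisect
import hashlib


class ConsistentRouter:
    """Hash ring that maps a routing key to one of several servers."""

    def __init__(self, servers, replicas=100):
        # Each server gets several points on the ring to even out the load.
        ring = []
        for server in servers:
            for i in range(replicas):
                ring.append((self._hash(f"{server}#{i}"), server))
        ring.sort()
        self._ring = ring
        self._points = [point for point, _ in ring]

    @staticmethod
    def _hash(value):
        return int(hashlib.sha256(value.encode("utf-8")).hexdigest(), 16)

    def route(self, routing_key):
        # Pick the first ring point after the key's hash, wrapping around.
        index = bisect.bisect(self._points, self._hash(routing_key))
        return self._ring[index % len(self._ring)][1]


router = ConsistentRouter(["neo-a", "neo-b", "neo-c"])
server = router.route("user:42")  # the same key always yields the same server
```

The sketch also shows where the doubts above come from: it only helps when each request maps cleanly to a single routing key. A traversal that wanders across keys still touches data that other servers have cached, and choosing keys so that this rarely happens looks very much like choosing a sharding strategy.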
Original title and link: Scaling Graph Databases: Sharding and Consistent Routing (NoSQL databases © myNoSQL)