Graph database: All content tagged as Graph database in NoSQL databases and polyglot persistence
Saturday, 27 October 2012
Next Neo4j Version Implementing HA Without ZooKeeper
The next version of Neo4j will remove the dependency on ZooKeeper for high availability setups. In a post on the Neo4j blog, the team announced the availability of the first milestone of Neo4j 1.9, which already contains the new implementation of the Neo4j High Availability Cluster:
With Neo4j 1.9 M01, cluster members communicate directly with each other, based on an implementation of the Paxos consensus protocol for master election.
Here are the key points from the updated documentation, annotated with my own comments:
- Write transactions can be performed on any database instance in a cluster. (nb: writes are performed on the master first, but the cluster does the routing automatically)
- If the master fails, a new master will be elected automatically. A new master is elected and started within just a few seconds, and during this time no writes can take place (the writes will block or, in rare cases, throw an exception).
- If the master goes down any running write transaction will be rolled back and new transactions will block or fail until a new master has become available.
- The cluster automatically handles instances becoming unavailable (for example due to network issues), and also makes sure to accept them as members in the cluster when they are available again.
- Transactions are atomic, consistent and durable but eventually propagated out to other slaves. (nb: a transaction includes only the write to the master)
- Updates to slaves are eventually consistent by nature but can be configured to be pushed optimistically from the master during commit. (nb: writes to slaves will still not be part of the transaction)
- If there were changes on the master that didn’t get replicated before it failed, two different versions of the data may exist once the failed master recovers. This situation is resolved by having the old master dismiss its copy of the data (nb: the documentation says “move away”).
- Reads are highly available and the ability to handle read load scales with more database instances in the cluster.
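To make the list above more concrete, here is a toy, in-memory model of those semantics: writes arriving at any instance are routed to the master, slaves catch up eventually (or are pushed to during commit), writes fail while no master exists, and a recovered old master with unreplicated commits dismisses its copy. This is a minimal sketch under stated assumptions, not Neo4j code: the class, method names and the `push_factor` parameter are all hypothetical, and the election rule (pick the live instance that has applied the most transactions) is a simple stand-in for the Paxos-based election Neo4j 1.9 actually uses.

```python
import itertools


class MasterUnavailable(Exception):
    """Raised when a write arrives while no master is elected."""


class ToyCluster:
    """Hypothetical in-memory model of the HA behaviour listed above.

    Not Neo4j code: every name here is an illustrative assumption, and the
    election rule is a simple stand-in for Paxos.
    """

    def __init__(self, ids, push_factor=0):
        self.history = {i: [] for i in ids}  # transactions applied per instance
        self.alive = set(ids)
        self.push_factor = push_factor       # slaves updated during commit
        self.txids = itertools.count(1)
        self.master = None
        self.elect()

    def elect(self):
        # Pick the live instance that has applied the most transactions
        # (ties go to the lowest id). Real clusters reach this by consensus.
        candidates = sorted(self.alive)
        if candidates:
            self.master = max(candidates, key=lambda i: len(self.history[i]))
        else:
            self.master = None

    def write(self, via, key, value):
        # A write may arrive at any live instance; it is routed to the master.
        if via not in self.alive:
            raise ValueError(f"instance {via} is down")
        if self.master is None:
            raise MasterUnavailable("no master elected yet")
        tx = (next(self.txids), key, value)
        self.history[self.master].append(tx)  # committed on the master only
        slaves = [i for i in sorted(self.alive) if i != self.master]
        for slave in slaves[: self.push_factor]:  # optional optimistic push
            self.pull_updates(slave)
        return tx[0]

    def pull_updates(self, instance):
        # Eventual consistency: copy the transactions this instance is missing.
        master_log = self.history[self.master]
        mine = self.history[instance]
        mine.extend(master_log[len(mine):])

    def read(self, instance, key):
        # Reads are local, so a lagging slave can return stale (or no) data.
        value = None
        for _, k, v in self.history[instance]:
            if k == key:
                value = v
        return value

    def fail(self, instance):
        self.alive.discard(instance)
        if instance == self.master:
            self.master = None  # writes fail until elect() runs

    def recover(self, instance):
        self.alive.add(instance)
        if self.master is None:
            return  # caller is expected to run elect()
        master_log = self.history[self.master]
        mine = self.history[instance]
        if mine != master_log[: len(mine)]:
            # Unreplicated commits from a former master: dismiss its copy.
            self.history[instance] = []
        self.pull_updates(instance)
```

Electing the most up-to-date survivor keeps every live slave's history a prefix of the new master's log, which is why only a returning old master can ever hold a conflicting version that has to be thrown away.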
Original title and link: Next Neo4j Version Implementing HA Without ZooKeeper (©myNoSQL)
Friday, 26 October 2012
What Is the Most Promising Graph Datastore?
A very interesting answer on Quora from Professor Josep Lluis Larriba Pey:
- for very large data sizes (TB): InfiniteGraph, DEX
- for query speed: DEX
- for transaction support: Neo4j
Original title and link: What Is the Most Promising Graph Datastore? (©myNoSQL)
via: http://www.quora.com/Database-Systems/What-is-the-most-promising-graph-datastore
Tuesday, 25 September 2012
What's the Current State of Graph Databases?
Jim Webber[1] in an interview with Srini Penchikala for InfoQ:
The graph databases are odd, because they’ve actually decided to have a much more expressive data model compared to relational databases. So I think they are an oddity compared to the other three types of NoSQL stores, which means that when a developer first comes across them there is an awful lot of head scratching—you can see this haircut was completely caused by Neo4J. So I think compared to the other NoSQL stores, the graph database community is a little bit further behind in terms of adoption and penetration because they are a bit of an odd beast when you look at them first, “What would I use graphs for, they are those things I forgot from university, with that boring old guy doing math on the whiteboard”, on the blackboard even, I’m so old we had chalk, would you believe?
It’s almost always impossible for me to disagree with Jim. Expanding a bit on the quote above, I’d speculate that a bit of head scratching before adopting a new database is a good thing, as it means you’ll see fewer improper use cases.
[1] Jim Webber: Chief Scientist at Neo Technology
Original title and link: What’s the Current State of Graph Databases? (©myNoSQL)
via: http://www.infoq.com/interviews/jim-webber-neo4j-and-graph-database-use-cases
Monday, 16 July 2012
Rolling Upgrades in Upcoming Neo4j 1.8
Chris Gioran describes rolling upgrades, a new feature in the upcoming Neo4j 1.8:
So the rolling upgrade, actually, works exactly as you’d expect an upgrade would work. If there are no breaking changes between versions, you normally begin with the slaves, powering down, copying the store, migrating configuration if needed, then bringing that server back up. The new version would take over, communicate with the rest of the cluster and you wouldn’t notice anything.
A rolling upgrade offers that with versions that have incompatible protocols. Each slave, as it is brought up, detects the version running in the cluster and gracefully falls back into a compatibility mode that doesn’t allow it to become master, but allows it to continue to execute transactions.
Another thing I found interesting: the moment the master machine is upgraded is treated as confirmation that the upgrade is complete, and all machines then switch to the new protocol. Clever.
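The upgrade sequence described above can be modeled in a few lines: slaves are upgraded first and, while the master still runs the old version, fall back to a compatibility mode in which they keep executing transactions but cannot become master; upgrading the master flips the whole cluster to the new protocol. This is a hypothetical sketch of the behaviour as the post describes it, not Neo4j’s implementation; `Member`, `mode`, `rolling_upgrade` and the integer version numbers are illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass
class Member:
    name: str
    version: int           # protocol version the installed binary speaks natively
    is_master: bool = False


def cluster_protocol(members):
    # Assumed rule, after the post: the cluster speaks the master's protocol,
    # so upgrading the master is what completes the upgrade.
    return next(m.version for m in members if m.is_master)


def mode(member, members):
    """How a member takes part in the cluster given the current protocol."""
    proto = cluster_protocol(members)
    if member.version == proto:
        return "native"
    if member.version > proto:
        # Newer binary in an older cluster: fall back to the old protocol,
        # keep executing transactions, but never become master.
        return "compat"
    return "outdated"


def rolling_upgrade(members, new_version):
    # Slaves first, master last; after each restart, report every member's mode.
    order = [m for m in members if not m.is_master] + [m for m in members if m.is_master]
    for m in order:
        m.version = new_version  # power down, copy store, migrate config, restart
        yield m.name, {x.name: mode(x, members) for x in members}
```

Iterating `rolling_upgrade([Member("m", 1, True), Member("s1", 1), Member("s2", 1)], 2)` shows each slave in `"compat"` mode until the last step, when the master is upgraded and every member reports `"native"`.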
Original title and link: Rolling Upgrades in Upcoming Neo4j 1.8 (©myNoSQL)
via: http://architects.dzone.com/articles/regarding-rolling-upgrades
Tuesday, 29 May 2012
An Overview of Neo4j.rb 2.0
Andreas Ronge writing about using Neo4j in embedded mode with JRuby:
The advantage of the embedded Neo4j is better performance due to the direct use of the Java API. This means you can write queries in plain Ruby! Another advantage of the embedded Neo4j is that since it’s an embedded database there is one less piece of infrastructure (the database server) to install. The embedded database is running in the same process as your (Rails) application. Since JRuby has real threads there is no need to start up several instances of the database or of the Ruby runtime since JRuby can utilize all available cores on the CPU. There is actually even no need to start the database at all as it will be started automatically when needed. Notice it’s still possible to use the REST protocol or the web admin interface from an embedded Neo4j, see the neo4j-admin gem.
So which should I choose ? Well, if you can’t use JRuby or you don’t need an Active Model compliant Neo4j binding then the Neo4j Server is a good choice, otherwise I would suggest using the embedded Neo4j.rb gem (but I’m a bit biased)
As also shown by the earlier post on migrating data from Oracle to MongoDB with JRuby, JRuby is proving to be an interesting beast for handling data. I’m more on the Python side, but Jython is not (yet?) as up to date as JRuby.
Original title and link: An Overview of Neo4j.rb 2.0 (©myNoSQL)
via: http://blog.jayway.com/2012/05/07/neo4j-rb-2-0-an-overview/
Thursday, 10 May 2012
Neo4j Data Modeling: What Question Do You Want to Answer?
Mark Needham:
Over the past few weeks I’ve been modelling ThoughtWorks project data in neo4j and I realised that the way that I’ve been doing this is by considering what question I want to answer and then building a graph to answer it.
The same principle should be applied to modeling with any NoSQL database. Thinking in terms of access patterns is one of the major differences between data modeling in the NoSQL space and in the relational world, which is driven, at least in theory and in the early phases, by normalization rules.
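As a small illustration of the question-first approach, suppose the question is “who has worked on a project together with a given person?”. That question alone dictates the graph: person and project nodes linked by worked-on edges, plus the reverse direction, and the answer becomes a two-hop traversal. This is a hypothetical sketch in plain Python; the sample data and the `colleagues` helper are made up for illustration and are not taken from Mark Needham’s model.

```python
from collections import defaultdict

# Question driving the model (hypothetical example):
# "Who has worked on a project together with a given person?"
# So the graph only needs person and project nodes joined by WORKED_ON edges.

assignments = [            # made-up sample data
    ("alice", "project-a"),
    ("bob", "project-a"),
    ("bob", "project-b"),
    ("carol", "project-b"),
]

worked_on = defaultdict(set)  # person -> projects
members = defaultdict(set)    # project -> people (reverse edges)
for person, project in assignments:
    worked_on[person].add(project)
    members[project].add(person)


def colleagues(person):
    # Two-hop traversal: person -> project -> other people on that project.
    return {p for proj in worked_on[person] for p in members[proj]} - {person}
```

A different question (say, “which projects share a client?”) would lead to different nodes and edges, which is exactly the point of the post.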
Original title and link: Neo4j Data Modeling: What Question Do You Want to Answer? (©myNoSQL)
via: http://www.markhneedham.com/blog/2012/05/05/neo4j-what-question-do-you-want-to-answer/
Wednesday, 9 May 2012
How to Import Large Graphs to Neo4j With Spring Data
In my case, I wanted to create a simple recommendation engine (the domain doesn’t matter so much). To do that, I had to import FAST 20 million nodes of one-to-many, sparse matrix data. This became a bit more complicated (and interesting) task than originally anticipated, so it became a mini-project itself.
Bulk insert is a scenario every database should have covered.
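A common ingredient of fast bulk loads is committing in large batches instead of once per node or relationship. The sketch below shows only that batching idea, using Python’s built-in `sqlite3` as a stand-in store; it is not the Spring Data Neo4j approach from the linked post, and the `node`/`edge` tables, `bulk_import` and the default `batch_size` are assumptions for illustration.

```python
import sqlite3
from itertools import islice


def batched(iterable, size):
    # Yield lists of up to `size` items (itertools.batched needs Python 3.12+).
    it = iter(iterable)
    while chunk := list(islice(it, size)):
        yield chunk


def bulk_import(conn, edges, batch_size=10_000):
    """Load (source, target) pairs, committing once per batch instead of per row."""
    conn.execute("CREATE TABLE IF NOT EXISTS node (id INTEGER PRIMARY KEY)")
    conn.execute("CREATE TABLE IF NOT EXISTS edge (src INTEGER, dst INTEGER)")
    for chunk in batched(edges, batch_size):
        with conn:  # one transaction per batch, committed on exit
            conn.executemany(
                "INSERT OR IGNORE INTO node (id) VALUES (?)",
                [(n,) for pair in chunk for n in pair],
            )
            conn.executemany("INSERT INTO edge (src, dst) VALUES (?, ?)", chunk)
```

Because `edges` is consumed lazily, the same function works for a generator reading millions of rows from a file, holding only one batch in memory at a time.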
Original title and link: How to Import Large Graphs to Neo4j With Spring Data (©myNoSQL)
via: http://iordanis.com/post/22677357894/import-large-graphs-to-neo4j-with-spring-data-fast
Friday, 6 April 2012
The Database Nirvana
Scroll to minute 16:55 of this video to watch Jim Webber explain the benefits of polyglot persistence and why restarting the winner-takes-all war would set us back at least 10 years on the road to database Nirvana.
We’ve just come from the place where one-size-fits-all and we don’t want to go back there. There is a huge wonderful ecosystem of stores. Pick the right one. Don’t just assume that the one you find the easiest or the one that shouts the loudest is the one you’re going to use. Pick the one that suits your data model.
It doesn’t matter what flavor of relational or NoSQL database you prefer or have experience with, or whether a small or large database vendor is paying your bills. You really need to get this right; otherwise we’re going to destroy a lot of the valuable options we’ve added to our toolboxes.
Original title and link: The Database Nirvana (©myNoSQL)
Tuesday, 27 March 2012
Neo4j REST API Tutorial
A detailed, language-agnostic introduction to the Neo4j REST API:
In the above examples we have seen how nodes, relationships, and properties can be created, edited, updated, and deleted from the Neo4j HTTP terminal.
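For readers who prefer code to a terminal, the same create/update/delete operations can be expressed as HTTP requests with Python’s standard library. This sketch assumes the legacy Neo4j Server REST API of that era, rooted at `http://localhost:7474/db/data` (the default address; adjust for your setup), and only builds the requests so it can be inspected without a server. The helper names are hypothetical, and this legacy API was dropped in later Neo4j major versions.

```python
import json
import urllib.request

BASE = "http://localhost:7474/db/data"  # assumed default server address


def _request(method, url, body=None):
    # JSON-encode the body (if any) and set the JSON content headers.
    data = json.dumps(body).encode("utf-8") if body is not None else None
    return urllib.request.Request(
        url, data=data, method=method,
        headers={"Content-Type": "application/json", "Accept": "application/json"},
    )


def create_node(props, base=BASE):
    # POST /node with the node's properties as the JSON body.
    return _request("POST", f"{base}/node", props)


def create_relationship(from_id, to_id, rel_type, props=None, base=BASE):
    # POST /node/{id}/relationships; "to" is the URI of the target node.
    body = {"to": f"{base}/node/{to_id}", "type": rel_type}
    if props:
        body["data"] = props
    return _request("POST", f"{base}/node/{from_id}/relationships", body)


def set_property(node_id, key, value, base=BASE):
    # PUT /node/{id}/properties/{key} with the bare JSON value.
    return _request("PUT", f"{base}/node/{node_id}/properties/{key}", value)


def delete_node(node_id, base=BASE):
    # DELETE /node/{id}
    return _request("DELETE", f"{base}/node/{node_id}")
```

Against a running server of that generation, `urllib.request.urlopen(create_node({"name": "Neo"}))` sends the request, and the new node’s URI comes back in the `Location` response header.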
Original title and link: Neo4j REST API Tutorial (©myNoSQL)
via: http://www.hacksparrow.com/neo4j-tutorial-rest-api.html
Monday, 26 March 2012
Intro to Neo4j Cypher Query Language
A very good slide deck from Max De Marzi introducing Neo4j’s Cypher query language. While you’ll have to go through the 50 slides yourself to get the details, I’ve extracted a couple of interesting bits:
- Cypher was created because the Neo4j Java API was too verbose and Gremlin too prescriptive
- SPARQL was designed for a different data model and doesn’t work very well with a graph database
- Cypher design decisions:
  - declarative
  - ASCII-art patterns (nb: when I first saw Cypher I hadn’t thought of it this way, but it is cool)
  - pattern matching
  - external DSL
  - closures
  - SQL familiarity (nb: as much as is possible with a radically different data and processing model)
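To get a feel for why ASCII-art patterns plus declarative pattern matching are appealing, here is a toy matcher for simple chain patterns written in Cypher’s arrow notation, such as `(a)-[:KNOWS]->(b)`. You state the shape you want and the matcher finds every binding of the variables. This is a hypothetical sketch supporting a tiny subset of the syntax; it is in no way how Cypher itself is implemented, and `parse`, `match` and the edge-list format are assumptions for illustration.

```python
import re

# Hypothetical mini-matcher for chain patterns such as
# "(a)-[:KNOWS]->(b)-[:KNOWS]->(c)"; real Cypher is far richer.
START = re.compile(r"\((\w+)\)")
STEP = re.compile(r"-\[:(\w+)\]->\((\w+)\)")


def parse(pattern):
    """Split a chain pattern into its first variable and (type, variable) steps."""
    start = START.match(pattern)
    if not start:
        raise ValueError("pattern must start with a node such as (a)")
    steps = STEP.findall(pattern, start.end())
    rebuilt = "".join(f"-[:{t}]->({v})" for t, v in steps)
    if rebuilt != pattern[start.end():]:
        raise ValueError(f"unsupported pattern: {pattern}")
    return start.group(1), steps


def match(pattern, edges):
    """Return one variable binding per path in `edges` that fits the pattern."""
    first, steps = parse(pattern)
    nodes = sorted({n for src, _, dst in edges for n in (src, dst)})
    bindings = [{first: n} for n in nodes]
    prev = first
    for rel_type, var in steps:
        extended = []
        for b in bindings:
            for src, typ, dst in edges:
                if src != b[prev] or typ != rel_type:
                    continue
                if var in b and b[var] != dst:
                    continue  # a repeated variable must bind to the same node
                extended.append({**b, var: dst})
        bindings = extended
        prev = var
    return bindings
```

With edges `("alice", "KNOWS", "bob")` and `("bob", "KNOWS", "carol")`, the pattern `(a)-[:KNOWS]->(b)-[:KNOWS]->(c)` yields the single binding `a=alice, b=bob, c=carol`: the query reads like the picture of the path it describes.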
AlchemyDB: An Integrated GraphDB + RDBMS + KV Store + Document Store
I recently added a fairly feature rich Graph Database to AlchemyDB (called it LuaGraphDB) and it took roughly 10 days to prototype. I implemented the graph traversal logic in Lua (embedded in AlchemyDB) and used AlchemyDB’s RDBMS to index the data. The API for the GraphDB is modeled after the very advanced GraphDB Neo4j. Another recently added functionality in AlchemyDB, a column type that stores a Lua Table (called it LuaTable), led me to mix Lua-function-call-syntax into every part of SQL I could fit it into (effectively tacking on Document-Store functionality to AlchemyDB). Being able to call lua functions from any place in SQL and being able to call lua functions (that can call into the data-store) directly from the client, made building a GraphDB on top of AlchemyDB possible as a library, i.e. it didn’t require any new core functionality.
Two reasons for posting about it:
- relaxing constraints of the relational model can make an RDBMS partially adapt to other models (nb: this just reinforces an old strategy used by)
- although usually dismissed in the RDBMS world for its lack of portability, server-side scripting support is an extremely powerful tool. Keeping processing close to the data (i.e. data locality) is a well-known advantage, but, as shown in this post and in creating reliable queues with Redis Lua scripting, it can also open the door to completely new features.
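The general idea of layering a graph on top of relational tables and indexes, with user functions callable from SQL, can be sketched with SQLite. This is a swapped-in stand-in, not AlchemyDB or its Lua API: SQLite’s recursive CTE plays the role of the Lua traversal logic, and a Python function registered with `create_function` plays the role of a server-side function called from inside SQL. The table layout, `make_graph_store`, `reachable` and the `label` function are hypothetical.

```python
import sqlite3


def make_graph_store():
    """Hypothetical graph layer on a relational store (SQLite standing in for AlchemyDB)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE edge (src TEXT, rel TEXT, dst TEXT)")
    conn.execute("CREATE INDEX edge_src ON edge (src, rel)")  # index used by traversal
    # A user function callable from SQL, loosely analogous to calling Lua from SQL.
    conn.create_function("label", 1, lambda name: name.upper())
    return conn


def reachable(conn, start, rel, max_depth=3):
    # Traversal expressed as a recursive CTE over the indexed edge table.
    rows = conn.execute(
        """
        WITH RECURSIVE walk(node, depth) AS (
            SELECT ?, 0
            UNION
            SELECT e.dst, w.depth + 1
            FROM edge e JOIN walk w ON e.src = w.node
            WHERE e.rel = ? AND w.depth < ?
        )
        SELECT DISTINCT label(node) FROM walk WHERE depth > 0 ORDER BY 1
        """,
        (start, rel, max_depth),
    )
    return [r[0] for r in rows]
```

No new engine features are needed: the graph is just a table, an index and a query, which mirrors the post’s point that the GraphDB could be built as a library on existing core functionality.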
Original title and link: AlchemyDB: An Integrated GraphDB + RDBMS + KV Store + Document Store (©myNoSQL)
via: http://jaksprats.wordpress.com/2012/02/28/lightweight-oltp-data-platform/
Tuesday, 20 March 2012
The Richness of the Graph Model: The Sky Is the Limit
Marko A. Rodriguez in Exploring Wikipedia with Gremlin Graph Traversals:
There are numerous ways in which Wikipedia can be represented as a graph. The articles and the href hyperlinks between them is one way. This type of graph is known as a single-relational graph because all the edges have the same meaning: a hyperlink. A more complex rendering could represent the people discussed in the articles as “people-vertices” who know other “people-vertices” and that live in particular “city-vertices” and work for various “company-vertices”, and so forth, until what emerges is a multi-relational concept graph. For the purpose of this post, a middle ground representation is used. The vertices are Wikipedia articles and Wikipedia categories. The edges are hyperlinks between articles as well as taxonomical relations amongst the categories.
Imagine the richness of the model you’d achieve if every piece of data and metadata became a vertex or an edge. It’s not just the wealth of data, but also the connectivity. Time would be the only missing dimension.
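The difference between a single-relational and a multi-relational graph comes down to edges carrying a label that traversals can filter on. The sketch below is a hypothetical, minimal Python model of the “middle ground” representation quoted above (articles and categories as vertices; hyperlink, category and taxonomy edges); `MultiGraph`, `step` and the sample vertices are made up, and `step` is only roughly analogous to a labeled Gremlin `out()` step.

```python
from collections import defaultdict


class MultiGraph:
    """Hypothetical multi-relational graph: every edge carries a label."""

    def __init__(self):
        self.out = defaultdict(list)  # vertex -> [(label, target vertex)]

    def add(self, src, label, dst):
        self.out[src].append((label, dst))

    def step(self, vertices, label):
        # One traversal step, following only outgoing edges with this label.
        return [d for v in vertices for lbl, d in self.out[v] if lbl == label]


g = MultiGraph()
g.add("Neo4j", "links_to", "Graph database")        # article -> article hyperlink
g.add("Neo4j", "in_category", "Free software")      # article -> category
g.add("Free software", "subcategory_of", "Software")  # category taxonomy
```

Chaining steps composes a traversal: `g.step(g.step(["Neo4j"], "in_category"), "subcategory_of")` walks from an article to its category and then up the taxonomy, something a graph with only unlabeled hyperlinks cannot express.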
Original title and link: The Richness of the Graph Model: The Sky Is the Limit (©myNoSQL)