As I expected (and as a lot of people quickly confirmed), the results of the graph database benchmark showing Neo4j being outperformed by MySQL, Vertica, and VoltDB could have been much improved:
Our conclusions from this are that, like any of the complex systems we
tested, properly tuning Neo4j can be tricky and getting optimal performance
may require some experimentation with parameters. Whether a user of Neo4j
can expect to see runtimes on graphs like this measured in milliseconds or
seconds depends on workload characteristics (warm / cold cache) and whether
setup steps can be amortized across many queries or not.
Looking at the 3 improvements mentioned in the post:
- Excluding connection. I think the change in the benchmark is actually about not counting the initialization of the database, rather than about not timing connections. The cost of establishing connections is still pretty important (see Mark Callaghan’s posts about the work at Facebook to improve MySQL’s connection performance).
- Warm cache. A benchmark should measure both cold-cache and warm-cache behavior, as these are two scenarios that any application will face.
- Simpler algorithm. This one is quite tricky. While an application should definitely take the approach that best fits its database, it’s also a matter of knowledge and complexity. You could argue that the more approaches you can use, the better the results you can get. Or, conversely, the more approaches are possible, the more time you’ll spend working out which one to use instead of getting things done (think Python vs. Perl).
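To make the first two points concrete, here is a minimal sketch of a harness that reports setup time, the first (cold) query, and later (warm) queries as separate numbers instead of folding them into one. It is not the code used in the benchmark: `benchmark`, `fake_setup`, and `fake_query` are hypothetical names, and a plain dict stands in for the database cache.

```python
import time
from statistics import median


def benchmark(setup, query, warm_runs=5):
    """Time setup, the first (cold) query, and repeated (warm) queries separately."""
    t0 = time.perf_counter()
    handle = setup()                 # e.g. initialize the database or connect
    setup_s = time.perf_counter() - t0

    t0 = time.perf_counter()
    query(handle)                    # first run: nothing is cached yet
    cold_s = time.perf_counter() - t0

    warm = []
    for _ in range(warm_runs):
        t0 = time.perf_counter()
        query(handle)                # later runs: caches are populated
        warm.append(time.perf_counter() - t0)

    return {"setup_s": setup_s, "cold_s": cold_s, "warm_median_s": median(warm)}


# Hypothetical stand-in workload: a dict plays the role of the cache.
def fake_setup():
    return {}


def fake_query(cache):
    if "result" not in cache:
        # Expensive path, taken only while the cache is empty.
        cache["result"] = sum(i * i for i in range(200_000))
    return cache["result"]


if __name__ == "__main__":
    print(benchmark(fake_setup, fake_query))
```

Reporting the three numbers side by side lets a reader decide which one matters for their workload: setup that can be amortized across many queries, or a cold first query that every new process pays for.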
Original title and link: Updated conclusions about the graph database benchmark - Neo4j can perform much better (©myNoSQL)