So, here is my humble request to the NoSQL techies: For each of your systems, please send me or point me to detailed technical information on each of the important aspects of your system. This should be documentation in the form of papers or presentations, and not pointers to source code comments and such! If some significant aspects of a system aren’t documented reasonably, I am urging the appropriate people to produce such documentation. Of course, for legal reasons, you should NOT send me any confidential or proprietary information.
Here is my offer in return for the above: Once I get hold of such documentation, I am willing to maintain a page for each significant NoSQL system where I will consolidate all the information on that system. Once I get hold of all that information, I will be able to do the comparisons between systems and make suggestions for improvements, etc. for each of the systems. I am planning a tutorial on NoSQL systems and it would be in the best interest of the techies of the different systems to get their systems featured in such a tutorial by providing accurate and complete information on their systems.
In the more than two and a half years I’ve been writing this NoSQL blog, I’ve seen numerous similar attempts. So far, the closest to what one would call success are Stefan Edlich’s nosql-databases.org, an unstructured but very wide attempt to catalogue NoSQL databases, and this blog, which continuously covers various aspects of NoSQL databases. My own attempt to create a 5-dimensional characterization of NoSQL databases remains incomplete a year and a half after its debut. But I really hope Mohan will pull this off, as everyone would benefit from having better information organized in an accessible public format.
That aside, I think his post raises a couple of interesting points that I’d like to comment on:
- The origin of most of the NoSQL databases is not in research labs or the academic world, but rather out there in the field. Most of them have been created by people who ran into problems and whose attempts to solve them led to trying out different approaches.
- Most of the NoSQL databases are either open source and community driven or backed by small startups. Some of these startups do benefit from funding, but oftentimes it is a fraction of what other trendy sectors are getting. As an example, Cloudera has raised $76 million in its 3 1/2 years of existence. Compare that with Color’s $40 million.
- Most of these systems are created following a roadmap rooted in pragmatism and practicality. They are need-based systems. If you’ve worked on an open source project or in a startup, you know exactly what I mean. Features are prioritized and implemented based on the current interests of the main stakeholders, who are basically the product’s current users.
That being said, one should note that:
- Most of the open source NoSQL databases have excellent documentation (at least by open source projects’ standards). Just take a look at the Apache HBase Reference Guide or Redis’s documentation.
- There are many books covering NoSQL databases. While I don’t have all of the NoSQL books (nor have I read cover to cover all those that I have), many of them discuss these solutions in great detail [1].
If you’ve been following this blog, you’ll have noticed that developers involved with NoSQL databases spend a lot of their time documenting them in great detail.
Let me give you just a couple of examples: Lars George’s rare but heavily technical posts (HBase and Data Locality, Hadoop and HBase: Configuring the Number of Server Side Threads (Xceivers), HBase and Bloom Filters) or Salvatore Sanfilippo’s posts about Redis (Redis Persistence Demystified, Redis Cluster Explained, Redis Guide: What Each Redis Data Type Should Be Used For, Redis diskstore and B-trees).
Granted, these are not academic papers, but they definitely provide an in-depth perspective on the nuts and bolts of NoSQL databases. And such materials come not only from the people developing NoSQL databases, but also from those running them in production.
To date, I’ve published almost 3000 posts on this blog and, besides my own contributions, a large number of these posts link to articles diving into the details of the various forms of NoSQL solutions.
Even though most of the developers working on NoSQL solutions are busy implementing and running them in production, sometimes they even find the time to publish academic papers and participate in related events.
I wish I could say otherwise, but I don’t think I’ve captured even a small fraction of what these guys have published: LinkedIn NoSQL Paper: Serving Large-Scale Batch Computed Data With Project Voldemort, Paper: Apache Hadoop Goes Realtime at Facebook, Riak Bitcask Explained.
Many companies backing NoSQL solutions spend a tremendous amount of time and effort on continuously improving the available documentation. Take a look at DataStax’s documentation for Cassandra, Basho’s documentation for Riak, 10gen’s MongoDB documentation, and I could go on and on for a while.
Last, but not least, check the job boards of these companies: almost every one of them is looking for technical writers and evangelists. Obviously that’s because they want to bring more clarity to their products and make things easier for their users.
Bottom line, I think that the NoSQL space is doing quite well in documenting its technical decisions, trade-offs, and recommended use cases. I’d actually say that most of the time it’s easier for me to get details about almost any NoSQL database than to figure out some details of a traditional database vendor’s solution: try to learn how IBM DB2 implements compression, or how Teradata does hybrid row and column storage. But maybe all this is because I’ve spent so much time in this space.
Anyway, I applaud C. Mohan’s initiative and hope it will be successful. And because it is always my intention to help the NoSQL community, I’m ready to offer him both my help and support.
[1] Sometimes I wish I’d get a copy of every NoSQL book published.
Original title and link: My Humble Request to the NoSQL Techies (©myNoSQL)