nosql debate: All content tagged as nosql debate in NoSQL databases and polyglot persistence
Lately Dave Kellogg, CEO of Mark Logic Co, has been posting a series of articles in an attempt to associate the MarkLogic XML server with the NoSQL space.
We should start by looking at what MarkLogic is offering. As a reference I’ll be using ☞ Dave Kellogg’s list:
- Unstructured data. This means not only dealing with data in odd structures (e.g., sparse and/or semi-structured data), but also handling words and all the challenges that go with them.
- Scaling on cheap hardware. In effect, scaling like Google, using racks of inexpensive pizza boxes instead of big, expensive computers with expensive SANs attached. This is accomplished via shared-nothing clustering.
- A non-relational data model. MarkLogic Server uses the XML data model.
- Document-orientation. MarkLogic is a document-oriented system, meaning that the fundamental modeling unit is the (XML) document and that the system includes search functionality, in the same way that a smartphone includes a GPS.
- Ad hoc queries. A reductionist mission statement for MarkLogic Server is “to perform database-style queries on unstructured information.” (See diagram below.)
- Standard interfaces. We believe in standard interfaces, in part because it’s in our self-interest to do so. Standards help de-risk the purchase of new technologies from high-growth vendors. We support a number of W3C standards XQuery, XPath, XML, xHTML, XPointer, and coming soon, XSLT.
- ACID transactions. We’re database guys. While we’ll let you turn off the transaction system and are in the midst of implementing replication with a consistency dial, by default we do ACID.
While doing my part of the research, I couldn’t find any technical references on how MarkLogic works in distributed environments or how it provides ACID guarantees there. Hopefully we will see more details about these sooner rather than later.
Now, the part I cannot agree with is ☞ Dave’s conclusion that:
MarkLogic provides a best-of-both-worlds option between open source NoSQL systems and traditional DBMSs.
Like open source NoSQL systems, MarkLogic provides shared-nothing clustering on inexpensive hardware, superior support for unstructured data, document-orientation, and high-performance. But like traditional databases, MarkLogic speaks a high-level query language, implements industry standards, and is commercial-grade, supported software.
I would even say that this conclusion invalidates most (if not all) of the other points in his post.
1. NoSQL systems come in many flavors
This statement is correct as the fundamental philosophy behind NoSQL systems is having the option to use the best tool for your scenario. On the other hand, at a logical level it contradicts the above conclusion.
2. NoSQL is part of a broader trend in database systems: specialization.
That is correct too. But again it contradicts the conclusion: a system that is specialized cannot be the “best-of-both-worlds”, as that would imply the existence of “silver bullet” solutions.
3. NoSQL is largely orthogonal to specialization.
Unfortunately this one is incorrect. Most (if not all) existing “core” NoSQL solutions have been created to solve very specific problems. And while some people make the mistake of confusing them with jacks-of-all-trades, hopefully that is not the trend.
4. NoSQL isn’t about open source.
Indeed, NoSQL is not about open source. It is about operational costs, complexity costs, integration, extensibility, etc. None of these implies open source per se, but there must be a reason for users discovering that open source solutions have addressed these requirements better than others.
5. Most open source NoSQL systems have proprietary interfaces.
That’s correct too, and I’d say one of the reasons is specialization, which is yet another contradiction with his other points. On the other hand, there are clear signs that each of the NoSQL projects is working on offering friendly protocols and integrating nicely with other tools.
Summarizing, while I do understand why it makes a lot of sense to associate MarkLogic with the NoSQL space (and there are too many reasons for doing it that do not fit well on myNoSQL), I’d definitely appreciate it if things remained as objective as possible and were based on facts only. In the end it will be the users who decide whether they want to call MarkLogic NoSQL or not.
- The only references I’ve found are to database failover, hot host add/delete, and fast host restart, with no other details. Putting MarkLogic on the map of distributed storage system classification would be really useful.
- When saying “core” NoSQL systems, I’m referring to all systems that have been associated with NoSQL since the term came up.
A couple of days ago I posted about the pros and cons of working on a (new) common query language for document databases. Hans Marggraff has since generalized this question, ☞ writing:
NoSQL databases lack a common query language, that can provide the basis for a vendor independent tool ecosystem.
I should probably confess that over a year ago, I was asking for the same things when publishing the alternative data storage status quo.
Meanwhile I have understood that there are probably better ways to deal with the NoSQL custom query space:
- avoiding, as much as possible, running reports on live servers and using specialized/dedicated solutions for that instead (Tekpub is using both MongoDB and MySQL to deal with this normal scenario and they feel very strongly about this separation)
- building high level languages or tools that work with your reporting and data warehouse systems. I’m referring here to Hadoop, Pig and Cascalog. To get an idea of what I mean, check these awesome presentations on Hadoop, Pig and Cascalog from a Hadoop meet-up showcasing their usage at Twitter, BackType, and others.
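To make the first point a bit more concrete, here is a minimal sketch in Python of moving data off the operational store into a separate reporting database. It assumes the live data is a list of JSON-like documents (a stand-in for a document store) and uses an in-memory SQLite database as the reporting side. The `orders` data, the field names, and the function names are made up for illustration, and none of this reflects how Tekpub actually does it.

```python
import sqlite3

# Hypothetical documents as they might sit in an operational document store.
orders = [
    {"_id": 1, "customer": "alice", "total": 30.0, "items": ["a", "b"]},
    {"_id": 2, "customer": "bob", "total": 12.5, "items": ["c"]},
    {"_id": 3, "customer": "alice", "total": 7.5},  # sparse: no "items" field
]


def load_into_reporting_db(docs):
    """Flatten the documents into a table of a separate reporting database."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT, "
        "total REAL, item_count INTEGER)"
    )
    rows = [
        (d["_id"], d["customer"], d["total"], len(d.get("items", [])))
        for d in docs
    ]
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?, ?)", rows)
    conn.commit()
    return conn


def revenue_per_customer(conn):
    """The report runs against the copy, never against the live store."""
    cur = conn.execute(
        "SELECT customer, SUM(total) FROM orders "
        "GROUP BY customer ORDER BY customer"
    )
    return cur.fetchall()


reporting = load_into_reporting_db(orders)
report = revenue_per_customer(reporting)
```

In a real setup the export would run on a schedule or be fed by a change stream, but the point stays the same: ad hoc and reporting queries hit the copy, and the live servers only serve the application.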
Somewhat as a confirmation of these approaches, Quest Software launched Toad for Cloud yesterday, a tool that supports querying data across different NoSQL solutions by providing an indirection layer that interfaces with their native querying capabilities. You can see more about this tool in the videos posted on their website.
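To illustrate what such an indirection layer could look like in principle, here is a minimal sketch in Python. It is not how Toad for Cloud is built (I have no details on that). Every class and method name below is hypothetical, and the two "stores" are plain in-memory structures standing in for a key-value store and a document store.

```python
from abc import ABC, abstractmethod


class StoreAdapter(ABC):
    """Hypothetical common interface; each adapter translates to a native API."""

    @abstractmethod
    def find(self, collection, field, value):
        """Return all records in `collection` whose `field` equals `value`."""


class KeyValueAdapter(StoreAdapter):
    """Stand-in for a key-value store: keys look like 'users:1'."""

    def __init__(self, data):
        self._data = data

    def find(self, collection, field, value):
        prefix = collection + ":"
        # A key-value store has no query support, so the adapter scans keys.
        return [
            record
            for key, record in sorted(self._data.items())
            if key.startswith(prefix) and record.get(field) == value
        ]


class DocumentAdapter(StoreAdapter):
    """Stand-in for a document store: named collections of documents."""

    def __init__(self, collections):
        self._collections = collections

    def find(self, collection, field, value):
        return [
            doc
            for doc in self._collections.get(collection, [])
            if doc.get(field) == value
        ]


class QueryLayer:
    """The indirection layer: one entry point, many native back ends."""

    def __init__(self):
        self._adapters = {}

    def register(self, name, adapter):
        self._adapters[name] = adapter

    def find(self, store, collection, field, value):
        return self._adapters[store].find(collection, field, value)


layer = QueryLayer()
layer.register("kv", KeyValueAdapter({
    "users:1": {"name": "ann", "city": "Oslo"},
    "users:2": {"name": "bo", "city": "Rome"},
}))
layer.register("docs", DocumentAdapter({
    "users": [{"name": "cy", "city": "Oslo"}, {"name": "di", "city": "Kyiv"}],
}))

kv_hits = layer.find("kv", "users", "city", "Oslo")
doc_hits = layer.find("docs", "users", "city", "Oslo")
```

The caller asks the same question of both stores, while each adapter answers it using whatever its back end natively supports. That is the reason a tool like this does not need the stores themselves to agree on a query language.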
So, I’d say there’s no need for a common (artificial) NoSQL query language. We are already seeing tools dealing with the different APIs and I’m pretty sure more will come.
The proper representation of life is not tabular, but associative. The structure of life is not relational, but hierarchical. Relation is a poor term that falls far short of capturing dynamic connections. […] Shoehorning life science into relational databases is a very lossy process.
I wholeheartedly agree! But from this perspective it looks like graph databases come closest to modeling real life.