NoSQL Benchmarks NoSQL use cases NoSQL Videos NoSQL Hybrid Solutions NoSQL Presentations Big Data Hadoop MapReduce Pig Hive Flume Oozie Sqoop HDFS ZooKeeper Cascading Cascalog BigTable Cassandra HBase Hypertable Couchbase CouchDB MongoDB OrientDB RavenDB Jackrabbit Terrastore Amazon DynamoDB Redis Riak Project Voldemort Tokyo Cabinet Kyoto Cabinet memcached Amazon SimpleDB Datomic MemcacheDB M/DB GT.M Amazon Dynamo Dynomite Mnesia Yahoo! PNUTS/Sherpa Neo4j InfoGrid Sones GraphDB InfiniteGraph AllegroGraph MarkLogic Clustrix CouchDB Case Studies MongoDB Case Studies NoSQL at Adobe NoSQL at Facebook NoSQL at Twitter



MarkLogic: “We are NoSQL too”

Lately Dave Kellogg, CEO of Mark Logic Co, has been posting a series of articles in his attempt to associate the MarkLogic XML server with the NoSQL space.

We should start by looking at what MarkLogic is offering and I’ll be using as a reference ☞ Dave Kellog’s list:

  1. Unstructured data. This means not only dealing with data in odd structures (e.g., sparse and/or semi-structured data), but also handling words and all the challenges that go with them.
  2. Scaling on cheap hardware. In effect, scaling like Google, using racks of inexpensive pizza boxes instead of big, expensive computers with expensive SANs attached. This is accomplished via shared-nothing clustering.
  3. A non-relational data model. MarkLogic Server uses the XML data model.
  4. Document-orientation. MarkLogic is a document-oriented system, meaning that the fundamental modeling unit is the (XML) document and that the system includes search functionality, in the same way that a smartphone includes a GPS.
  5. Ad hoc queries. A reductionist mission statement for MarkLogic Server is “to perform database-style queries on unstructured information.” (See diagram below.)
  6. Standard interfaces. We believe in standard interfaces, in part because it’s in our self-interest to do so. Standards help de-risk the purchase of new technologies from high-growth vendors. We support a number of W3C standards XQuery, XPath, XML, xHTML, XPointer, and coming soon, XSLT.
  7. ACID transactions. We’re database guys. While we’ll let you turn off the transaction system and are in the midst of implementing replication with a consistency dial, by default we do ACID.

While doing my part of research I couldn’t find any technical references on how MarkLogic works in distributed environments[1] and also how it addresses ACID guarantees in this environment. Hopefully we will see more details about these sooner than later.

Now, the part I cannot agree with is ☞ Dave’s conclusion that:

MarkLogic provides a best-of-both-worlds option between open source NoSQL systems and traditional DBMSs.

Like open source NoSQL systems, MarkLogic provides shared-nothing clustering on inexpensive hardware, superior support for unstructured data, document-orientation, and high-performance. But like traditional databases, MarkLogic speaks a high-level query language, implements industry standards, and is commercial-grade, supported software.

I would even say that this conclusion is invalidating most (if not all) the other points in his post.

1. NoSQL systems come in many flavors

This statement is correct as the fundamental philosophy behind NoSQL systems is having the option to use the best tool for your scenario. On the other hand, at a logical level it contradicts the above conclusion.

2. NoSQL is part of a broader trend in database systems: specialization.

That is correct too. But again it is contradicting the conclusion: a system that is specialized cannot be the “best-of-both-worlds” as that would imply the existence of “silverbullet” solutions.

3. NoSQL is largely orthogonal to specialization.

Unfortunately this one is incorrect. Most (if not all) existing “core”[2] NoSQL solutions have been created to solve very specific problems. And while there are some making the mistake to confuse them for jack-of-all-trades, hopefully that is not the trend.

4. NoSQL isn’t about open source.

Indeed, NoSQL is not about open source. It is about operational costs, complexity costs, integration, extensibility, etc. None of these implies open source per se, but there must be a reason for users discovering that open source solutions have addressed these requirements better than others.

5. most open source NoSQL systems have proprietary interfaces.

That’s correct too and I’d say one of the reasons is specialization, so another contradiction with other points. On the other hand there are clear signs that each of the NoSQL projects is working on offering friendly protocols and integrate nicely with other tools

Summarizing, while I do understand why it makes a lot of sense to associate MarkLogic with the NoSQL space (and there are too many reasons for doing it that do not fit well on myNoSQL), I’d definitely appreciate if things would remain as objective as possible and be based on facts only. In the end it will be the users that will decide if they want to call MarkLogic NoSQL or not.

  1. The only references I’ve found are to database failover, hot host add/delete, fast host restart, with no other details. Putting MarkLogic on the map of distributed storage system classification would be really useful.  ()
  2. When saying “core” NoSQL systems, I’m referring to all systems that have been associated with the NoSQL since the term came up.  ()