


marklogic: All content tagged as marklogic in NoSQL databases and polyglot persistence

Whither MarkLogic?

Curt Monash:

MarkLogic has always focused on markets where the database truly was about documents in the conventional sense — especially long text documents — aka “content”. I always thought that focus was over-narrow.

How would you concisely describe MarkLogic and its sweet spot?

Original title and link: Whither MarkLogic? (NoSQL databases © myNoSQL)


MarkLogic Needs to Harness the NoSQL Movement

MarkLogic can create a new class of license. Call it the “MarkLogic Server Lite” or something like that (I don’t care). It should be well trimmed down from the Standard and Enterprise editions. No geospatial support, no entity enrichment, no compartment security, etc. Put a database size limit of 1 GB and only allow it to run on 1 CPU. And make it completely free to use, even on commercial projects.

I hope the XML markup doesn’t count towards the 1 GB limit. Otherwise, he is not the first to believe that throwing some bits over the fence will suddenly lead to huge adoption.



MarkLogic Server: Data Model, Indexing System, Operational Behaviors

A 60+ page PDF about the MarkLogic server:

MarkLogic Server fuses together database internals, search-style indexing, and application server behaviors into a unified system. It uses XML documents as its data model, and stores the documents within a transactional repository. It indexes the words and values from each of the loaded documents, as well as the document structure. And, because of its unique Universal Index, MarkLogic doesn’t require advance knowledge of the document structure (its “schema”) nor complete adherence to a particular schema. Through its application server capabilities, it’s programmable and extensible.

Haven’t read it yet, but definitely on my list.
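
To make the “Universal Index” idea a bit more concrete, here is a toy sketch in Python. It is not MarkLogic’s implementation, and every name in it (`ToyUniversalIndex`, `add`, `search`) is made up for illustration. It only assumes what the quote above says: words, values, and document structure are indexed as documents are loaded, without a schema declared up front.

```python
import re
import xml.etree.ElementTree as ET
from collections import defaultdict


class ToyUniversalIndex:
    """Schema-free inverted index over XML documents (illustrative only)."""

    def __init__(self):
        self.words = defaultdict(set)     # word -> ids of documents containing it
        self.elements = defaultdict(set)  # element path -> ids of documents having it
        self.values = defaultdict(set)    # (element path, text value) -> document ids

    def add(self, doc_id, xml_text):
        """Parse one document and index whatever structure it happens to have."""
        root = ET.fromstring(xml_text)
        self._walk(doc_id, root, "")

    def _walk(self, doc_id, node, parent_path):
        path = f"{parent_path}/{node.tag}"
        self.elements[path].add(doc_id)
        # Simplification: only the text directly inside an element is indexed;
        # text that follows a child element (its "tail") is ignored.
        text = (node.text or "").strip()
        if text:
            self.values[(path, text)].add(doc_id)
            for word in re.findall(r"\w+", text.lower()):
                self.words[word].add(doc_id)
        for child in node:
            self._walk(doc_id, child, path)

    def search(self, word=None, element=None, value=None):
        """Intersect the posting sets for whichever criteria are given."""
        postings = []
        if word is not None:
            postings.append(self.words.get(word.lower(), set()))
        if element is not None:
            postings.append(self.elements.get(element, set()))
        if value is not None:
            postings.append(self.values.get(value, set()))
        if not postings:
            return set()
        return set.intersection(*postings)


# Two documents with unrelated structures, loaded without declaring a schema.
index = ToyUniversalIndex()
index.add("a", "<article><title>Graph databases</title><author>Ann</author></article>")
index.add("b", "<memo><to>Ann</to><body>Notes on graph storage</body></memo>")

print(index.search(word="graph"))
print(index.search(word="graph", element="/memo/body"))
print(index.search(value=("/article/author", "Ann")))
```

The point of the sketch is that the same three lookups work across documents of different shapes, because the index is built from whatever elements show up rather than from a predeclared schema.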



NoSQL Frankfurt: A Quick Review of the Conference

Yesterday was the NoSQL Frankfurt conference and today we have the chance to review some of the slide decks presented.

Beyond NoSQL with MarkLogic and The Universal Index

Nuno Job (@dscape) presented on MarkLogic (an XML server we haven’t talked much about), its universal index, and a couple of other interesting features.

The GraphDB Landscape and sones

Achim Friedland (@ahzf) provided a very interesting overview of graph database products, the goals and some scenarios for graph databases, a brief comparison of property graphs with other models (relational databases, object-oriented, semantic web/RDF), and many other interesting aspects.

Data Modeling with Cassandra Column Families

Gary Dusbabek (@gdusbabek) covered data modeling with Cassandra (a topic I still find to be one of the most complicated).

Neo4j Spatial - GIS for the rest of us

Peter Neubauer (@peterneubauer) covered another interesting topic in the data space: geographic information (GIS) in graph databases.

Although GISers suggested this integration some time ago, Neo4j only recently announced support for GEO.

Cassandra vs Redis

Tim Lossen’s (@tlossen) slides compare Cassandra and Redis from the perspective of a Facebook game’s requirements. All I can say is that the conclusion is definitely interesting, but you’ll have to check the slides yourselves.

Mastering Massive Data Volumes with Hypertable

Doug Judd — who impressed me with his fantastic Hypertable: The Ultimate Scaling Machine at the Berlin Buzzwords NoSQL conference — gave a talk on Hypertable, its architecture and performance. The presentation also mentioned two Hypertable case studies: Zvents (an analytics platform) and (spam classification)[1]:

More presentations will be added as I receive them.

  1. Just recently I posted about Hadoop being used for spam detection.


MarkLogic: “We are NoSQL too”

Lately Dave Kellogg, CEO of Mark Logic Co, has been posting a series of articles in his attempt to associate the MarkLogic XML server with the NoSQL space.

We should start by looking at what MarkLogic is offering, and I’ll be using ☞ Dave Kellogg’s list as a reference:

  1. Unstructured data. This means not only dealing with data in odd structures (e.g., sparse and/or semi-structured data), but also handling words and all the challenges that go with them.
  2. Scaling on cheap hardware. In effect, scaling like Google, using racks of inexpensive pizza boxes instead of big, expensive computers with expensive SANs attached. This is accomplished via shared-nothing clustering.
  3. A non-relational data model. MarkLogic Server uses the XML data model.
  4. Document-orientation. MarkLogic is a document-oriented system, meaning that the fundamental modeling unit is the (XML) document and that the system includes search functionality, in the same way that a smartphone includes a GPS.
  5. Ad hoc queries. A reductionist mission statement for MarkLogic Server is “to perform database-style queries on unstructured information.” (See diagram below.)
  6. Standard interfaces. We believe in standard interfaces, in part because it’s in our self-interest to do so. Standards help de-risk the purchase of new technologies from high-growth vendors. We support a number of W3C standards XQuery, XPath, XML, xHTML, XPointer, and coming soon, XSLT.
  7. ACID transactions. We’re database guys. While we’ll let you turn off the transaction system and are in the midst of implementing replication with a consistency dial, by default we do ACID.

While doing my part of the research, I couldn’t find any technical references on how MarkLogic works in distributed environments[1] or how it addresses ACID guarantees in such environments. Hopefully we will see more details about these sooner rather than later.

Now, the part I cannot agree with is ☞ Dave’s conclusion that:

MarkLogic provides a best-of-both-worlds option between open source NoSQL systems and traditional DBMSs.

Like open source NoSQL systems, MarkLogic provides shared-nothing clustering on inexpensive hardware, superior support for unstructured data, document-orientation, and high-performance. But like traditional databases, MarkLogic speaks a high-level query language, implements industry standards, and is commercial-grade, supported software.

I would even say that this conclusion invalidates most (if not all) of the other points in his post.

1. NoSQL systems come in many flavors

This statement is correct as the fundamental philosophy behind NoSQL systems is having the option to use the best tool for your scenario. On the other hand, at a logical level it contradicts the above conclusion.

2. NoSQL is part of a broader trend in database systems: specialization.

That is correct too. But again it contradicts the conclusion: a system that is specialized cannot be the “best-of-both-worlds”, as that would imply the existence of “silver bullet” solutions.

3. NoSQL is largely orthogonal to specialization.

Unfortunately this one is incorrect. Most (if not all) existing “core”[2] NoSQL solutions have been created to solve very specific problems. And while some make the mistake of confusing them with jacks-of-all-trades, hopefully that is not the trend.

4. NoSQL isn’t about open source.

Indeed, NoSQL is not about open source. It is about operational costs, complexity costs, integration, extensibility, etc. None of these implies open source per se, but there must be a reason for users discovering that open source solutions have addressed these requirements better than others.

5. Most open source NoSQL systems have proprietary interfaces.

That’s correct too, and I’d say one of the reasons is specialization, so this is another contradiction with the other points. On the other hand, there are clear signs that each of the NoSQL projects is working on offering friendly protocols and integrating nicely with other tools.

Summarizing, while I do understand why it makes a lot of sense to associate MarkLogic with the NoSQL space (and there are too many reasons for doing it that do not fit well on myNoSQL), I’d definitely appreciate it if things remained as objective as possible and were based on facts only. In the end it will be the users who decide whether they want to call MarkLogic NoSQL or not.

  1. The only references I’ve found are to database failover, hot host add/delete, and fast host restart, with no other details. Putting MarkLogic on the map of distributed storage system classification would be really useful.
  2. When saying “core” NoSQL systems, I’m referring to all systems that have been associated with NoSQL since the term came up.