


Marklogic: All content tagged as Marklogic in NoSQL databases and polyglot persistence

Enterprise-class NoSQL

What is distinctive about an enterprise-class NoSQL database is its support for additional enterprise-scale application requirements, namely: ACID (atomic, consistent, isolated, and durable) transactions, government-grade security and elasticity, as well as automatic failover.

What is distinctive about an enterprise-class NoSQL database is what my company is selling.

If that were true, I doubt there would be any other databases around, considering MarkLogic’s age and perfect fit.

Snarky comments aside, enterprise requirements are so complicated, numerous, political, and sometimes non-technical that I don’t think anyone will ever be able to come up with a definition or checklist (however long) of what’s enterprise-grade.

Original title and link: Enterprise-class NoSQL (NoSQL database©myNoSQL)


How to interpret NoSQL funding rounds

Adam Fowler (MarkLogic):

Looking solely at money raised, it is tempting to conclude that MongoDB is the most successful NoSQL vendor out there… It simply isn’t though. It’s a services company mostly, and one that doesn’t make much in software license. They’re simply louder than the rest.

Sounds bitter. Very bitter.

Original title and link: How to interpret NoSQL funding rounds (NoSQL database©myNoSQL)


Blame it on the database

The story of a famous failure:

Another sore point was the Medicare agency’s decision to use database software, from a company called MarkLogic, that managed the data differently from systems by companies like IBM, Microsoft and Oracle. CGI officials argued that it would slow work because it was too unfamiliar. Government officials disagreed, and its configuration remains a serious problem.


“We have not identified any inefficient and defective code,” a CGI executive responded in an email to federal project managers, pointing again to database technology that the Medicare agency had ordered it to use as the culprit, at least in part.

I’m not going to defend MarkLogic. But this sounds so much like the archetype of a failure story:

  1. start by blaming the other contractors
  2. find the newest or less known technology used in the project
  3. point all fingers to it

A long time ago, I was involved in a similar project. Different country, different agencies, different contractors, but exactly the same story. It was in the early days of my career, but what I learned back then stuck with me, and even if today it may sound like a truism, it’s still one of the big lessons: It’s not the technology. It’s the people. Always. And the money.

Original title and link: Blame it on the database (NoSQL database©myNoSQL)


MarkLogic Raises $25M to Keep Up Enterprise NoSQL Pitch

Jordan Novet for GigaOm, announcing another round of funding raised by MarkLogic:

On Wednesday, MarkLogic’s success was validated again, as the company announced a $25 million round of venture funding, bringing the total it has raised to $71.2 million. Sequoia Capital and Tenaya Capital led the round; CEO Gary Bloom and other MarkLogic executives also contributed.

✚ In 2010, MarkLogic made its first steps toward joining the NoSQL trend. Not very vigorous steps, but not shy either. Dave Kellogg (CEO of MarkLogic): We are NoSQL too

✚ As of this year, MarkLogic tries to position its product as NoSQL for enterprise. Price-wise, I have to agree.

✚ MarkLogic is also trying a more aggressive positioning in the NoSQL space: MarkLogic’s New (Aggressive) Voice

Original title and link: MarkLogic Raises $25M to Keep Up Enterprise NoSQL Pitch (NoSQL database©myNoSQL)


MarkLogic’s New (Aggressive) Voice

MarkLogic has been around for a while. I don’t have any details about how their business is doing, but attention-wise, I’m pretty sure they’d love to get a slice of what younger NoSQL databases get.

In the last few weeks, I got the impression there’s a change of voice in MarkLogic’s message.

The first sign: “Playtime with MongoDB is Over. Upgrade to MarkLogic Enterprise NoSQL.”:

When playtime is over and it is time to seriously support the needs of your enterprise, the clear choice is to upgrade to MarkLogic Enterprise NoSQL. (We even have a Mongo2MarkLogic converter tool that speeds the import of data from MongoDB into MarkLogic so you can start using MarkLogic’s integrated search and enterprise features faster.)

To be clear, the post calls out Cassandra, MongoDB, Riak, and HBase.

Second sign: “Get Your Facts Straight: We’ve Had Enterprise-Grade Security Longer”:

DataStax put out a press release today claiming that with their new release of DataStax Enterprise 3 they were the “World’s First NoSQL Big Data Platform With Comprehensive Enterprise-Grade Security.”

We’d like to set the record straight. MarkLogic has had Enterprise-grade security for well over 10 years. So, while I won’t make the claim that we were first — I certainly won’t accept that DataStax was first either.

Both these posts are bold. I like that. What I don’t like, though, is the aggressive and dismissive tone. That might bring you attention, but not the type that comes with new users.

Original title and link: MarkLogic’s New (Aggressive) Voice (NoSQL database©myNoSQL)

MarkLogic Querying for SQL People

Inspired by the MongoDB MapReduce translated to SQL and Neo4j Cypher Querying for SQL People, MarkLogic’s Jason Hunter and Eric Bloch put together a page mapping SQL terms and queries to MarkLogic terms and XQuery queries respectively.

Here is how SQL statements translate to MarkLogic XQuery expressions:
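The mapping itself didn’t survive in this excerpt, but its flavor is easy to sketch. A couple of illustrative pairs (my own sketch, assuming each row is stored as one XML document in a named collection; this is not the original Hunter/Bloch table):

```xquery
(: SQL: SELECT * FROM customers WHERE last_name = 'Smith' :)
(: A rough XQuery equivalent, assuming each row is a <customer>
   document stored in a "customers" collection :)
for $c in fn:collection("customers")/customer
where $c/last_name = "Smith"
return $c

(: SQL: SELECT COUNT(*) FROM customers :)
fn:count(fn:collection("customers")/customer)
```

In practice, MarkLogic queries would more likely lean on its cts:search functions to hit the indexes; the FLWOR form above is just the most literal translation.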

MarkLogic, LexisNexis, XML, and Search

The lessons to be learned from the story about LexisNexis and MarkLogic—GigaOm and PR announcement—are quite simple:

  • Put XML into an XML database, objects into an object database, JSON into a document database, and relational data into a relational database, and you’ll get the best results
  • The better the data store understands the structure of your data, the better the search results should be

Original title and link: MarkLogic, LexisNexis, XML, and Search (NoSQL database©myNoSQL)

MarkLogic 5: Confidence at Scale, Enterprise Big Data, Hadoop Connector, Express Edition

I rarely write about MarkLogic[1], but the amount of information that hit me about the newly released MarkLogic 5 made me curious. Below are quotes and commentary about MarkLogic 5, the new MarkLogic Express, and MarkLogic and Hadoop integration.

MarkLogic is a next generation database for Big Data and unstructured information. MarkLogic empowers organizations to make high stakes decisions on Big Data in real time.

Until now, I thought of MarkLogic as an XML database with powerful search capabilities. This new message makes it sound like MarkLogic is a Big Data analytics or BI tool, which I don’t think would be the most accurate description.

MarkLogic Confidence at Scale

There are a couple of new features falling into this category, as presented in the press release:

  • Database Replication – protect your mission-critical information from site-wide disasters and reduce the cost of downtime

    Chris Kanaracus:

    MarkLogic 5 features the ability to keep a “hot copy” of the database in another data center for quick failover in the event of a disaster, as well as a journal-archiving function that allows a database to be restored to a particular point in time.

  • Point-in-Time Recovery – recover from backups to a specific point-in-time then roll forward using the transaction log to a specific point-in-time, minimizing the window for lost data between the occurrence of a disaster and the time the last backup was taken.

I’m not sure how it works or what the requirements are for getting it to work, but point-in-time recovery sounds like a very interesting feature.
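Conceptually, point-in-time recovery is just “restore the last backup, then replay the journal up to time T.” A minimal, database-agnostic sketch in Python (all names here are my own illustration, not MarkLogic’s actual recovery machinery):

```python
def restore_to_point_in_time(backup, journal, target_time):
    """Restore a snapshot, then roll the journal forward to target_time.

    `backup` is a dict snapshot of the data taken at some time T0;
    `journal` is a list of (timestamp, key, value) entries recorded
    after T0. This is an illustration of the general technique, not
    MarkLogic's API.
    """
    state = dict(backup)  # start from the last full backup
    for ts, key, value in sorted(journal):
        if ts > target_time:
            break  # stop replaying: this is the "point in time"
        state[key] = value
    return state


# A disaster at t=5 corrupts the live database; we recover to t=3,
# discarding the (possibly bad) write at t=4.
backup = {"a": 1}  # taken at t=0
journal = [(1, "b", 2), (3, "a", 9), (4, "c", 7)]
print(restore_to_point_in_time(backup, journal, target_time=3))
# {'a': 9, 'b': 2}
```

The window for lost data is whatever falls between the chosen recovery point and the disaster, which is exactly what the press release describes minimizing.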

Enterprise Big Data

New features falling into this category as per the press release and coverage:

  • Simplified Monitoring — new monitoring and management features enable organizations to see system status at a glance with real-time charts of metrics such as I/O rates and loads, request activity, and disk usage.
  • Monitoring Plug-Ins — integration with HP Operations Manager and Nagios
  • Tiered Storage – expand Big Data performance by implementing a solid state disk (SSD) tier between memory and disk

This last feature is one that prepares MarkLogic for the future by allowing it to work smartly with different storage tiers. Ron Avnur (CTO, MarkLogic) interviewed by Chris Kanaracus:

We realized people have rotational drives and network-attached storage, and are starting to play more seriously with solid-state. These have different performance profiles.

System administrators will tell MarkLogic where and what the options for storage are, and the system will “do all the optimization.” In this way, more frequently used data can be kept in flash and older or less frequently accessed information held elsewhere.

I’m not aware of other solutions being able to play smart with heterogeneous storage deployments.

MarkLogic Connector for Hadoop

Press release:

The MarkLogic Connector for Hadoop powers large-scale batch processing for Big Data Analytics on the structured, semi-structured, and unstructured data residing inside MarkLogic. Using MarkLogic for real time analytics with Hadoop for batch processing brings the best of Big Data to companies that need real time, secure, enterprise applications that are cost effective with high performance. With simple drop-in installation, organizations can run MapReduce on data inside MarkLogic and take advantage of Hadoop’s development and management tools, all while being able to leverage MarkLogic’s indexes and distributed architecture for performance. This combination results in enhanced search, analytics, and delivery in MarkLogic, and enables organizations to progressively enhance data without having to remove it from the database.

Jason Hunter (deputy CTO, MarkLogic):

MarkLogic sees Hadoop as being able to support MarkLogic for various uses. For example, an intelligence-gathering organization could collect data that is into hundreds of petabytes, not understanding what exactly is there, but then decide to investigate a particular topic in-depth. In such a scenario, users would want to use MarkLogic for interaction with this content, asking questions and getting answers in sub-second time, and then asking other questions and exploring the data for insights. However, Hunter explains, because the data is so large it would probably not be cost-effective to load hundreds of petabytes of data into MarkLogic if they don’t have to, and so they can load the data into Hadoop and run a Hadoop job to select the portion of the content that it makes sense to do real-time analytics against and load that into MarkLogic for interactive queries. “So you go from hundreds of petabytes down to one petabyte, or half a petabyte, do bulk load and do interactive queries against it.”
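The winnowing step Hunter describes, selecting the slice of a huge corpus worth loading into MarkLogic, is essentially a filter job. A hedged sketch of it as a Hadoop Streaming mapper in Python (the record format and `topics` field are my own assumptions for illustration):

```python
import json
import sys


def matches_topic(record, topic):
    """Return True if a parsed JSON record mentions the topic of interest."""
    return topic in record.get("topics", [])


def run_mapper(lines, topic, out=sys.stdout):
    """Hadoop Streaming mapper: read JSON records line by line and emit
    only those matching `topic`. The surviving slice would then be
    bulk-loaded into MarkLogic for interactive, sub-second queries."""
    for line in lines:
        try:
            record = json.loads(line)
        except ValueError:
            continue  # skip malformed input rather than failing the job
        if matches_topic(record, topic):
            out.write(line if line.endswith("\n") else line + "\n")


# Demo with two in-memory records; a real job would read sys.stdin and
# be launched via hadoop-streaming with this script as the -mapper.
demo = [
    '{"id": 1, "topics": ["energy"]}',
    '{"id": 2, "topics": ["sports"]}',
]
run_mapper(demo, topic="energy")  # emits only the first record
```

The map-only job needs no reducer: each record passes or fails independently, which is what makes the hundreds-of-petabytes-to-one-petabyte reduction embarrassingly parallel.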

MarkLogic Express

Press release:

MarkLogic Express, a new MarkLogic 5 license that allows students and developers to download and take MarkLogic into production immediately.

MarkLogic Express includes geospatial capabilities, alerting, and can be used in production environments. That means a developer can take a MarkLogic implementation that leverages a 2 CPU node and up to 40 GB of data live.

Josette Rigsby points out some more limitations of the Express version:

  • Can’t combine with another licensed install of MarkLogic
  • Can’t be used for work on behalf of the U.S. Federal Government
  • No clustering
  • Can’t run multiple production copies of Express for the same application
  • Cannot be used by development teams — note: this point is very confusing.

It looks like MarkLogic is acknowledging the power developers represent in today’s organizations and has decided to offer them access to the product. While I don’t think the current restrictions would allow someone to go to production with the MarkLogic Express version, I still believe it’s better than nothing. I’ve also read that students and researchers could get access to a less restrictive version—something that’s easy to appreciate.

MarkLogic 5 also includes some features that are probably appealing to its users (rich media support, document filters, query console, REST-based API, distributed transaction support, geo-support).

I’m leaving you with Curt Monash’s comments:

MarkLogic seems to have settled on a positioning that, although distressingly buzzword-heavy, is at least partly based upon reality. The real part includes:

  • MarkLogic is a serious, enterprise-class DBMS (see for example Slide 12 of the MarkLogic deck) …
  • … which has been optimized from the getgo for poly-structured data.
  • MarkLogic can and does scale out to handle large amounts of data.
  • MarkLogic is a general-purpose DBMS, suitable for both short-request and analytic tasks.
  • MarkLogic is particularly well suited for analyses with long chains of “progressive enhancement” (MarkLogic’s favorite term when talking about derived data).
  • MarkLogic often plays the role of a content assembler and/or search engine, and the people who use MarkLogic in those ways are commonly doing things that can be described as research and analysis.

and a short video of MarkLogic CTO, Ron Avnur summarizing the release:

  1. In case it wasn’t obvious: I don’t like XML as a storage format, nor do I like XML databases.

Original title and link: MarkLogic 5: Confidence at Scale, Enterprise Big Data, Hadoop Connector, Express Edition (NoSQL database©myNoSQL)

1 Week, 1 Project, 3 Databases: MarkLogic, CouchDB, MongoDB

That’s the gist of this application. It is non-trivial and had a very rich design and interaction. My team had an excellent QA, excellent front end dev, and me who was the only one who knew MarkLogic. The other team chose to implement theirs using a Javascript front-end architecture communicating with CouchDB (later Java with MongoDB) on the backend. The two teams involved very skilled people. If these two technology approaches were going to go head-to-head, these were the people to do it.

Judging by the brief description of the requirements, there was nothing about this application that either CouchDB or MongoDB would not be able to handle. So I assume there was some learning curve involved for the team that started with CouchDB and then moved to MongoDB, not to mention the cost of switching technologies mid-way through such a short project.

Original title and link: 1 Week, 1 Project, 3 Databases: MarkLogic, CouchDB, MongoDB (NoSQL database©myNoSQL)


Full Text Search: What to Use?

A problem everyone using a NoSQL database faces (nb: actually I think this applies to most storage engines that don’t support full text indexing):

The problem now is: what to use? Currently I’m toying with 3 options:

  1. Use Sphinx Search; it’s pretty powerful, pretty damn fast, but requires me to feed it data through XML, but only when the indexer runs. Basically it’s quite hard to get real-time indexes going, and the delta updates are something I’d rather not mess with. 
  2. Use Solr; I’d go for this if it wasn’t for the fact it’s Java and requires Tomcat to work. Our entire application infrastructure is basically MongoDB and Perl, and I don’t want to go and set up a Tomcat instance just for Solr; on top of which I have a pathologically deep hatred for Java, but that aside…
  3. Roll my own. Full text search the way we need it doesn’t actually require things like stemming or fancy analysis of things. What it does need is the ability to search a schema-less database… Solr and Sphinx both suffer from the fact you need to tell them what to index, and even then you run into the fact that it’ll need a double pass. First pass is getting the search results, and the second pass entails the checking to see whether the user doing the search can actually see the document. 

Couple of thoughts:

  1. there are a couple of solutions out there, both relational and NoSQL databases, that support different degrees of full text indexing (e.g. Riak Search, MarkLogic)
  2. even if your database supports some form of full text search, the implementation might not be complete/optimal.
  3. initially it may sound like building a reverse index yourself is the best solution. Twitter’s story of migrating from their own reverse indexes in MySQL to a Lucene-based solution should change your mind.
  4. some NoSQL databases provide good mechanisms for enabling full text indexing. Riak has post commit hooks, CouchDB has a _changes feed.
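Point 4 is the glue that makes an external index workable: the database tells you what changed, and you forward that to the indexer. A minimal consumer of CouchDB’s `_changes` feed, sketched in Python (what you do with the ids, e.g. push them to Solr or Sphinx, is left as a hypothetical stand-in):

```python
import json


def changed_doc_ids(feed_lines):
    """Extract document ids from lines of a CouchDB continuous _changes feed.

    Each non-empty line of feed=continuous output is a JSON object like
    {"seq": 12, "id": "doc1", "changes": [{"rev": "2-abc"}]}; deletions
    carry "deleted": true. We skip deletions here; a real indexer would
    instead remove those documents from the search index.
    """
    for line in feed_lines:
        line = line.strip()
        if not line:
            continue  # continuous feeds emit blank heartbeat lines
        change = json.loads(line)
        if change.get("deleted"):
            continue
        yield change["id"]


# In production these lines would be streamed from
# GET http://localhost:5984/mydb/_changes?feed=continuous&since=0
sample = [
    '{"seq": 1, "id": "doc1", "changes": [{"rev": "1-a"}]}',
    "",  # heartbeat
    '{"seq": 2, "id": "doc2", "changes": [{"rev": "1-b"}], "deleted": true}',
    '{"seq": 3, "id": "doc3", "changes": [{"rev": "1-c"}]}',
]
print(list(changed_doc_ids(sample)))  # ['doc1', 'doc3']
```

Tracking the last processed `seq` and passing it back as `since=` is what makes the indexer resumable after a crash; Riak’s post-commit hooks give you the equivalent notification on the write path instead.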

Original title and link: Full Text Search: What to Use? (NoSQL database©myNoSQL)


ThriftDB: The Amazon Web Services of Search

ThriftDB presented today at TechCrunch Disrupt:

Technically speaking, ThriftDB is a flexible key-value datastore with search built in that has the flexibility, scalability, and performance of a NoSQL datastore with the capabilities of full-text search. Essentially, what this means is that, by combining the datastore and the search engine, ThriftDB is offering a service that makes it easy for developers to build fast, horizontally-scalable applications with integrated search.

The website says ThriftDB is a document database built on top of Thrift with full-text search support. I’m not really sure about the “Amazon Web Services of Search” part, but it sounds like it would go up against MarkLogic, ElasticSearch, Solr, and so on.

Original title and link: ThriftDB: The Amazon Web Services of Search (NoSQL databases © myNoSQL)