NoSQL Benchmarks NoSQL use cases NoSQL Videos NoSQL Hybrid Solutions NoSQL Presentations Big Data Hadoop MapReduce Pig Hive Flume Oozie Sqoop HDFS ZooKeeper Cascading Cascalog BigTable Cassandra HBase Hypertable Couchbase CouchDB MongoDB OrientDB RavenDB Jackrabbit Terrastore Amazon DynamoDB Redis Riak Project Voldemort Tokyo Cabinet Kyoto Cabinet memcached Amazon SimpleDB Datomic MemcacheDB M/DB GT.M Amazon Dynamo Dynomite Mnesia Yahoo! PNUTS/Sherpa Neo4j InfoGrid Sones GraphDB InfiniteGraph AllegroGraph MarkLogic Clustrix CouchDB Case Studies MongoDB Case Studies NoSQL at Adobe NoSQL at Facebook NoSQL at Twitter



MongoDB: All content tagged as MongoDB in NoSQL databases and polyglot persistence

5 Myths about NoSQL vs Relational Databases

Ryan Betts, the CTO of VoltDB addressing an article by MongoDB’s CEO Max Schireson that seems to have stroken a chord:

Recently Max Schireson, CEO of MongoDB, shared his thoughts on relational databases. His statements deserve a direct and frank opposing response. Let’s walk through the myths that Mr. Schireson promoted.

Compared with PostgreSQL’s Robert Haas post “Why the clock is ticking for MongoDB“, this one makes some debatable arguments — e.g. “All popular SQL systems support document types”: aside for SOA committees and MarkLogic, I’ve never heard someone enjoying XML. They aren’t innaccurate, but they’re paiting VoltDB’s space in a too bright color palette.

Original title and link: 5 Myths about NoSQL vs Relational Databases (NoSQL database©myNoSQL)


MongoDB is growing up

If Curt Monash says so…

With that caveat, the MongoDB rewrite story is something like:

  • Updating has been reworked. Most of the benefits are coming later.
  • Query optimization and execution have been reworked. Most of the benefits are coming later, except that …
  • … you can now directly filter on multiple indexes in one query; previously you could only simulate doing that by pre-building a compound index.
  • One of those future benefits is more index types, for example R-trees or inverted lists.
  • Concurrency improvements are down the road.
  • So are rewrites of the storage layer, including the introduction of compression.

Original title and link: MongoDB is growing up (NoSQL database©myNoSQL)


maxTimeMS in MongoDB 2.6

Jason McCay (MongoHQ) explains the new maxTimeMS API in MongoDB 2.6:

There are a number of scenarios where a flag like this can be helpful. For example, if you are in discovery mode and want to protect your database performance against unintended runaway operations, you could ensure all your queries include this flag.

Another scenario would be the batching of results, allowing you to define the amount of time/effort the database should spend returning results until it quits and moves on to the next request. In this situation, the cursor would continue to return results until the allotted amount of time has expired.

Original title and link: maxTimeMS in MongoDB 2.6 (NoSQL database©myNoSQL)


Why the clock is ticking for MongoDB

Robert Haas takes a comparative look at PostgreSQL and MongoDB’s features emphasized by its MongoDB CEO in an interview:

Schireson also mentions another advantage of document stores: schema flexibility. Of course, he again ignores the possible advantages, for some users, of a fixed schema, such as better validity checking. But more importantly, he ignores the fact that relational databases such as PostgreSQL have had similar capabilities since before MongoDB existed. PostgreSQL’s hstore, which provides the ability to store and index collections of key-value pairs in a fashion similar to what MongoDB provides, was first released in December of 2006, the year before MongoDB development began. True JSON capabilities were added to the PostgreSQL core as part of the 9.2 release, which went GA in September of 2012. The 9.4 release, expected later this year, will greatly expand those capabilities. In today’s era of rapid innovation, any database product whose market advantage is based on the format in which it is able to store data will not retain that advantage for very long.

It’s difficult impossible to debate or contradict the majority of facts and arguments the author is making. But in order to understand the history and future of developer tools, it’s worth emphasizing one aspect that has been almost completely ignored for way too long. — and the author mentions it just briefly.

Developers want to get things done. Fast and Easy.

For too long vendors thought that a tool that had a feature covered was enough. Even if the user had to read a book or two, hire an army of consultants, postpone the deadlines, and finally make three incantations to get it working. This strategy worked well for decades. It worked especially well in the space of databases where buying decisions where made at the top level due to the humongous costs.

MySQL became one of the most popular database because it was free and perceived to be easier than any of the alternatives. Not because it was first. Not because it was feature complete. And definitely not because it was technically superior — PostgreSQL was always technically superior, but never got the install base MySQL got.

MongoDB replays this story by the book. It’s free. It promises features that were missing or are considered complicated in the other products. And it’s perceived as the easiest to use database — a look at MongoDB’s history will reveal immediately its primary focus on ease of use: great documentation, friendly setup, fast getting started experience. For a lot of people, it really doesn’t matter anymore that there are alternative solutions that offer technically superior solutions. They’ve got their things done. Fast and Easy. Tomorrow is another day.

Original title and link: Why the clock is ticking for MongoDB (NoSQL database©myNoSQL)


NoSQL meets Bitcoin and brings down two exchanges

Most of Emin Gün Sirer’s posts end up linked here, as I usually enjoy the way he combines a real-life story with something technical, all that ending with a pitch for HyperDex.

The problem here stemmed from the broken-by-design interface and semantics offered by MongoDB. And the situation would not have been any different if we had used Cassandra or Riak. All of these first-generation NoSQL datastores were early because they are easy to build. When the datastore does not provide any tangible guarantees besides “best effort,” building it is simple. Any masters student in a top school can build an eventually consistent datastore over a weekend, and students in our courses at Cornell routinely do. What they don’t do is go from door to door in the valley, peddling the resulting code as if it could or should be deployed.

Unfortunately in this case, the jump from the real problem, which was caused only by the pure incompetence, to declaring “first-generation NoSQL databases” as being bad and pitching HyperDex’s features is both too quick and incorrect1.

  1. 1) ACID guarantees wouldn’t have solved the issue; 2) All 3 NoSQL databases mentioned, actually offer a solution for this particular scenario. 

Original title and link: NoSQL meets Bitcoin and brings down two exchanges (NoSQL database©myNoSQL)


When is MongoDB the Right Tool for the Job?

This puts me in a quandary, because my recent stint on the job market has shown that just about everybody is using MongoDB, and I’ve just never been in any situation that I have needed to use it.

I also can’t foresee any situation where there is a solid technical reason for choosing MongoDB over it’s competitors either, and the last thing I want to do is lead people astray or foist my preconceptions onto them.


Then the top comment on reddit.

Original title and link: When is MongoDB the Right Tool for the Job? (NoSQL database©myNoSQL)


A practical comparison of Map-Reduce in MongoDB and RavenDB

Ben Foster looks at MongoDB’s Map-Reduce and aggregation framework and then compares them with RavenDB’s Map-Reduce:

I thought it would be interesting to do a practical comparison of Map-Reduce in both MongoDB and RavenDB.

There are more differences than similarities — I’m not referring to the API differences, but to fundamental differences to the ways they operate.

✚ RavenDB’s author has a follow up post in which he underlines another major difference: RavenDB’s Map-Reduce operates as an index, while MongoDB’s Map-Reduce is an online operation.

Original title and link: A practical comparison of Map-Reduce in MongoDB and RavenDB (NoSQL database©myNoSQL)


4 Reasons Perfect Market chose MongoDB

A team from Perfect Market about choosing MongoDB for their Digital Publishing Suite:

There are many NoSQL products out there, why did we bet on MongoDB? There are four major reasons: great performance, great features, ease of use and great support. Of course not every day with MongoDB is a sunshine day. Some tradeoffs we made are shared at the end of this post.

  1. I’m sure Perfect Market would get great support from almost every NoSQL database vendor — that’s what I’ve always heard in this market segment.
  2. By great performance I’ll assume Perfect Market got the numbers they needed. While presented as the top reason for choosing MongoDB, I think this was more in line with: “considering these other features, is MongoDB’s performance good enough for us?”.

    MongoDB is not the fastest NoSQL database.

  3. Great features and ease of use. Nobody can deny that, at least at the first glance, MongoDB’s feature set is very compelling. And they’ve absolutely nailed the user experience part.

    My hypothesis for MongoDB’s adoption rate has always been that it’s mostly due to it looking familiar to people with relational db experience and also removing most of the strict constraints of these. This is echoed in this post too:

    Althought MongoDB is a NoSQL document DBMS, it bears resemblance to RDBMS’s.

Original title and link: 4 Reasons Perfect Market chose MongoDB (NoSQL database©myNoSQL)


How SQL-on-JSON analytics bolstered a business

Alex Woodie (Datanami) reporting about BitYota a SQL-based data warehouse on top of JSON:

BitYota says it designed its own hosted data warehouse from scratch, and that it’s differentiated by having a JSON access layer atop the data store. “We have some uniqueness where we operate SQL directly on JSON,” says BitYota CEO Dev Patel. “We don’t need to translate that data into a structured format like a CSV. We believe that if you transform the data, you will lose some of the data quality. And once that’s transformed, you won’t get it back.”

✚ BitYota’s tagline is Analytics for mongoDB, so I assume it’s safe to say the backend is mongoDB and they are building a SQL layer on top of it. What flavor and what’s the behavior for SQL’s quirks would be a very interesting story.

✚ This related to my earlier Do all roads lead back to SQL?

Original title and link: How SQL-on-JSON analytics bolstered a business (NoSQL database©myNoSQL)


The birth and road ahead of TokuMX, the alternative MongoDB engine

While not a MongoDB user (or expert), I find Tokutek’s work on their alternative engine for MongoDB, TokuMX, quite interesting both for technical — what is currently broken in MongoDB — and business point of views — is the InnoDB model possible in the NoSQL space?, what are some possible outcomes of the alternative core technology for free products business model?, would a new product bringing together MongoDB’s missing features and combining them with MongoDB’s “friendliness” and product marketing still lead to a successful product?, etc.

Zardosht Kasheff’s post about the history of TokuMX and how the decision was made to pursue this direction brings some light to both these areas.

But really, the BIGGEST benefit to this approach was the following: we could innovate on more of the MongoDB core server stack in ways the other approaches would not allow. Prior to TokuMX 1.4, such innovations include (but are not limited to):

  • Document level locking
  • Multi-statement transactions (on non-sharded clusters)
  • MVCC snapshot query semantics
  • Clustering indexes (although, to be fair, this was possible in other approaches)
  • Dramatically reduced I/O utilization on secondaries (which we will elaborate on in a future post)
  • Fast bulk loading
  • Enterprise hot backup

For these reasons, we chose this option, and after some hard work, TokuMX was born.

Original title and link: The birth and road ahead of TokuMX, the alternative MongoDB engine (NoSQL database©myNoSQL)


The Hadoop as ETL part in migrating from MongoDB to Cassandra at FullContact

While I’ve found the whole post very educative — and very balanced considering the topic — the part that I’m linking to is about integrating MongoDB with Hadoop. After reading the story of integrating MongoDB and Hadoop at Foursquare, there were quite a few questions bugging me. This post doesn’t answer any of them, but it brings in some more details about existing tools, a completely different solution, and what seems to be an overarching theme when using Hadoop and MongoDB in the same phrase:

We’re big users of Hadoop MapReduce and tend to lean on it whenever we need to make large scale migrations, especially ones with lots of transformation. That fact along with our existing conversion project from before, we used 10gen’s mongo-hadoop project which has input and output formats for Hadoop. We immediately realized that the InputFormat which connected to a MongoDB cluster was ill-suited to our usage. We had 3TB of partially-overlapping data across 2 clusters. After calculating input splits for a few hours, it began pulling documents at an uncomfortably slow pace. It was slow enough, in fact, that we developed an alternative plan.

You’ll have to read the post to learn how they’ve accomplished their goal, but as a spoiler, it was once again more of an ETL process rather than an integration.

✚ The corresponding HN thread; it’s focused mostly on the from MongoDB to Cassandra parts.

Original title and link: The Hadoop as ETL part in migrating from MongoDB to Cassandra at FullContact (NoSQL database©myNoSQL)


Mapping relational databases terms and SQL to MongoDB

A tuts+ guide to MongoDB for people familiar with SQL and relational databases:

We will start with mapping the basic relational concepts like table, row, column, etc and move to discuss indexing and joins. We will then look over the SQL queries and discuss their corresponding MongoDB database queries.

By the end of it you’ll probably not be able to convert your app to MongoDB, but at the next meetup or hackaton you’ll have an idea of what those Mongo guys are talking about.

Original title and link: Mapping relational databases terms and SQL to MongoDB (NoSQL database©myNoSQL)