
MongoDB performance bottlenecks and optimization strategies for MongoDB

Mikita Manko goes through a list of bottlenecks in MongoDB and suggests different ways to alleviate the pain (when possible):

I will try to describe here all potential performance bottlenecks and possible solutions and tips for performance optimization, but first of all you should ensure that MongoDB was the right choice for your project.

Original title and link: MongoDB performance bottlenecks and optimization strategies for MongoDB (NoSQL database©myNoSQL)


Defending Mongodb

Siddharth Ravichandran:

My post, however, aims at highlighting the areas where Mongodb works and how it performed brilliantly for us. As someone leading the engineering efforts for a shipping and logistics company, I wasn’t too happy initially to see Mongodb being used as the primary datastore, but after 2 years I’m more than sure that this was definitely the datastore for us. I’ve outlined areas that confused me when I first encountered them, only to learn that they were actually invaluable features that were available to me.

I fail to understand how someone can defend the bad usage of locks or the no-longer-default fire-and-forget behavior of a database. Maybe that’s why the original title is “Mongodb — not evil, just misunderstood”.

Original title and link: Defending Mongodb (NoSQL database©myNoSQL)


Tuning MongoDB Performance with MMS


At MongoLab we manage thousands of MongoDB clusters and regularly help customers optimize system performance. Some of the best tools available for gaining insight into our MongoDB deployments are the monitoring features of MongoDB Management Service (MMS). […] Here we focus primarily on the metrics provided by MMS but augment our analysis with specific log file metrics as well.

There’s definitely something you can learn from guys whose business is running MongoDB.

✚ I continue to be impressed with MongoDB Inc.’s (formerly 10gen) MMS service.

Original title and link: Tuning MongoDB Performance with MMS (NoSQL database©myNoSQL)


Does Meteor scale? Polling for changes in MongoDB

Jon James, in an article looking at the scalability of the Meteor framework, which uses MongoDB as its database:

Meteor is all real-time, which it currently achieves by fetching and comparing documents after every database write operation. Meteor also polls the database for changes every 10 seconds. These are the main bottlenecks when scaling Meteor, and they introduce two main issues:

  1. The polling and comparing logic takes a lot of CPU power and network I/O.
  2. After a write operation, there is no way to propagate changes to other Meteor instances in real-time. Changes will only be noticed the next time Meteor polls (~10 seconds).

While from a developer’s perspective getting automatic updates is probably a pretty cool feature, polling a database is not going to get them very far. The author suggests using MongoDB’s oplog as a source of changes. That could work.
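Tailing the oplog instead of polling can be sketched roughly as follows. The filtering logic is the interesting part, so `oplog_entries` below stands in for a tailable cursor on `local.oplog.rs`; the field names (`op`, `ns`, `o`) follow MongoDB's oplog entry format, but the helper itself is hypothetical, not Meteor's actual implementation:

```python
# Sketch: consuming oplog entries instead of polling for changes.
# In a real deployment, `oplog_entries` would be a tailable cursor on
# local.oplog.rs of a replica set member; here it is any iterable of dicts.

def changes_for_collection(oplog_entries, namespace):
    """Yield insert ('i'), update ('u'), and delete ('d') events
    for a single collection, identified by its full namespace
    (e.g. 'app.mail'). No-op entries ('n') are skipped."""
    for entry in oplog_entries:
        if entry.get("ns") != namespace:
            continue
        if entry.get("op") in ("i", "u", "d"):
            yield entry
```

The point of the approach is that each Meteor instance would be notified of every write as it lands in the oplog, instead of rediscovering changes on a 10-second poll cycle.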

Original title and link: Does Meteor scale? Polling for changes in MongoDB (NoSQL database©myNoSQL)


The SSL performance overhead in MongoDB and MySQL

How to use MongoDB with SSL:

As you can see the SSL overhead is clearly visible being about 0.05ms slower than a plain connection. The median for the inserts with SSL is 0.28ms. Plain connections have a median at around 0.23ms. So there is a performance loss of about 25%. These are all just rough numbers. Your mileage may vary.
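For the record, the quoted medians work out to roughly a 22% slowdown, in the same ballpark as the post's "about 25%". A quick sanity check of the arithmetic (the helper is mine, not from the post):

```python
import statistics

def relative_overhead(plain_ms, tls_ms):
    """Relative slowdown of TLS operations vs. plain ones,
    computed from two lists of latency samples in milliseconds."""
    plain = statistics.median(plain_ms)
    tls = statistics.median(tls_ms)
    return (tls - plain) / plain

# With the post's medians: (0.28 - 0.23) / 0.23 ≈ 0.217, i.e. ~22%.
```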

Then two posts on the “MySQL Performance Blog”: SSL Performance Overhead in MySQL and MySQL encryption performance, revisited:

Some of you may recall my security webinar from back in mid-August; one of the follow-up questions that I was asked was about the performance impact of enabling SSL connections. My answer was 25%, based on some 2011 data that I had seen over on yaSSL’s website, but I included the caveat that it is workload-dependent, because the most expensive part of using SSL is establishing the connection.

These two articles dive much deeper, and more scientifically, into the impact of using SSL with MySQL. The results are interesting and the recommendations are well worth the time spent reading them.

Original title and link: The SSL performance overhead in MongoDB and MySQL (NoSQL database©myNoSQL)

RavenDB: The Road to Release

Ayende Rahien shares with RavenDB’s community a 5-point plan for the future of RavenDB. One of these caught my eye:

Second, we do acknowledge that we suffer from a typical blindness for how we approach RavenDB. Since we built it, we know how things are supposed to be, and that is how we usually test them. Even when we try to go for the edge cases, we are constrained by our own thinking. We are currently working on getting an external testing team to do just that. Actively work to make use of RavenDB in creative ways specifically to try to break it.

To say that testing a database is complicated is an understatement. Even more so when it’s a distributed database.

Original title and link: RavenDB: The Road to Release (NoSQL database©myNoSQL)


How to interpret NoSQL funding rounds

Adam Fowler (MarkLogic):

Looking solely at money raised, it is tempting to conclude that MongoDB is the most successful NoSQL vendor out there… It simply isn’t though. It’s a services company mostly, and one that doesn’t make much in software license. They’re simply louder than the rest.

Sounds bitter. Very bitter.

Original title and link: How to interpret NoSQL funding rounds (NoSQL database©myNoSQL)


Blame it on the database

The story of a famous failure:

Another sore point was the Medicare agency’s decision to use database software, from a company called MarkLogic, that managed the data differently from systems by companies like IBM, Microsoft and Oracle. CGI officials argued that it would slow work because it was too unfamiliar. Government officials disagreed, and its configuration remains a serious problem.


“We have not identified any inefficient and defective code,” a CGI executive responded in an email to federal project managers, pointing again to database technology that the Medicare agency had ordered it to use as the culprit, at least in part.

I’m not going to defend MarkLogic. But this sounds so much like the archetype of a failure story:

  1. start by blaming the other contractors
  2. find the newest or least-known technology used in the project
  3. point all fingers at it

A long time ago I was involved in a similar project. Different country, different agencies, different contractors, but exactly the same story. It was in the early days of my career, but what I learned back then stuck with me, and even if today it may sound like a truism, it’s still one of the big lessons: it’s not the technology. It’s the people. Always. And the money.

Original title and link: Blame it on the database (NoSQL database©myNoSQL)


A MongoDB data recovery tale

Excellent write up by MongoHQ’s Paul Rubin of a data recovery story:

The friend-of-a-friend had not been running MongoDB with us, but had been running MongoDB at a budget VPS host. Their database on the budget VPS host had worked fine, until the host had a hardware crash. And as is all too usual in these stories, their self-hosted platform had no usable backups less than about six weeks old.

While this story might be helpful, I really hope you’ll never need it; but bookmark it, just in case.

Original title and link: A MongoDB data recovery tale (NoSQL database©myNoSQL)


Efficient techniques for fuzzy and partial matching in MongoDB

A pretty interesting post by John Page:

This blogpost describes a number of techniques, in MongoDB, for efficiently finding documents that have a number of similar attributes to a supplied query whilst not being an exact match. This concept of “fuzzy” searching allows users to avoid the risks of failing to find important information due to slight differences in how it was entered.

In the presented incarnation, these four techniques are MongoDB-specific, but some of them could easily be generalized.
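For a flavor of what such techniques look like, here is one common fuzzy-matching building block: trigram (character n-gram) similarity. This is a generic illustration, not necessarily one of the four techniques from the post; in MongoDB one would typically precompute and index the n-grams per document:

```python
def trigrams(text, n=3):
    """Set of character n-grams of a lowercased, space-padded string.
    Padding makes word boundaries contribute their own grams."""
    padded = f"  {text.lower()} "
    return {padded[i:i + n] for i in range(len(padded) - n + 1)}

def similarity(a, b):
    """Jaccard similarity of the two strings' trigram sets:
    1.0 for identical strings, 0.0 for no shared trigrams."""
    ga, gb = trigrams(a), trigrams(b)
    return len(ga & gb) / len(ga | gb) if ga | gb else 1.0
```

A query term like "johnathan" will still score highly against a stored "jonathan", which is exactly the "slight differences in how it was entered" problem the post is about.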

Original title and link: Efficient techniques for fuzzy and partial matching in MongoDB (NoSQL database©myNoSQL)


Forbes Top 10 Most Funded Big Data Startups

  • MongoDB (formerly 10gen) $231m Document-oriented database
  • Mu Sigma $208m Data-Science-as-a-Service
  • Cloudera $141m Hadoop-based software, services and training
  • Opera Solutions $114m Data-Science-as-a-Service
  • Hortonworks $98m Hadoop-based software, services and training
  • Guavus $87m Big data analytics solution
  • DataStax $83.7m Cassandra-based big data platform
  • GoodData $75.5m Cloud-based platform and big data apps
  • Talend $61.6m App and business process integration platform
  • Couchbase $56m Document-oriented database

I’m not really sure there are any conclusions one could draw based on this data alone.

Original title and link: Forbes Top 10 Most Funded Big Data Startups (NoSQL database©myNoSQL)


Scaling MongoDB at Mailbox

The story, a long and interesting one, of moving a MongoDB collection from one cluster to a new one:

While MongoDB allows you to add shards to a MongoDB cluster easily, we wanted to spare ourselves potential long-term pain by moving one of the most frequently updated MongoDB collections, which stores email-related data, to its own cluster. We theorized that this would, at a minimum, cut the amount of write lock contention in half. While we could have chosen to scale by adding more shards, we wanted to be able to independently optimize and administer the different types of data separately.

I’m not an ops person and I don’t know what the optimal process is. Hopefully readers will share their experiences.
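One common shape for this kind of collection move is a dual-write migration: write to both clusters, backfill the documents that predate dual-writing, then cut reads over. A minimal sketch, with plain dicts standing in for clusters; this is the generic pattern, not necessarily Mailbox's exact process:

```python
class DualWriteMigrator:
    """Sketch of a dual-write collection migration. New writes go to
    both the old and the new cluster; reads stay on the old cluster
    until cutover. Dicts stand in for the two clusters."""

    def __init__(self, old_cluster, new_cluster):
        self.old = old_cluster
        self.new = new_cluster
        self.cutover_done = False

    def write(self, key, doc):
        # During migration, every write lands in both clusters.
        self.old[key] = doc
        self.new[key] = doc

    def backfill(self):
        # Copy documents that predate dual-writing; setdefault avoids
        # clobbering anything already written to the new cluster.
        for key, doc in self.old.items():
            self.new.setdefault(key, doc)

    def cutover(self):
        # Once the backfill is verified, reads switch to the new cluster.
        self.cutover_done = True

    def read(self, key):
        store = self.new if self.cutover_done else self.old
        return store.get(key)
```

The ordering matters: start dual-writing first, then backfill, then cut over, so no write is lost in the gap between the copy and the switch.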

Original title and link: Scaling MongoDB at Mailbox (NoSQL database©myNoSQL)