Last week featured two of the most interesting posts about MongoDB: the first, from Mathias Meyer (@roidrage), is an ☞ offline investigation of MongoDB, and the second is a set of notes from running MongoDB in production, published on the ☞ Boxed Ice blog.
If you are interested in getting started with MongoDB, I’d encourage you to take the time to go through Mathias’ post, which covers the following aspects (I’ve also included a couple of comments):
- collections and capped collections
Note: I couldn’t really understand the usage of namespaces and their implications for indexes
- data format
Note: I’d also strongly suggest taking a look at MongoDB documentation on ☞ schema design for more details
Note: I’d really appreciate more details on this topic, as it is not completely clear whether all access (both reads and writes) is serialized or only writes are; the impact on indexes is not clear either.
- protocol access
Note: I’m probably biased, but I’m still waiting for the moment MongoDB sharding reaches at least beta.
MapReduce support seems to be missing from Mathias’ notes, but luckily we have that covered for you: MongoDB MapReduce tutorial.
While keeping in mind that some of these features are not unique to MongoDB and can be found in other systems, you should be ready to cross check your app requirements with the lessons learned by the guys at Boxed Ice:
- namespace limits
We split our customers across (currently) 3 MongoDB databases because there is a namespace limit of 24,000 per database. This is essentially the number of collections + number of indexes.
- initial sync/replication of large databases
Our databases are very large and it takes about 48-72 hours to fully sync all our current data onto a new slave in a different DC (via a site-to-site VPN for security). During this time you’re at risk because the slave is not up to date.
- initial sync “slows” things
When doing a fresh sync from a master to a slave, we have observed a “slowdown” in our application response times.
- index creation blocks
However, if you have an existing collection and create a new index on it then that process will block the database until the index is created.
- efficiency of reclaiming diskspace
We have found that there is a massive discrepancy between a master and a freshly copied slave.
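To make the namespace limit above more concrete, here is a rough back-of-the-envelope sketch. It assumes, as the Boxed Ice quote puts it, that namespaces are essentially collections plus indexes, and it ignores any namespaces MongoDB reserves for itself. The function names and the per-customer figures are hypothetical, chosen only for illustration; they are not taken from the Boxed Ice post.

```python
# Per-database namespace limit as quoted by Boxed Ice.
NAMESPACE_LIMIT = 24_000


def namespaces_used(collections: int, indexes_per_collection: int) -> int:
    """Approximate namespace count: one per collection plus one per index.

    indexes_per_collection should count every index on a collection,
    including any that the database creates by default.
    """
    return collections + collections * indexes_per_collection


def max_customers(collections_per_customer: int,
                  indexes_per_collection: int,
                  limit: int = NAMESPACE_LIMIT) -> int:
    """How many customers fit in one database before hitting the limit."""
    per_customer = namespaces_used(collections_per_customer,
                                   indexes_per_collection)
    return limit // per_customer


if __name__ == "__main__":
    # Hypothetical workload: 10 collections per customer, 2 indexes each.
    print(namespaces_used(10, 2))
    print(max_customers(10, 2))
```

With a per-customer collection layout like this hypothetical one, the limit is reached after a few hundred customers, which is consistent with the kind of pressure that would push a team to split customers across several databases.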
Even if not every application will have to deal with data at the size Boxed Ice is dealing with, I couldn’t help noticing that parts of the process of scaling MongoDB were really painful. Or as Sergio Bossa (@sbtourist) put it in ☞ one of the comments:
Anyways, it seems indeed you had almost the same problems you would had with a MySQL solution:
- Huge data to deal with.
- Manual sharding.
- Sync/replication delays.
So why didn’t you evaluate to switch to a more “large-scale” nosql solution like Cassandra or Riak?