mongodb: All content tagged as mongodb in NoSQL databases and polyglot persistence
Friday, 3 February 2012
MongoDB Tips and Tricks: You Only Wish MongoDB Wasn't Relational
So when you read that MongoDB is a document store, you might get the wonderful idea to store your relationships in a big document. Since mongo lets you reach into objects, you can query against them, right?
Several times, we’ve excitedly begun a schema this way, only to be forced to pull the nested documents out into their own collection. I’ll show you why, and why it’s not a big deal.
I’m no MongoDB expert, but the suggested solution (moving the nested documents into their own collection) requires 1) two network round trips and 2) an additional index.
The way I’d look at this problem is:
- what is the most frequent operation: reading a blog post and all its comments or displaying all the comments of a specific user?
- assuming the answer is reading a blog post and all its comments, I’d ask myself how frequent the other operation is.
- if it’s something that I need to perform just once in a while, I’d consider using MongoDB MapReduce, even if that would be suboptimal.
- if it’s a frequent operation, then I’d consider adding a separate collection for comments. Even better, I’d add a per user collection for comments.
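The trade-off above can be made concrete with a small sketch. It uses plain Python dictionaries and lists as stand-ins for MongoDB collections, so it illustrates the two access patterns rather than the driver API. The sample posts, the field names (`post_id`, `user`, `text`) and all helper functions are assumptions made for illustration.

```python
from collections import defaultdict

# Option A: comments embedded in each post document (hypothetical sample data).
posts_embedded = [
    {"_id": 1, "title": "First post", "comments": [
        {"user": "alice", "text": "Nice"},
        {"user": "bob", "text": "Agreed"},
    ]},
    {"_id": 2, "title": "Second post", "comments": [
        {"user": "alice", "text": "Again"},
    ]},
]


def post_with_comments_embedded(post_id):
    """One lookup returns the post together with its comments."""
    return next(p for p in posts_embedded if p["_id"] == post_id)


def comments_by_user_embedded(user):
    """With embedded comments, every post has to be walked."""
    return [c["text"] for p in posts_embedded
            for c in p["comments"] if c["user"] == user]


# Option B: comments in their own collection, referencing the post by id.
posts = {p["_id"]: {"_id": p["_id"], "title": p["title"]} for p in posts_embedded}
comments = [dict(c, post_id=p["_id"])
            for p in posts_embedded for c in p["comments"]]

# Stand-ins for the extra indexes a separate collection would need.
comments_by_post = defaultdict(list)
comments_by_user = defaultdict(list)
for c in comments:
    comments_by_post[c["post_id"]].append(c)
    comments_by_user[c["user"]].append(c)


def post_with_comments_separate(post_id):
    """Two lookups, which would be two round trips against a real server."""
    post = posts[post_id]              # lookup 1: the post itself
    found = comments_by_post[post_id]  # lookup 2: its comments
    return post, found


def comments_by_user_separate(user):
    """A single indexed lookup answers the per-user question."""
    return [c["text"] for c in comments_by_user[user]]
```

Which option is cheaper depends on which of the two questions is asked more often, which is the point of the checklist above.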
Original title and link: MongoDB Tips and Tricks: You Only Wish MongoDB Wasn’t Relational (©myNoSQL)
via: http://seanhess.github.com/2012/02/01/mongodb_relational.html
Thursday, 2 February 2012
NoSQL Market from Couchbase Perspective
James Phillips (Couchbase) for Curt Monash:
- MongoDB is the big competition. He believes Couchbase has an excellent win rate vs. 10gen for actual paying accounts.
- DataStax/Cassandra wins over Couchbase only when multi-data-center capability is important. Naturally, multi-data-center capability is planned for Couchbase. (Indeed, that’s one of the benefits of swapping in CouchDB at the back end.)
- Redis has “dropped off the radar”, presumably because there’s no particular persistence strategy for it.
- Riak doesn’t show up much.
I assume this is a sort of 100,000-foot overview from the pre-sales/sales department.
Original title and link: NoSQL Market from Couchbase Perspective (©myNoSQL)
MongoDB Tips & Tricks: Using MongoDB ObjectIds as created-on timestamps
One of my favorite MongoDB tricks is the ability to use an ObjectId (the default type for MongoDB’s _id primary key) as a timestamp for when a document was created. Here’s how it works:
$ import pymongo
$ db = pymongo.Connection().test
$ db.test.insert({'hello': 'world'})
ObjectId('4f202e64e6fb1b56ff000000')
$ doc = db.test.find_one()
$ doc['_id'].generation_time
datetime.datetime(2012, 1, 25, 16, 31, 32, tzinfo
Mike Dirolf used to work for 10gen so he probably knows quite a few such MongoDB tips & tricks.
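The trick works because the first four bytes of an ObjectId hold the creation time as seconds since the Unix epoch. A minimal sketch of that decoding in plain Python, without the driver, could look like this. The function name `objectid_generation_time` is hypothetical and is not part of pymongo.

```python
from datetime import datetime, timezone


def objectid_generation_time(oid_hex: str) -> datetime:
    """Decode the creation time from a 24-character ObjectId hex string."""
    if len(oid_hex) != 24:
        raise ValueError("an ObjectId is 12 bytes, i.e. 24 hex characters")
    # The leading 4 bytes (8 hex characters) are a big-endian Unix timestamp.
    seconds = int(oid_hex[:8], 16)
    return datetime.fromtimestamp(seconds, tz=timezone.utc)


created = objectid_generation_time("4f202e64e6fb1b56ff000000")
print(created.isoformat())  # 2012-01-25T16:31:32+00:00
```

The result is only as precise as the clock of the machine that generated the id, and only to the second.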
Original title and link: MongoDB Tips & Tricks: Using MongoDB ObjectIds as created-on timestamps (©myNoSQL)
via: http://blog.fiesta.cc/post/16470048697/using-and-abusing-mongodb-objectids-as-created-on
Getting off the CouchDB... or Lessons Learned while Experimenting in Production
The move to CouchDB went well. Pages in our web application that would occasionally time out were now loading in a couple of seconds. And, our MySQL database was much, much happier. We liked CouchDB so much that we started planning a feature that would make heavy use of CouchDB’s schema-less nature.
And that’s when the wheels came off.
Word of caution: this is not the “CouchDB sucks so we went with MongoDB” type of post. It’s more of “we thought CouchDB can solve one of our problems, but then got confused and thought it can solve world hunger. So we decided to throw a bunch of data to it to see if it sticks. Surprise! It didn’t.”
Just to be clear, I’m not defending CouchDB and everything John Wood writes about it is correct. It’s just that experimenting with CouchDB in a non-production environment or at least reading myNoSQL would have already offered all those answers.
Original title and link: Getting off the CouchDB… or Lessons Learned while Experimenting in Production (©myNoSQL)
via: http://blog.signalhq.com/2012/01/24/getting-off-the-couchdb/
MongoDB Indexing in Practice
An article based on Kyle Banker’s MongoDB in Action:
Indexes are enormously important. With the right indexes in place, MongoDB can use its hardware efficiently and serve your application’s queries quickly. With the wrong indexes, you’ll see the exact opposite effect: slow queries and poorly utilized hardware. It stands to reason, then, that anyone wanting to use MongoDB effectively and make the best use of hardware resources must understand indexing. We’re going to look at some refinements on the kinds of indexes that can be created in MongoDB. We’ll then proceed to some of the niceties of administering those indexes.
While pretty detailed, the article doesn’t mention that MongoDB indexes are stored using memory-mapped files (the same mechanism used for storing data). Basically, this means that your data and all your indexes are competing for your system memory.
Original title and link: MongoDB Indexing in Practice (©myNoSQL)
via: http://www.cloudcomputingdevelopment.net/mongodb-indexing-in-practice/
Wednesday, 1 February 2012
MongoDB at GOV.UK: The Power of the Document Model
The alpha version of GOV.UK was using MySQL and PostgreSQL. GOV.UK beta is based on Amazon RDS (MySQL) and MongoDB. In their words:
We started out building everything using MySQL but moved to MongoDB as we realised how much of our content fitted its document-centric approach. Over time we’ve been more and more impressed with it and expect to increase our usage of it in the future.
Here’s what the GOV.UK architecture looks like:

Original title and link: MongoDB at GOV.UK: The Power of the Document Model (©myNoSQL)
Tuesday, 31 January 2012
NoSQL Tutorial: Setting Up a Hadoop Cluster with MongoDB Support on EC2
A complete and detailed guide for setting up a Hadoop cluster using MongoDB by Artem Yankov. It uses the MongoDB Hadoop adapter mongo-hadoop, which provides input and output adapters, support for InputSplits, and write-only Pig.
What is covered in the tutorial:
- Creating an AMI with the custom settings (installed hadoop and mongo-hadoop)
- Launching a hadoop cluster on EC2
- Adding more nodes to the cluster
- Running some sample jobs
Original title and link: NoSQL Tutorial: Setting Up a Hadoop Cluster with MongoDB Support on EC2 (©myNoSQL)
via: http://artemyankov.com/post/16717104998/how-to-set-up-a-hadoop-cluster-with-mongo-support-on
Monday, 30 January 2012
MongoDB Tips and Tricks: More Reads Make For Faster Writes
The trick to lowering your lock percentage and thus having faster updates is to query the document you are going to update, before you perform the update. Querying before doing an upsert might seem counter intuitive at first glance, but it makes sense when you think about it.
The read ensures that whatever document you are going to update is in RAM. This means the update, which will happen immediately after the read, always updates the document in RAM, which is super fast. I think of it as warming the database for the document you are about to update.
This makes it sound like upserts and field-level updates are just syntactic sugar in MongoDB, their real value being lost if the system is fetching full entries [1] into memory for an update.
[1] Actually, because MongoDB uses memory-mapped files, accessing an entry requires more I/O.
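The read-before-update pattern from the quoted tip can be sketched as follows. The helper `read_then_update` and the `RecordingCollection` stand-in are hypothetical; the sketch only assumes a collection object that exposes `find_one` and `update_one` in the style of a current pymongo collection. Whether the pattern still pays off on a given MongoDB version or storage engine is not something the source addresses.

```python
def read_then_update(collection, query, change):
    """Read the target document first, then apply the update.

    Per the quoted tip, the read is there only to get the document
    into RAM before the write runs; its result is discarded.
    """
    collection.find_one(query)
    return collection.update_one(query, change, upsert=True)


class RecordingCollection:
    """Hypothetical stand-in that records calls, so the sketch runs without a server."""

    def __init__(self):
        self.calls = []

    def find_one(self, query):
        self.calls.append(("find_one", query))
        return None

    def update_one(self, query, change, upsert=False):
        self.calls.append(("update_one", query, change, upsert))
        return None


coll = RecordingCollection()
read_then_update(coll, {"_id": "page:home"}, {"$inc": {"views": 1}})
print([name for name, *_ in coll.calls])  # ['find_one', 'update_one']
```

The cost is one extra round trip per write, so it is worth measuring lock percentage before and after rather than assuming a gain.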
Original title and link: MongoDB Tips and Tricks: More Reads Make For Faster Writes (©myNoSQL)
via: http://mongotips.com/b/lower-lock-and-number-of-slow-queries/
MongoDB Replica Sets and Sharding for GridFS as a Distributed File System
Contrary to many MongoDB deployments, we primarily use it for storing files in GridFS. We switched over to MongoDB after searching for a good distributed file system for years. Prior to MongoDB we used a regular NFS share, sitting on top of a HAST-device. That worked great, but it didn’t allow us to scale horizontally the way a distributed file system allows.
No doubt GridFS is a useful feature of MongoDB, but I’m pretty sure the experts in distributed file systems have better solutions for this—I just hope they’ll share it with us.
Update: Jeff Darcy [1]:
Yes, we do have better solutions for this particular kind of use case. So do object/blob stores like Swift.
Honestly, I don’t think the “searching for a good distributed filesystem” part is even credible. How can someone be that bad at finding readily available information? For example, it’s easier to set up sharding and replication with GlusterFS than with MongoDB and GridFS, plus you’ll get striping and RDMA and generally better performance for this type of workload. On top of all that, you won’t need to use special libraries to interface with it because it’s a regular POSIX filesystem. Lastly, it’s not like there hasn’t been a lot of press about it. Even considering their obvious FreeBSD bias and the fact that FreeBSD is weak in this area, the second item for “FreeBSD distributed filesystem” points to GlusterFS. If they didn’t find it, they just didn’t look very hard before they reached for the New Shiny.
It’s not just GlusterFS, either. MogileFS might not be a real filesystem but it’s user space so it would probably run just fine in their environment - as would the aforementioned Swift. I have more of a problem with the anti-Mongo haters than with Mongo itself, it’s wonderful that these guys found a Mongo-based solution that works for them, but it seems like a bit of an odd choice nonetheless.
[1] Jeff Darcy is a member of the gluster.org advisory board and works on GlusterFS full time at Red Hat. He’s also the person I direct all my questions about distributed file systems to (and not only those).
Original title and link: MongoDB Replica Sets and Sharding for GridFS as a Distributed File System (©myNoSQL)
via: http://viktorpetersson.com/2012/01/29/notes-on-mongodb-gridfs-and-sharding-in-the-cloud/
PHP and MongoDB Tutorial
Derick Rethans’s slides are a good MongoDB tutorial for PHP developers covering most of the API.
Monday, 23 January 2012
Jelastic Database Marketshare: MySQL, MongoDB, MariaDB
Jelastic, a company offering a cloud platform for Java server hosting, has published some stats about the databases used by their over 7000 users:

While it would be wrong to generalize these results to absolute database marketshare, it is interesting nonetheless to see that MongoDB is already outrunning PostgreSQL as the second most used database, and that CouchDB, which was added only one month ago, is already used by 5% of Jelastic’s users. MySQL holds first place with over 40% of users, or, put differently, double the share of second-place MongoDB.
These numbers would be even more interesting if they accounted for real usage stats such as database sizes or query volumes.
Original title and link: Jelastic Database Marketshare: MySQL, MongoDB, MariaDB (©myNoSQL)
via: http://blog.jelastic.com/2012/01/23/database-marketshare-january-2012/
Thursday, 19 January 2012
Using MongoDB Replica Sets With Node.js on Microsoft Azure: NoSQL Tutorials
Mariano Vazquez explains how to configure MongoDB replica sets on Microsoft Azure and how that works:
- MongoDB will run the native binaries on a worker role and will store the data in Windows Azure storage using Windows Azure Drive (basically a hard disk mounted on Azure Page blobs)
- The good thing about using Azure Storage is that the data is georeplicated. It will also make backup easier because of the snapshot feature of blob storage (which is not a copy but a diff).
- It will use the local hard disk in the VM (local resources in the Azure jargon) to store the log files and a local cache.
- You can scale out to multiple Mongo Replica Sets by increasing the instance count of the MongoDB role
Original title and link: Using MongoDB Replica Sets With Node.js on Microsoft Azure: NoSQL Tutorials (©myNoSQL)
via: http://nodeblog.cloudapp.net/running-mongodb-on-azure-and-connect-from-a-nodejs-web-app