
mongodb: All content tagged as mongodb in NoSQL databases and polyglot persistence

MongoDB Tips and Tricks: You Only Wish MongoDB Wasn't Relational

So when you read that MongoDB is a document store, you might get the wonderful idea to store your relationships in a big document. Since mongo lets you reach into objects, you can query against them, right?

Several times, we’ve excitedly begun a schema this way, only to be forced to pull the nested documents out into their own collection. I’ll show you why, and why it’s not a big deal.

I’m no MongoDB expert, but the suggested solution requires: 1) two network roundtrips; 2) an additional index.

The way I’d look at this problem is:

  1. what is the most frequent operation: reading a blog post and all its comments or displaying all the comments of a specific user?
  2. assuming the answer is reading a blog post and all its comments, I’d ask myself how frequent is the other operation.
    1. if it’s something that I need to perform just once in a while, I’d consider using MongoDB MapReduce, even if that would be suboptimal.
    2. if it’s a frequent operation, then I’d consider adding a separate collection for comments. Even better, I’d add a per user collection for comments.
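The trade-off above can be sketched with plain Python dictionaries standing in for MongoDB documents, so no database is needed. The field names (`title`, `comments`, `user`, `post_id`) and the sample data are made up for illustration and do not come from the linked post.

```python
# Hypothetical data: plain dicts stand in for MongoDB documents.

# Option A: comments embedded inside each post document.
posts_embedded = [
    {"_id": 1, "title": "Hello", "comments": [
        {"user": "ann", "text": "First!"},
        {"user": "bob", "text": "Nice post"},
    ]},
    {"_id": 2, "title": "Again", "comments": [
        {"user": "ann", "text": "Still reading"},
    ]},
]

# Option B: comments in their own collection, each pointing back at its post.
comments_separate = [
    {"post_id": 1, "user": "ann", "text": "First!"},
    {"post_id": 1, "user": "bob", "text": "Nice post"},
    {"post_id": 2, "user": "ann", "text": "Still reading"},
]


def comments_by_user_embedded(posts, user):
    """Walk every post and pick the matching comments out of each one."""
    return [c["text"] for p in posts for c in p["comments"] if c["user"] == user]


def comments_by_user_separate(comments, user):
    """Filter one flat collection; this is the query an index on `user` would serve."""
    return [c["text"] for c in comments if c["user"] == user]


def comments_for_post_separate(comments, post_id):
    """The extra lookup that reading a post costs once comments live elsewhere."""
    return [c["text"] for c in comments if c["post_id"] == post_id]


print(comments_by_user_embedded(posts_embedded, "ann"))
print(comments_by_user_separate(comments_separate, "ann"))
print(comments_for_post_separate(comments_separate, 1))
```

With embedded comments, reading a post is a single fetch, but "all comments by a user" has to walk every post. With a separate collection it is the other way around, which is why the most frequent operation should drive the choice.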

Original title and link: MongoDB Tips and Tricks: You Only Wish MongoDB Wasn’t Relational (NoSQL database©myNoSQL)

via: http://seanhess.github.com/2012/02/01/mongodb_relational.html


NoSQL Market from Couchbase Perspective

James Phillips (Couchbase) for Curt Monash:

  • MongoDB is the big competition. He believes Couchbase has an excellent win rate vs. 10gen for actual paying accounts.
  • DataStax/Cassandra wins over Couchbase only when multi-data-center capability is important. Naturally, multi-data-center capability is planned for Couchbase. (Indeed, that’s one of the benefits of swapping in CouchDB at the back end.)
  • Redis has “dropped off the radar”, presumably because there’s no particular persistence strategy for it.
  • Riak doesn’t show up much.

I assume this is the sort of 100,000-foot overview you would get from a pre-sales or sales department.

Original title and link: NoSQL Market from Couchbase Perspective (NoSQL database©myNoSQL)

via: http://www.dbms2.com/2012/02/01/couchbase-update/


MongoDB Tips & Tricks: Using MongoDB ObjectIds as created-on timestamps

One of my favorite MongoDB tricks is the ability to use an ObjectId (the default type for MongoDB’s _id primary key) as a timestamp for when a document was created. Here’s how it works:

  >>> import pymongo
  >>> db = pymongo.Connection().test
  >>> db.test.insert({'hello': 'world'})
  ObjectId('4f202e64e6fb1b56ff000000')
  >>> doc = db.test.find_one()
  >>> doc['_id'].generation_time
  datetime.datetime(2012, 1, 25, 16, 31, 32, tzinfo
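
The trick works because the first 4 bytes of an ObjectId are a big-endian Unix timestamp in seconds. As a minimal sketch, the same value can be recovered from the hex string with only the standard library. The helper name `objectid_created_at` is made up for this example and is not part of pymongo.

```python
from datetime import datetime, timezone


def objectid_created_at(oid_hex):
    """Return the creation time encoded in a 24-character ObjectId hex string."""
    if len(oid_hex) != 24:
        raise ValueError("an ObjectId is 12 bytes, i.e. 24 hex characters")
    # The first 8 hex digits (4 bytes) hold the seconds since the Unix epoch.
    seconds = int(oid_hex[:8], 16)
    return datetime.fromtimestamp(seconds, tz=timezone.utc)


created = objectid_created_at("4f202e64e6fb1b56ff000000")
print(created.isoformat())  # 2012-01-25T16:31:32+00:00
```

The resolution is one second, and the value reflects the clock of whichever machine generated the id, so it is a convenience rather than a precise audit timestamp.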

Mike Dirolf used to work for 10gen so he probably knows quite a few such MongoDB tips & tricks.

Original title and link: MongoDB Tips & Tricks: Using MongoDB ObjectIds as created-on timestamps (NoSQL database©myNoSQL)

via: http://blog.fiesta.cc/post/16470048697/using-and-abusing-mongodb-objectids-as-created-on


Getting off the CouchDB... or Lessons Learned while Experimenting in Production

The move to CouchDB went well. Pages in our web application that would occasionally time out were now loading in a couple of seconds. And, our MySQL database was much, much happier. We liked CouchDB so much that we started planning a feature that would make heavy use of CouchDB’s schema-less nature.

And that’s when the wheels came off.

Word of caution: this is not the “CouchDB sucks so we went with MongoDB” type of post. It’s more of “we thought CouchDB can solve one of our problems, but then got confused and thought it can solve world hunger. So we decided to throw a bunch of data to it to see if it sticks. Surprise! It didn’t.”

Just to be clear, I’m not defending CouchDB, and everything John Wood writes about it is correct. It’s just that experimenting with CouchDB in a non-production environment, or at least reading myNoSQL, would have already provided all those answers.

Original title and link: Getting off the CouchDB… or Lessons Learned while Experimenting in Production (NoSQL database©myNoSQL)

via: http://blog.signalhq.com/2012/01/24/getting-off-the-couchdb/


MongoDB Indexing in Practice

An article based on Kyle Banker’s MongoDB in Action:

Indexes are enormously important. With the right indexes in place, MongoDB can use its hardware efficiently and serve your application’s queries quickly. With the wrong indexes, you’ll see the exact opposite effect: slow queries and poorly utilized hardware. It stands to reason, then, that anyone wanting to use MongoDB effectively and make the best use of hardware resources must understand indexing. We’re going to look at some refinements on the kinds of indexes that can be created in MongoDB. We’ll then proceed to some of the niceties of administering those indexes.

While the article is pretty detailed, one thing I haven’t seen mentioned in it is that MongoDB indexes are stored using memory-mapped files (the same mechanism used for storing data). Basically, this means that your data and all your indexes are competing for your system memory.

Original title and link: MongoDB Indexing in Practice (NoSQL database©myNoSQL)

via: http://www.cloudcomputingdevelopment.net/mongodb-indexing-in-practice/


MongoDB at GOV.UK: The Power of the Document Model

The alpha version of GOV.UK was using MySQL and PostgreSQL. GOV.UK beta is based on Amazon RDS (MySQL) and MongoDB. In their words:

We started out building everything using MySQL but moved to MongoDB as we realised how much of our content fitted its document-centric approach. Over time we’ve been more and more impressed with it and expect to increase our usage of it in the future.

Here’s what the GOV.UK architecture looks like:

GOV.UK architecture

Credit: O’Reilly.

Original title and link: MongoDB at GOV.UK: The Power of the Document Model (NoSQL database©myNoSQL)


NoSQL Tutorial: Setting Up a Hadoop Cluster with MongoDB Support on EC2

A complete and detailed guide to setting up a Hadoop cluster with MongoDB support, by Artem Yankov. It uses the MongoDB Hadoop adapter mongo-hadoop, which provides input and output adapters, support for InputSplits, and write-only Pig.

What is covered in the tutorial:

  • Creating an AMI with the custom settings (installed hadoop and mongo-hadoop)
  • Launching a hadoop cluster on EC2
  • Adding more nodes to the cluster
  • Running some sample jobs

Original title and link: NoSQL Tutorial: Setting Up a Hadoop Cluster with MongoDB Support on EC2 (NoSQL database©myNoSQL)

via: http://artemyankov.com/post/16717104998/how-to-set-up-a-hadoop-cluster-with-mongo-support-on


MongoDB Tips and Tricks: More Reads Make For Faster Writes

The trick to lowering your lock percentage and thus having faster updates is to query the document you are going to update, before you perform the update. Querying before doing an upsert might seem counter intuitive at first glance, but it makes sense when you think about it.

The read ensures that whatever document you are going to update is in RAM. This means the update, which will happen immediately after the read, always updates the document in RAM, which is super fast. I think of it as warming the database for the document you are about to update.
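
The pattern described above can be sketched as a small helper. This is only an illustration under stated assumptions: the helper name `warm_then_update` is made up, and `RecordingCollection` is a stand-in object so the snippet runs without a server. With PyMongo, a real collection exposes `find_one` and `update_one` methods with the same shape.

```python
class RecordingCollection:
    """Hypothetical stand-in for a collection; it only records the calls it gets."""

    def __init__(self):
        self.calls = []

    def find_one(self, query):
        self.calls.append(("find_one", query))
        return None

    def update_one(self, query, update, upsert=False):
        self.calls.append(("update_one", query, update, upsert))
        return None


def warm_then_update(collection, query, update):
    """Read the target document first, then apply the update to it."""
    # The read is issued only to pull the document into RAM; its result is unused.
    collection.find_one(query)
    # The update follows immediately, while the document should still be in memory.
    return collection.update_one(query, update, upsert=True)


col = RecordingCollection()
warm_then_update(col, {"_id": 1}, {"$inc": {"hits": 1}})
print([call[0] for call in col.calls])  # ['find_one', 'update_one']
```

Whether the extra read pays off depends on the workload and the MongoDB version, so it is worth measuring lock percentage before and after adopting it.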

This makes it sound as if upserts and field-level updates are just syntactic sugar in MongoDB: their real value is lost if the system has to fetch full entries [1] into memory for an update.


  1. Actually, because MongoDB uses memory-mapped files, accessing an entry requires more I/O.

Original title and link: MongoDB Tips and Tricks: More Reads Make For Faster Writes (NoSQL database©myNoSQL)

via: http://mongotips.com/b/lower-lock-and-number-of-slow-queries/


MongoDB Replica Sets and Sharding for GridFS as a Distributed File System

Contrary to many MongoDB deployments, we primarily use it for storing files in GridFS. We switched over to MongoDB after searching for a good distributed file system for years. Prior to MongoDB we used a regular NFS share, sitting on top of a HAST-device. That worked great, but it didn’t allow us to scale horizontally the way a distributed file system allows.

No doubt GridFS is a useful feature of MongoDB, but I’m pretty sure the experts in distributed file systems have better solutions for this. I just hope they’ll share them with us.

Update: Jeff Darcy [1]:

Yes, we do have better solutions for this particular kind of use case.  So do object/blob stores like Swift.  

Honestly, I don’t think the “searching for a good distributed filesystem” part is even credible. How can someone be that bad at finding readily available information? For example, it’s easier to set up sharding and replication with GlusterFS than with MongoDB and GridFS, plus you’ll get striping and RDMA and generally better performance for this type of workload. On top of all that, you won’t need to use special libraries to interface with it because it’s a regular POSIX filesystem. Lastly, it’s not like there hasn’t been a lot of press about it. Even considering their obvious FreeBSD bias and the fact that FreeBSD is weak in this area, the second item for “FreeBSD distributed filesystem” points to GlusterFS. If they didn’t find it, they just didn’t look very hard before they reached for the New Shiny.

It’s not just GlusterFS, either.  MogileFS might not be a real filesystem but it’s user space so it would probably run just fine in their environment - as would the aforementioned Swift.  I have more of a problem with the anti-Mongo haters than with Mongo itself, it’s wonderful that these guys found a Mongo-based solution that works for them, but it seems like a bit of an odd choice nonetheless.


  1. Jeff Darcy is a member of the gluster.org advisory board and works on GlusterFS full time at Red Hat. He’s also the person to whom I direct all my questions about distributed file systems (and more).

Original title and link: MongoDB Replica Sets and Sharding for GridFS as a Distributed File System (NoSQL database©myNoSQL)

via: http://viktorpetersson.com/2012/01/29/notes-on-mongodb-gridfs-and-sharding-in-the-cloud/


PHP and MongoDB Tutorial

Derick Rethans’s slides are a good MongoDB tutorial for PHP developers, covering most of the API.


Jelastic Database Marketshare: MySQL, MongoDB, MariaDB

Jelastic, a company offering a cloud platform for Java server hosting, has published some stats about the databases used by their over 7000 users:

Jelastic Database Marketshare

While it would be wrong to generalize these results to absolute database marketshare, it is interesting nonetheless to see that MongoDB has already overtaken PostgreSQL to become the second most used database, and that CouchDB, which was added only one month ago, is already used by 5% of Jelastic’s users. MySQL holds first place with over 40% of users or, put differently, double the share of second-place MongoDB.

These numbers would be even more interesting if they accounted for real usage stats such as database sizes or query volumes.

Mat Keep

Original title and link: Jelastic Database Marketshare: MySQL, MongoDB, MariaDB (NoSQL database©myNoSQL)

via: http://blog.jelastic.com/2012/01/23/database-marketshare-january-2012/


Using MongoDB Replica Sets With Node.js on Microsoft Azure: NoSQL Tutorials

Mariano Vazquez explains how to configure MongoDB replica sets on Microsoft Azure and how that works:

  • MongoDB will run the native binaries on a worker role and will store the data in Windows Azure storage using Windows Azure Drive (basically a hard disk mounted on Azure Page blobs)
  • The good thing about using Azure Storage is that the data is georeplicated. It will also make backup easier because of the snapshot feature of blob storage (which is not a copy but a diff).
  • It will use the local hard disk in the VM (local resources in the Azure jargon) to store the log files and a local cache.
  • You can scale out to multiple Mongo Replica Sets by increasing the instance count of the MongoDB role

Original title and link: Using MongoDB Replica Sets With Node.js on Microsoft Azure: NoSQL Tutorials (NoSQL database©myNoSQL)

via: http://nodeblog.cloudapp.net/running-mongodb-on-azure-and-connect-from-a-nodejs-web-app