NoSQL Benchmarks NoSQL use cases NoSQL Videos NoSQL Hybrid Solutions NoSQL Presentations Big Data Hadoop MapReduce Pig Hive Flume Oozie Sqoop HDFS ZooKeeper Cascading Cascalog BigTable Cassandra HBase Hypertable Couchbase CouchDB MongoDB OrientDB RavenDB Jackrabbit Terrastore Amazon DynamoDB Redis Riak Project Voldemort Tokyo Cabinet Kyoto Cabinet memcached Amazon SimpleDB Datomic MemcacheDB M/DB GT.M Amazon Dynamo Dynomite Mnesia Yahoo! PNUTS/Sherpa Neo4j InfoGrid Sones GraphDB InfiniteGraph AllegroGraph MarkLogic Clustrix CouchDB Case Studies MongoDB Case Studies NoSQL at Adobe NoSQL at Facebook NoSQL at Twitter



MongoDB Work Queues: Techniques to Easily Store and Process Complex Jobs

David Berube’s article debuts with a very good overview of the different approaches for creating and managing work queues:

There are many approaches to creating work queues. One option, though naive, is to use a relational database management system (RDBMS). This is simple to implement because many architectures already have a database system such as MySQL. However, performance is less than optimal compared with other approaches. The atomicity, consistency, isolation, and durability (ACID) compliance required for RDBMS is not necessary for this scenario and negatively impacts performance. A simpler system can perform better.

One system that has gained in popularity for this use is Redis. It’s a key-value data store, like the highly popular memcached, but with more features. For example, Redis has support for pushing and popping elements off lists in a highly scalable and efficient way. Resque, often used with Ruby on Rails, is a system built on top of Redis (see Resources for more details). However, Redis supports only simple primitives. You can’t insert complex objects into the lists, and it has relatively limited support for managing items in those lists.

Alternatively, many systems use a message broker such as Apache ActiveMQ or RabbitMQ. Although these systems are fast and scalable, they’re designed for simple messages. If you want to perform nontrivial reporting on your work queues or modify items in the queues, you are stuck because message brokers rarely offer those features. Fortunately, a powerful, scalable solution is available: MongoDB.

MongoDB allows you to create queues that contain complex nested data. Its locking semantics guarantee you won’t experience problems with concurrency, and its scalability ensures you can run large systems. Because MongoDB is a powerful relational database, you can also run robust reporting on your queue and prioritize by complex criteria. However, MongoDB is not a traditional RDBMS. For instance, it does not support Structured Query Language (SQL) queries.

MongoDB has many appealing features in addition to excellent performance for work queues, such as a flexible, schemaless approach. It supports nested data structures, meaning you can even store subdocuments. Because it is a more full-featured data store than Redis, it provides a richer set of management functions so you can easily view, query, update, and delete jobs on any arbitrary criteria.

Using MongoDB as a queueing system is in many regards as good and as wrong as using a relational database for this type of functionality. They completely lack the semantics and features required by both queues and pubsub. Redis (and obviously the dedicated MOMs) supports natively both queues and pubsub semantics.

So even if the article lists a couple of reasons why MongoDB could be used as a queuing system, consider this solution if and only if the only system you are allowed to run on your environment is MongoDB.

Original title and link: MongoDB Work Queues: Techniques to Easily Store and Process Complex Jobs (NoSQL database©myNoSQL)