



Apache Kafka: Next generation distributed messaging system

Abhishek Sharma, in a 3,000-word article on InfoQ:

Its architecture consists of the following components:

  • A stream of messages of a particular type is defined as a topic. A Message is defined as a payload of bytes and a Topic is a category or feed name to which messages are published.
  • A Producer can be anyone who can publish messages to a Topic.
  • The published messages are then stored at a set of servers called Brokers or Kafka Cluster.
  • A Consumer can subscribe to one or more Topics and consume the published Messages by pulling data from the Brokers.

Producers can choose their favorite serialization method to encode the message content. For efficiency, a producer can send a set of messages in a single publish request. The article then shows, in code, how to create a Producer to send messages.
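The article's original listing isn't reproduced here; as a rough stand-in, here is a toy in-memory sketch of the roles described above (Producer, Broker storage per Topic, Consumer pull). All class and method names are invented for illustration and are not the real Kafka API:

```python
import json
from collections import defaultdict

class Broker:
    """Toy stand-in for a Kafka broker: stores published messages per topic."""
    def __init__(self):
        self.topics = defaultdict(list)   # topic name -> list of byte payloads

    def publish(self, topic, payloads):
        # Producers may batch several messages into one publish request.
        self.topics[topic].extend(payloads)

    def pull(self, topic, offset):
        # Consumers pull messages from an offset they track themselves.
        return self.topics[topic][offset:]

class Producer:
    """Encodes messages with a caller-chosen serializer and publishes batches."""
    def __init__(self, broker, serialize=lambda m: json.dumps(m).encode()):
        self.broker, self.serialize = broker, serialize

    def send(self, topic, *messages):
        self.broker.publish(topic, [self.serialize(m) for m in messages])

broker = Broker()
Producer(broker).send("clicks", {"user": 1}, {"user": 2})  # one batched request
print(broker.pull("clicks", 0))
```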

Kafka is an amazing system. I just wish the article had actually looked into what makes it unique and how it compares to systems like RabbitMQ or ActiveMQ.

✚ Cameron Purdy in one of the comments:

If you carefully read the article, you’ll note that Kafka is not actually a message queue. It’s just a specialized database with some messaging semantics in its API. That means if you need the behaviors that you would associate with a message queue, you can’t get them with Kafka (or if you can, the performance will plummet.)

Original title and link: Apache Kafka: Next generation distributed messaging system (NoSQL database©myNoSQL)


MongoDB Work Queues: Techniques to Easily Store and Process Complex Jobs

David Berube’s article opens with a very good overview of the different approaches for creating and managing work queues:

There are many approaches to creating work queues. One option, though naive, is to use a relational database management system (RDBMS). This is simple to implement because many architectures already have a database system such as MySQL. However, performance is less than optimal compared with other approaches. The atomicity, consistency, isolation, and durability (ACID) compliance required for RDBMS is not necessary for this scenario and negatively impacts performance. A simpler system can perform better.

One system that has gained in popularity for this use is Redis. It’s a key-value data store, like the highly popular memcached, but with more features. For example, Redis has support for pushing and popping elements off lists in a highly scalable and efficient way. Resque, often used with Ruby on Rails, is a system built on top of Redis (see Resources for more details). However, Redis supports only simple primitives. You can’t insert complex objects into the lists, and it has relatively limited support for managing items in those lists.
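The Redis primitives in question are list pushes and pops (LPUSH/BRPOP in the real client). A minimal sketch of the pattern, using an in-memory deque as a stand-in for the Redis list, which also illustrates the limitation mentioned above: list values are flat strings, so a complex job must be serialized wholesale and can't be queried or updated in place:

```python
import json
from collections import deque

queue = deque()  # stand-in for a Redis list; real code would use LPUSH/BRPOP

def enqueue(job):
    # Redis lists hold flat strings, so the whole job is serialized;
    # once enqueued, individual fields can't be inspected or modified.
    queue.appendleft(json.dumps(job))

def work():
    payload = queue.pop()   # BRPOP equivalent (blocking in real Redis)
    return json.loads(payload)

enqueue({"task": "resize", "image_id": 42})
print(work()["task"])  # → resize
```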

Alternatively, many systems use a message broker such as Apache ActiveMQ or RabbitMQ. Although these systems are fast and scalable, they’re designed for simple messages. If you want to perform nontrivial reporting on your work queues or modify items in the queues, you are stuck because message brokers rarely offer those features. Fortunately, a powerful, scalable solution is available: MongoDB.

MongoDB allows you to create queues that contain complex nested data. Its locking semantics guarantee you won’t experience problems with concurrency, and its scalability ensures you can run large systems. Because MongoDB is a powerful, full-featured database, you can also run robust reporting on your queue and prioritize by complex criteria. However, MongoDB is not a traditional RDBMS. For instance, it does not support Structured Query Language (SQL) queries.

MongoDB has many appealing features in addition to excellent performance for work queues, such as a flexible, schemaless approach. It supports nested data structures, meaning you can even store subdocuments. Because it is a more full-featured data store than Redis, it provides a richer set of management functions so you can easily view, query, update, and delete jobs on any arbitrary criteria.
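A common way to get queue semantics out of MongoDB is an atomic claim: filter for a pending job and flip its status in one step (findAndModify, exposed in PyMongo as find_one_and_update) so no two workers grab the same job. The filter/update documents below use real MongoDB query syntax; the apply step is an in-memory stand-in so the pattern can be run end to end, with the actual collection call shown as a comment:

```python
# The documents a worker would pass to
#   jobs.find_one_and_update(claim_filter, claim_update, sort=[("priority", -1)])
# to atomically claim the highest-priority pending job.
claim_filter = {"status": "pending"}
claim_update = {"$set": {"status": "processing", "worker": "w1"}}

# In-memory stand-in for the collection.
jobs = [
    {"_id": 1, "status": "pending", "priority": 1, "payload": {"op": "resize"}},
    {"_id": 2, "status": "pending", "priority": 9, "payload": {"op": "email"}},
]

def claim(jobs):
    pending = [j for j in jobs if j["status"] == claim_filter["status"]]
    if not pending:
        return None
    job = max(pending, key=lambda j: j["priority"])  # sort=[("priority", -1)]
    job.update(claim_update["$set"])                 # done atomically by MongoDB
    return job

print(claim(jobs)["_id"])  # → 2
```

Because jobs are full documents rather than opaque strings, the same collection can be queried for reporting ("how many jobs are stuck in processing?") with ordinary find queries.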

Using MongoDB as a queueing system is in many regards as good and as wrong as using a relational database for this type of functionality. Both completely lack the semantics and features required by queues and pub/sub. Redis (and obviously the dedicated MOMs) natively supports both queue and pub/sub semantics.

So even if the article lists a couple of reasons why MongoDB could be used as a queuing system, consider this solution if and only if the only system you are allowed to run in your environment is MongoDB.

Original title and link: MongoDB Work Queues: Techniques to Easily Store and Process Complex Jobs (NoSQL database©myNoSQL)


Creating a MongoDB Queuing System

The Boxed Ice guys have posted a very detailed article about the requirements, implementation, and some pros and cons of building a custom queuing system on top of MongoDB. Embedded below is a slide deck giving an overview, but I’d encourage you to read the whole post:

In case you are wondering why the original title is “Replacing RabbitMQ with MongoDB” and what the real reasons were that led the guys to this migration, the answer is briefly mentioned at the end of the post and in an older post:

  • All of the PHP AMQP libraries we tried were too unstable – either they didn’t work at all, or they worked in low-load testing but performed terribly at high loads. We saw excellent response times and then suddenly there would be a massive spike and the entire postbacks process would hang. This appeared to be caused by the connection pool resetting and every insert trying to recreate its connection.
  • Fault tolerance – replica sets in MongoDB work extremely well to allow automatic failover and redundancy. The original reason we removed RabbitMQ was no in-built support for redundancy across multiple nodes/data centres. This is now possible as of RabbitMQ 2.6.0.

Original title and link: Creating a MongoDB Queuing System (NoSQL database©myNoSQL)

Open-Source VoIP Cloud Services with Erlang

There’s a bit of CouchDB in the project:

We’ve built an open-source product that automatically deploys, scales and distributes VoIP calls across the Internet on commodity or virtualized servers. It fully utilizes Erlang for VoIP logic as well as relies on other Erlang products like CouchDB and RabbitMQ. It’s got an awesome set of APIs and some other nifty features.

Original title and link: Open-Source VoIP Cloud Services with Erlang (NoSQL databases © myNoSQL)

Fun with the CouchDB _changes feed and RabbitMQ

Streaming notifications about CouchDB document updates using RabbitMQ.
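CouchDB's continuous _changes feed emits one JSON object per line, each carrying the sequence number and document id of an update (plus blank heartbeat lines). A sketch of the relay: parse each line and hand it to a publish callback, which is stubbed here; the real thing would publish to a RabbitMQ exchange via an AMQP client such as pika:

```python
import json

def relay_changes(lines, publish):
    """Forward each CouchDB _changes entry to a message publisher."""
    for line in lines:
        line = line.strip()
        if not line:          # the continuous feed sends blank heartbeat lines
            continue
        change = json.loads(line)
        if "id" in change:    # skip the trailing {"last_seq": ...} record
            publish(change["id"], change)

# Sample lines in the shape the continuous feed produces.
feed = [
    '{"seq": 1, "id": "doc-a", "changes": [{"rev": "1-abc"}]}',
    '',
    '{"seq": 2, "id": "doc-b", "changes": [{"rev": "1-def"}]}',
    '{"last_seq": 2}',
]
seen = []
relay_changes(feed, lambda doc_id, change: seen.append(doc_id))
print(seen)  # → ['doc-a', 'doc-b']
```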

What could it be used for? My first thought is some sort of parallel computation, boot up a few dozen EC2 nodes and start dumping data into CouchDB.

Or it is just about having fun with some cool projects…

Update: I also found a presentation on a similar topic, embedded below: RabbitMQ + CouchDB = Awesome by Lenz Gschwendtner. My impression is that the presented use case works the other way around though (i.e. RabbitMQ pushing messages into CouchDB, instead of RabbitMQ disseminating CouchDB changes).

RabbitMQ + CouchDB = Awesome

Update: Based on this picture from Jan Lehnardt (@janl), I’d say there are a couple of _changes fanboys!

_changes fanboys