


redis: All content tagged as redis in NoSQL databases and polyglot persistence

MongoDB's TTL Collections in OpenStack's Marconi queuing service

Flavio Percoco describes a workaround that OpenStack’s queuing system uses when working with MongoDB’s TTL collections:

Even though it is a great feature, it wasn’t enough to cover Marconi’s needs since the latter supports per message TTL. In order to cover this, one of the ideas was to implement something similar to MongoDB’s thread and have it running server-side but we didn’t want that for a couple of reasons: it needed a separate thread / process and it had a bigger impact in terms of performance.

This got me thinking it might be one of the (few) features missing from Redis.

✚ Redis supports timeouts for keys. Redis 2.6 brought the accuracy of expiring keys from 1 second to 1 millisecond.

✚ Redis has support for different data structures like lists, sets, and sorted sets. But it’s missing the combination of the two: a timeout applies to a whole key, not to an individual element inside a list, set, or sorted set.
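
✚ One common way to approximate per-element expiry is to keep the elements in a sorted set and use each element's expiry timestamp as its score. What follows is only a minimal in-memory sketch of that idea, not something taken from Marconi or Redis itself; the class name `ExpiringMembers` and its methods are hypothetical, and the Redis commands in the comments are the rough analogues one would issue against a real server.

```python
import time


class ExpiringMembers:
    """In-memory model of a sorted set whose scores are expiry timestamps."""

    def __init__(self, clock=time.time):
        self._clock = clock     # injectable so the behaviour can be tested
        self._expires_at = {}   # member -> absolute expiry time (the "score")

    def add(self, member, ttl_seconds):
        # Redis analogue: ZADD key <now + ttl> member
        self._expires_at[member] = self._clock() + ttl_seconds

    def purge(self):
        # Redis analogue: ZREMRANGEBYSCORE key -inf <now>
        now = self._clock()
        expired = [m for m, t in self._expires_at.items() if t <= now]
        for member in expired:
            del self._expires_at[member]
        return len(expired)

    def live(self):
        # Redis analogue: ZRANGEBYSCORE key (<now> +inf
        now = self._clock()
        alive = [(t, m) for m, t in self._expires_at.items() if t > now]
        return [m for _, m in sorted(alive)]
```

Reads filter on the current time, so expired members are never returned even when `purge()` runs only occasionally. The trade-off is that cleanup becomes the application's job rather than the server's.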

Original title and link: MongoDB’s TTL Collections in OpenStack’s Marconi queuing service (NoSQL database©myNoSQL)


MongoDB Pub/Sub With Capped Collections

Rick Copeland designs a MongoDB Pub/Sub system based on:

  • MongoDB’s capped collections,
  • tailable data-awaiting cursors,
  • sequences (using find_and_modify()),
  • a “poorly documented option” of capped collections: oplog_replay [1].

If you’ve been following this blog for any length of time, you know that my NoSQL database of choice is MongoDB. One thing that MongoDB isn’t known for, however, is building a publish / subscribe system. Redis, on the other hand, is known for having a high-bandwidth, low-latency pub/sub protocol. One thing I’ve always wondered is whether I can build a similar system atop MongoDB’s capped collections, and if so, what the performance would be. Read on to find out how it turned out…

The solution is definitely ingenious and it could probably work for systems with modest pub/sub requirements. It’s definitely a good exercise in combining some interesting features of MongoDB (I like the capped collections and the tailable data-awaiting cursors).

✚ I’m wondering whether tailable data-awaiting cursors behave like non-blocking polls.

  [1] I don’t really understand how this option works.

Original title and link: MongoDB Pub/Sub With Capped Collections (NoSQL database©myNoSQL)


Bitly Forget Table - Building Categorical Distributions in Redis

In the comment thread of the post “Using Redis as an external index for surfacing interesting content”, Micha Gorelick pointed to a post covering a similar solution used at Bitly:

We store the categorical distribution as a set of event counts, along with a ‘normalising constant’ which is simply the number of all the events we’ve stored. […]

All this lives in a Redis sorted set where the key describes the variable which, in this case, would simply be bitly_country and the value would be a categorical distribution. Each element in the set would be a country and the score of each element would be the number of clicks from that country. We store a separate element in the set (traditionally called z) that records the total number of clicks stored in the set. When we want to report the categorical distribution, we extract the whole sorted set, divide each count by z, and report the result.

Storing the categorical distribution in this way allows us to make very rapid writes (simply increment the score of two elements of the sorted set) and means we can store millions of categorical distributions in memory. Storing a large number of these is important, as we’d often like to know the normal behavior of a particular key phrase, or the normal behavior of a topic, or a bundle, and so on.
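
✚ The storage scheme in the quote can be modelled in a few lines. This is an in-memory sketch under stated assumptions, not Bitly's Forget Table code: the class `CategoricalDistribution` and the reserved member name held in `Z` are illustrative, and the comments name the Redis sorted-set commands that would play the same role.

```python
from collections import defaultdict

Z = "z"  # reserved member holding the normalising constant (assumed name)


class CategoricalDistribution:
    """Event counts plus a running total, as in a single sorted set."""

    def __init__(self):
        self._scores = defaultdict(float)  # member -> score

    def record(self, category, count=1):
        # Redis analogue: ZINCRBY key <count> <category>
        #                 ZINCRBY key <count> z
        self._scores[category] += count
        self._scores[Z] += count

    def probabilities(self):
        # Redis analogue: ZRANGE key 0 -1 WITHSCORES, then divide by z
        total = self._scores.get(Z, 0)
        if not total:
            return {}
        return {c: n / total for c, n in self._scores.items() if c != Z}
```

Each write touches exactly two members, which is what keeps writes cheap. One caveat of this sketch: a real category that happens to be named `z` would collide with the normalising constant.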

The Bitly team has open sourced their solution, named Forget Table, and the code is available on GitHub.

Original title and link: Bitly Forget Table - Building Categorical Distributions in Redis (NoSQL database©myNoSQL)


Now All Reads Come From Redis at YouPorn

Speaking of Redis as the primary data store, this post from Andrea reminded me of YouPorn’s usage of Redis:

The datastore is the most interesting part. Initially they used MySQL, but more than 200 million pageviews and 300K queries per second are too much to handle using only MySQL. The first try was to add ActiveMQ to enqueue writes, but a separate Java infrastructure is too expensive to maintain. Finally they added Redis in front of MySQL and use it as the main datastore.

Now all reads come from Redis. MySQL is used to allow building new sorted sets as requirements change, and it’s highly normalized because it’s not used directly for the site. After the switchover additional Redis nodes were added, not because Redis was overworked, but because the network cards couldn’t keep up with Redis. Lists are stored in a sorted set and MySQL is used as the source to rebuild them when needed. Pipelining allows Redis to be faster and the append-only file (AOF) is an efficient strategy to easily back up data.

Original title and link: Now All Reads Come From Redis at YouPorn (NoSQL database©myNoSQL)


Using Redis as an External Index for Surfacing Interesting Content at Heyzap

Micah Fivecoate introduces a series of algorithms used at Heyzap for surfacing interesting content:

  1. currently popular
  2. hot stream
  3. drip stream
  4. friends stream

All of them are implemented using Redis ZSETs:

In all my examples, I’m using Redis as an external index. You could add a column and an index to your posts table, but it’s probably huge, which presents its own limitations. Additionally, since we only care about the most popular items, we can save memory by only indexing the top few thousand items.

Original title and link: Using Redis as an External Index for Surfacing Interesting Content at Heyzap (NoSQL database©myNoSQL)


Redis as the Primary Data Store

Courtney Couch describes in great detail the solution used to scale Redis, including application-level sharding, replication, and persistence.

The web abounds with warnings and cautionary tales about going this route. There are horror stories about lost data, hitting memory limits, or people unable to effectively manage the data within Redis, so you might be wondering “What on earth were you thinking?!” So here is our story, why we decided to use Redis anyway, and how we overcame those issues.
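
✚ For readers unfamiliar with application-level sharding, here is a generic sketch of the basic idea: the application, not the database, decides which node owns a key. It is an assumption-laden illustration and does not reproduce the scheme from the post; `shard_for` and `node_for` are hypothetical helper names.

```python
import zlib


def shard_for(key, shard_count):
    """Map a key to a shard index using a hash that is stable across processes."""
    if shard_count <= 0:
        raise ValueError("shard_count must be positive")
    # crc32 gives the same value in every process, unlike Python's built-in
    # hash() for strings, which is randomised per interpreter run.
    return zlib.crc32(key.encode("utf-8")) % shard_count


def node_for(key, nodes):
    """Pick the node (any object, e.g. a client instance) that owns a key."""
    return nodes[shard_for(key, len(nodes))]
```

With plain modulo hashing, changing the number of nodes remaps most keys. A frequent mitigation is to hash onto a fixed, larger number of logical shards and move whole shards between machines as the cluster grows.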


Original title and link: Redis as the Primary Data Store (NoSQL database©myNoSQL)


Using Redis to Optimize MySQL Queries

I somehow missed this post from the Flickr team describing their use of (app-enforced) capped sorted sets in Redis as a sort of reduced, optimized secondary index for MySQL:

[…] the bottleneck was not in generating the list of photos for your most recently active contact, it was just in finding who your most recently active contact was (specifically if you have thousands or tens of thousands of contacts). What if, instead of fully denormalizing, we just maintain a list of your recently active contacts? That would allow us to optimize the slow query, much like a native MySQL index would; instead of needing to look through a list of 20,000 contacts to see which one has uploaded a photo recently, we only need to look at your most recent 5 or 10 (regardless of your total contacts count)!

This is the first time I’m encountering this approach where a NoSQL database is used not to directly provide the final data (usually in a denormalized format), but rather to optimize access to the master data. Basically this is a metadata layer optimizer. Neat!
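
✚ An app-enforced capped sorted set could look roughly like the in-memory model below. This is a sketch of the general technique and not Flickr's code: the class `RecentContacts`, its `cap` parameter, and the method names are hypothetical, and the comments list the Redis commands that would do the same work.

```python
class RecentContacts:
    """In-memory model of an app-enforced capped sorted set."""

    def __init__(self, cap=10):
        self._cap = cap
        self._last_active = {}  # contact id -> last upload timestamp (the "score")

    def touch(self, contact_id, timestamp):
        # Redis analogue: ZADD key <timestamp> <contact_id>
        self._last_active[contact_id] = timestamp
        # Redis analogue: ZREMRANGEBYRANK key 0 -(cap + 1), dropping the oldest
        overflow = len(self._last_active) - self._cap
        if overflow > 0:
            oldest = sorted(self._last_active, key=self._last_active.get)[:overflow]
            for cid in oldest:
                del self._last_active[cid]

    def most_recent(self, n=1):
        # Redis analogue: ZREVRANGE key 0 <n - 1>
        ranked = sorted(self._last_active, key=self._last_active.get, reverse=True)
        return ranked[:n]
```

The cap is what keeps the index small: the set never grows past `cap` members no matter how many contacts a user has, so the follow-up MySQL query only needs to consider a handful of ids.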

Original title and link: Using Redis to Optimize MySQL Queries (NoSQL database©myNoSQL)


Redis as a Service and GarantiaData

Giovanni Bajo writes about companies offering Redis-as-a-Service and why GarantiaData’s offer is different:

Usually, Redis “on the cloud” (= as a service) offerings are not really good deals. Redis is easy to install and assuming you at least know what “AOF” is, it’s not even hard to tune for normal datasets. Backing up is also very easy. I don’t claim to be an expert and I have never managed very large Redis instances, but I don’t see these service offerings helping you much in that regard. You pay something like $120-$200/mo per one 1Gb instance, and you still need to manually handle failovers and migrations (and compare that with a 2Gb RAM SSD-based VPS, which is $20/mo).

The main point is that the Redis service offered by GarantiaData includes a scalable pre-sharding cluster version of Redis. Then there are other smaller advantages.

Original title and link: Redis as a Service and GarantiaData (NoSQL database©myNoSQL)


Jondis: A Python Manager for Redis Master/Slaves

Announced last week and available on GitHub:

Jondis is a pool for managing your redis master / slave setup that works with redis-py. Given a list of servers, Jondis will learn your topology and if the master server dies, will query the remaining servers to find out which one has been promoted to the master, and reconfigure itself to send requests to the new master instance.
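
✚ The discovery step described in the quote can be sketched as a small function. This is not Jondis's API: `find_master`, `get_role`, and `errors` are hypothetical names. The sketch assumes the caller supplies a `get_role` callable that asks one server for its replication role (with redis-py, one option would be reading the `role` field of `client.info("replication")`) and that dead servers raise one of the exception types listed in `errors`.

```python
def find_master(servers, get_role, errors=(OSError,)):
    """Return the first server reporting the role "master", or None.

    servers  -- iterable of server identifiers (e.g. "host:port" strings)
    get_role -- caller-supplied callable: server -> "master" / "slave"
    errors   -- exception types that mean "this server is unreachable"
    """
    for server in servers:
        try:
            role = get_role(server)
        except errors:
            continue  # unreachable server, keep probing the others
        if role == "master":
            return server
    return None
```

A pool built on this would call `find_master` again whenever a request to the current master fails, then redirect traffic to whatever the function returns.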

Original title and link: Jondis: A Python Manager for Redis Master/Slaves (NoSQL database©myNoSQL)


Compressing Large Data Sets in Redis With Gzip

When publishing it, the post dropped the quote and my comments.

A long post analyzing different scenarios of compressing data stored in Redis using Gzip:

A year and a half ago, I was working with software that used Redis as a buffer to store large sets of text data. We had some bottlenecks there.

One of them was related to Redis and the large amount of data that we had there (large compared to the amount of RAM). Since then, I’ve wanted to check if using Gzip would be a big improvement or would just become the next bottleneck (CPU). Unfortunately I don’t have access to this software any more; that’s why I’ve decided to create a simple test case just to check this matter.

If what’s important is the speed, I think algorithms like snappy and lzo are a better fit. If data density is important, then Zopfli is probably a better fit.
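
✚ The basic mechanics are easy to try with Python's standard `gzip` module. The helpers below are a minimal sketch, not the benchmark from the post; `pack`, `unpack`, and `saving` are hypothetical names, and the bytes returned by `pack` are what one would store as the Redis value.

```python
import gzip


def pack(text, level=6):
    """Compress a string before storing it as a value."""
    return gzip.compress(text.encode("utf-8"), compresslevel=level)


def unpack(blob):
    """Inverse of pack(): decompress bytes read back from the store."""
    return gzip.decompress(blob).decode("utf-8")


def saving(text, level=6):
    """Fraction of bytes saved by compression (negative if the blob grew)."""
    raw = text.encode("utf-8")
    if not raw:
        return 0.0
    return 1 - len(pack(text, level)) / len(raw)
```

Very short values can come out larger than the input because of the gzip header and trailer, so it is worth measuring `saving()` on representative data before compressing everything.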

Original title and link: Compressing Large Data Sets in Redis With Gzip (NoSQL database©myNoSQL)


Redis: You Shall Never Be Blamed

Mariano Valles with a story of Ruby, Redis and concurrency:

  • Concurrency issues and high loads are best friends.
  • When using unicorn or any other app server that forks to run multiple processes, be careful: forks are process clones.
  • Servers used as databases or willing to handle many incoming connections should be tuned accordingly: e.g. prlimit --nofile 10000
  • Don’t rely on the GC for cleaning up your mess. It might work in Java, not so much in Ruby.
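
✚ The fork pitfall from the list above has a well-known guard: remember which process opened a connection and open a fresh one when the current process id differs. The sketch below illustrates that idea in Python rather than Ruby and is not taken from the post; `ForkAwareResource` and the `factory` callable are hypothetical.

```python
import os


class ForkAwareResource:
    """Lazily creates a resource and recreates it after a fork."""

    def __init__(self, factory, getpid=os.getpid):
        self._factory = factory  # callable that opens a fresh connection
        self._getpid = getpid    # injectable for testing
        self._pid = None
        self._resource = None

    def get(self):
        pid = self._getpid()
        if self._resource is None or pid != self._pid:
            # A forked child inherits the parent's socket; give the child
            # its own connection instead of sharing that one.
            self._resource = self._factory()
            self._pid = pid
        return self._resource
```

Without such a guard, parent and child write to the same socket and can read each other's replies, which tends to look like a database bug even though the database is behaving correctly.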

Original title and link: Redis: You Shall Never Be Blamed (NoSQL database©myNoSQL)


Rackspace: BYOD to Your Preferred Storage

While Amazon Web Services’ approach is bring-your-own-data to our storage and processing solutions, Rackspace’s strategy seems to be “whatever popular NoSQL storage engine you like, we have your back. Just bring your data”.

Last month Rackspace bought MongoDB hosting provider ObjectRocket, and now they have acquired Exceptional Cloud Services, which brings Redis hosting on board.

It’s difficult to say how well Amazon’s strategy is working, as the company doesn’t do a lot to get their customers’ case studies out there; I have yet to find a list of 10 companies that are using Amazon Dynamo. But this doesn’t mean a thing. On the other hand, I can see Rackspace’s strategy working and getting a lot of traction, considering they’re looking after the most popular NoSQL tools.

✚ The Register writes about this acquisition too: Rackspace gobbles Exceptional Cloud Services for Redis smarts. I assume many others are asking the same question:

So, with Redis and MongoDB due to make their way into the Rackspace cloud proper, what other technologies are catching the web hoster turned cloud whisperer’s eyes?

Original title and link: Rackspace: BYOD to Your Preferred Storage (NoSQL database©myNoSQL)