NoSQL Benchmarks NoSQL use cases NoSQL Videos NoSQL Hybrid Solutions NoSQL Presentations Big Data Hadoop MapReduce Pig Hive Flume Oozie Sqoop HDFS ZooKeeper Cascading Cascalog BigTable Cassandra HBase Hypertable Couchbase CouchDB MongoDB OrientDB RavenDB Jackrabbit Terrastore Amazon DynamoDB Redis Riak Project Voldemort Tokyo Cabinet Kyoto Cabinet memcached Amazon SimpleDB Datomic MemcacheDB M/DB GT.M Amazon Dynamo Dynomite Mnesia Yahoo! PNUTS/Sherpa Neo4j InfoGrid Sones GraphDB InfiniteGraph AllegroGraph MarkLogic Clustrix CouchDB Case Studies MongoDB Case Studies NoSQL at Adobe NoSQL at Facebook NoSQL at Twitter



Redis: All content tagged as Redis in NoSQL databases and polyglot persistence

Redis High Availability and Automatic Failover: Redis Sentinel

After posting about Spreecast’s Redis High Available/Failover solution based on ZooKeeper where I referred to Redis Sentinel, I realized I haven’t linked to Salvatore Sanfilippo’s post about the design of Redis Sentinel:

It is a distributed monitoring system for Redis. On top of the monitoring layer it also implements a notification system with a simple to use API, and an automatic failover solution.

Well, this is a pretty cold description of what Redis Sentinel is. Actually it is a system that also tries to make monitoring fun! In short you have this monitoring unit, the Sentinel. The idea is that this monitoring unit is extremely chatty, it speaks the Redis protocol, and you can ask it many things about how it is seeing the Redis instances it is monitoring, what are the attached slaves, what the other Sentinels monitoring the same system and so forth. Sentinel is designed to interact with other programs a lot.

The official Redis Sentinel documentation is available too here. Salvatore Sanfilippo is actively working on Redis Sentinel and while it is not complete yet, there are already users trying it out. Redis Sentinel will be stable in a few weeks and will be released as part of the Redis 2.8. In case you’ll want to start using it before 2.8 becomes available, use the git unstable branch

Original title and link: Redis High Availability and Automatic Failover: Redis Sentinel (NoSQL database©myNoSQL)

Redis Failover at Spreecast Based on Apache ZooKeeper

Until Redis Sentil becomes generally available, Spreecast’s Ruby library using Apache ZooKeeper might be the solution for high availability Redis clusters:

We decided to address these concerns by creating the redis_failover gem, which was recently released as open-source for Ruby environments employing Redis. The redis_failover gem aims to be a drop-in replacement for the existing Ruby client for Redis. redis_failover is equipped with the capability to recognize and handle an automatic (or manual) failover gracefully. The client knows to automatically direct write operations to the current master and read operations to one of N slaves. redis_failover is built on top of ZooKeeper, a proven distributed configuration and notification system that handles pushing changes to nodes across the network. We decided to use ZooKeeper since it automatically handles network partitions, quorum management, client discovery, and other difficult distributed computing problems.

redis_failover architecture

Original title and link: Redis Failover at Spreecast Based on Apache ZooKeeper (NoSQL database©myNoSQL)


From S3 to CouchDB and Redis and Then Half Way Back for Serving Ads

The story of going form S3 to CouchDB and Redis and then back to S3 and Redis for ad serving:

The solution to this situation has a touch of irony. With Redis in place, we replaced CouchDB for placement- and ad-data with S3. Since we weren’t using any CouchDB-specific features, we simply published all the documents to S3 buckets instead. We still did the Redis cache warming upfront and data updates in the background. So by decoupling the application from the persistence layer using Redis, we also removed the need for a super fast database backend. We didn’t care that S3 is slower than a local CouchDB, since we updated everything asynchronously.

Besides the detailed blog post there’s also a slidedeck:

Original title and link: From S3 to CouchDB and Redis and Then Half Way Back for Serving Ads (NoSQL database©myNoSQL)


NoSQL and Relational Databases Podcast With Mathias Meyer

EngineYard’s Ines Sombra recorded a conversation with Mathias Meyer about NoSQL databases and their evolution towards more friendlier functionality, relational databases and their steps towards non-relational models, and a bit more on what polyglot persistence means.

Mathias Meyer is one of the people I could talk for days about NoSQL and databases in general with different infrastructure toppings and he has some of the most well balanced thoughts when speaking about this exciting space—see this conversation I’ve had with him in the early days of NoSQL. I strongly encourage you to download the mp3 and listen to it.

Original title and link: NoSQL and Relational Databases Podcast With Mathias Meyer (NoSQL database©myNoSQL)

Algorithm for Automatic Cache Invalidation

Jakub Łopuszański describes in much detail and with examples an algorithm for cache invalidation:

Imagine a bipartite graph which on the left hand side has one vertex per each possible subspace of a write query, and on the right side has vertices corresponding to subspaces of read queries. Actually both sets are equal, but we will focus on edges.

Edge goes from left to right, if a query on the left side affects results of a query on the right side. As said before, both sets are infinite, but that’s not the problem. There are infinitely many edges, but it’s also not bad. What’s bad is that there are nodes on the left side with the infinite degree, which means, we need to invalidate infinitely many queries. What the above tricky algorithm does, is adding a third layer to the graph, in the middle between the two, such that the transitive closure of the resulting graph is still the same (in other words: you can still get by using two edges anywhere you could by one edge in the original graph), yet each node on the left, and each node on the right, have finite (actually constant) degree. This middle layer corresponds to the artificial subspaces with “?” marks, and serves as a connecting hub for all the mess. Now, when a query on the left executes, it needs to inform only its (small number of) neighbours about the change, moving the burden of reading this information to the right. That is, a query on the right side needs to check if there is a message in the “inbox” in the middle layer. So you can think about it as a cooperation where the left query makes one step forward, and the right query does a one step back, to meet at the central place, and pass the important information about the invalidation of cache.

I’m still in front of a piece of paper understanding how it works.

Original title and link: Algorithm for Automatic Cache Invalidation (NoSQL database©myNoSQL)


Redis Bulk Insert

Redis-cli is getting a new pipe mode especially designed for bulk inserts as described in the documentation:

cat data.txt | redis-cli --pipe

The redis-cli utility will also make sure to only redirect errors received from the Redis instance to the standard output.

Original title and link: Redis Bulk Insert (NoSQL database©myNoSQL)


Lua Scripting in Redis 2.6

The top comments on Hacker News about Redis 2.6 RC are about Lua scripting.

Lua scripting! Once people figure out what Redis’ Lua scripting is good for, and it gets in a stable release, it’s going to set the world on fire. In a good way.

When one of the services I work on was having huge performance problems, and nothing I did seemed to make it fast enough, I realized that the main data-manipulation logic — previously a combination of Python and SQL — could be rewritten as a Lua script in Redis. I learned the basics of Lua in about an hour, migrated the data over to Redis, made the necessary changes to the code, and everything worked beautifully. Months of crippling speed problems vanished in a single long day.

Redis 2.6 saved my ass. Now, when I need to store data, it’s always one of the first things to come to mind, since I know I can count on it to be fast, solid, and flexible enough to do all sorts of things.

Going through myNoSQL archives, the first mention of Lua scripting support in Redis is from May 2nd, 2011. Salvatore Sanfilippo already wrote of some advanced functionality that would be possible using it, but I expect many more ideas to come out once Redis 2.6 is released.

Original title and link: Lua Scripting in Redis 2.6 (NoSQL database©myNoSQL)

Apache Mod_redis


This Apache module uses a rule-based engine (based on regular expression parser) to map URLs to REDIS commands on the fly. It supports an unlimited number of rules and can match on the full URL and the request method (GET, POST, PUT or DELETE) to provide a very flexible option for defining a RESTful interface to REDIS.

Original title and link: Apache Mod_redis (NoSQL database©myNoSQL)

Why I Love NodeJS and Redis

Erwin van der Koogh:

All of this allows me, a fairly decent Java developer with hardly any Javascript skills, to solve real world problems in record time. And that’s why I love Node and Redis.

It’s perfect if it works for you. But please do not automatically generalize it.

Original title and link: Why I Love NodeJS and Redis | Erronis (NoSQL database©myNoSQL)


Another Redis-Based Queue for Python: Introducing RQ

Vincent Driessen creates RQ as an alternative to Celery inspired by Resque:

I wanted a solution that was lightweight, easy to adopt, and easy to grasp. So I devised a simple queueing library for Python, and dubbed it RQ.

Welcome to the world of a thousand Redis-based queues.

Original title and link: Another Redis-Based Queue for Python: Introducing RQ (NoSQL database©myNoSQL)


Automatic Async and Sync Pipelining of Redis Commands

The nuts and bolts of implementing synchronous and asynchronous Redis clients supporting pipelining:

In this post I describe different approaches for client-libraries to implement Redis protocol pipelining. I will cover synchronous as well as asynchronous (event-driven) techniques and discuss their respective pros and cons: Synchronous client APIs require the library user to explicitly pipeline commands, potentially yielding optimal protocol performance, but at the cost of additional bookkeeping when handling replies. Asynchronous client libraries, on the other hand, allow automatic pipelining, while being less efficient in their pipelining behavior.

Original title and link: Automatic Async and Sync Pipelining of Redis Commands (NoSQL database©myNoSQL)


NoSQL Databases Adoption in Numbers

Source of data is Jaspersoft NoSQL connectors downloads. RedMonk published a graphic and an analysis and Klint Finley followed up with job trends:

NoSQL databases adoption

Couple of things I don’t see mentioned in the RedMonk post:

  1. if and how data has been normalized based on each connector availability

    According to the post data has been collected between Jan.2011-Mar.2012 and I think that not all connectors have been available since the beginning of the period.

  2. if and how marketing pushes for each connectors have been weighed in

    Announcing the Hadoop connector at an event with 2000 attendees or the MongoDB connector at an event with 800 attendeed could definitely influence the results (nb: keep in mind that the largest number is less than 7000, thus 200-500 downloads triggered by such an event have a significant impact)

  3. Redis and VoltDB are mostly OLTP only databases

Original title and link: NoSQL Databases Adoption in Numbers (NoSQL database©myNoSQL)