


usecase: All content tagged as usecase in NoSQL databases and polyglot persistence

Where Riak Fits? Riak’s Sweetspot

Martin Schneider (Basho) trying to answer the question in the title:

Riak can be a data store to a purpose-built enterprise app; a caching layer for an Internet app, or part of the distributed fabric and DNA of a Global app. Those are of course highly arbitrary and vague examples, but it shows how flexible Riak is as a platform.

“Can be” is not quite the same as being the right solution, and even less so the best solution. Martin’s answer to this is:

For super scalable enterprise and global apps — those where the data inside is inherently valuable and dependability of the system to capture, process and store data/writes is imperative — well I see Riak outperforming any perceived competitor in the space in providing value here.

But even for these scenarios, there’s competition from solutions like Cassandra, HBase, and Hypertable. Together with Riak, they cover the whole spectrum of scalable storage solutions based on Google BigTable and Amazon Dynamo: HBase (a BigTable implementation), Cassandra (a solution using the BigTable data model and the Dynamo distributed model), and Riak (a solution based mainly on the Amazon Dynamo paper).

While Riak presents itself as the cleanest Dynamo-based solution, I would venture to say that both Cassandra and HBase come to the table with some interesting characteristics that cannot be ignored:

  1. Strong communities and community driven development processes — both HBase and Cassandra are top Apache Foundation projects
  2. Excellent integration with Hadoop, the leading batch processing solution. DataStax, the company offering services for Cassandra, went the extra mile of creating a custom Hadoop solution, Brisk, making this integration even better.

Bottom line, I don’t think we can declare a winner in this space and I believe all three solutions will stay around for a while competing for every scenario requiring dependability of the system to capture, process and store data.

Original title and link: Where Riak Fits? Riak’s Sweetspot (NoSQL databases © myNoSQL)

Rate Limiting With Redis

Rate limiting can be an effective way of conserving resources and preventing automated or nefarious activities on your site.

The key issues to address when designing a solution are:

  1. How do we incorporate time given that it’s a continuous variable?
  2. How can we efficiently expire old data?
  3. How can we scale the solution so that it can handle many hundreds of subjects and/or actions per second?

Chris O’Hara explains how each of these has a good answer in Redis’ features.
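
To make the three questions concrete, here is a minimal sketch of one common approach, a fixed-window counter. It is not the code from Chris O’Hara’s article. Time is cut into discrete buckets, each bucket is a counter bumped with an INCR-style operation, and an EXPIRE-style time-to-live lets old buckets disappear on their own. `FakeRedis`, `allow`, and the `rate:<action>:<subject>:<bucket>` key layout are all hypothetical names; the in-memory class only stands in for a Redis server so the example runs on its own.

```python
import time


class FakeRedis:
    """Tiny in-memory stand-in for the two Redis commands used below (INCR, EXPIRE)."""

    def __init__(self, clock=time.time):
        self._clock = clock
        self._values = {}    # key -> integer counter
        self._deadline = {}  # key -> absolute expiry time in seconds

    def _purge(self, key):
        # Drop the key if its time-to-live has elapsed, as a real server would.
        if key in self._deadline and self._clock() >= self._deadline[key]:
            self._values.pop(key, None)
            self._deadline.pop(key, None)

    def incr(self, key):
        self._purge(key)
        self._values[key] = self._values.get(key, 0) + 1
        return self._values[key]

    def expire(self, key, seconds):
        if key in self._values:
            self._deadline[key] = self._clock() + seconds


def allow(store, subject, action, limit, window_seconds, now):
    """Return True while `subject` stays within `limit` actions in the current window."""
    bucket = int(now // window_seconds)         # discretise continuous time
    key = f"rate:{action}:{subject}:{bucket}"   # hypothetical key layout
    count = store.incr(key)
    if count == 1:
        # First hit in this bucket: schedule its removal so old data expires by itself.
        store.expire(key, window_seconds * 2)
    return count <= limit
```

Each check touches a single key, which is what keeps the per-request cost flat as the number of subjects grows. Against a real server the increment and the expiry are two separate commands, so a production version would likely wrap them in a transaction or a script.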

Original title and link: Rate Limiting With Redis (NoSQL databases © myNoSQL)


Building Publish / Subscribe Apps with Tropo and Redis

I’m hearing quite often lately of Redis PUB/SUB replacing real queuing systems. Here is an example application:

Building Publish / Subscribe Apps with Tropo and Redis

There’s also a screencast and the code available on GitHub:

Original title and link: Building Publish / Subscribe Apps with Tropo and Redis (NoSQL databases © myNoSQL)


Activity Feeds with Redis

The how:

One brief note about architecture: since it’s impractical to simply query the activity of 500 friends, there are two general approaches for building scalable news feeds:

  1. Fan-out on read (do these queries ahead of time and cache them)
  2. Fan-out on write (write follower-specific copies of every activity so when a given user asks for a feed you can retrieve it in one, simple query)

And why Redis:

First off, why Redis? It’s fast, our data model allows us to store minimal data in each feed entry, and Redis’ data-types are pretty well suited for an activity feed. Lists might seem like an obvious choice and could work for a basic feed implementation, but we ended up using sorted sets for two reasons:

  1. If you use a timestamp as the score parameter, you can really easily query for feed items based on time
  2. You can easily get the union of multiple sorted sets in order to generate an aggregated “friend feed”
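
The two reasons quoted above can be illustrated with a small in-memory model of a sorted set. This is only a sketch and not the Ruby or PHP code from the original post: `SortedSet` and `zunion` are hypothetical helpers whose method names mirror the Redis commands they imitate (ZADD, ZRANGEBYSCORE, ZREVRANGE, and ZUNIONSTORE with AGGREGATE MAX), and the member strings are made up.

```python
class SortedSet:
    """In-memory stand-in for a Redis sorted set: unique members ordered by score."""

    def __init__(self):
        self.scores = {}  # member -> score (here: a timestamp)

    def zadd(self, member, score):
        self.scores[member] = score

    def zrangebyscore(self, low, high):
        # Members whose score lies in [low, high], oldest first.
        hits = [(s, m) for m, s in self.scores.items() if low <= s <= high]
        return [m for _, m in sorted(hits)]

    def zrevrange(self, start, stop):
        # Highest score first; `stop` is inclusive (non-negative indexes only).
        ordered = sorted(((s, m) for m, s in self.scores.items()), reverse=True)
        return [m for _, m in ordered[start:stop + 1]]


def zunion(*sets):
    """Merge several sorted sets, keeping the highest score per member."""
    merged = SortedSet()
    for s in sets:
        for member, score in s.scores.items():
            merged.scores[member] = max(score, merged.scores.get(member, score))
    return merged


# One feed per user, scored by (made-up) timestamps.
alice, bob = SortedSet(), SortedSet()
alice.zadd("alice:posted:1", 100)
alice.zadd("alice:liked:7", 300)
bob.zadd("bob:posted:2", 200)

recent_from_alice = alice.zrangebyscore(250, 400)   # time-based query
friend_feed = zunion(alice, bob).zrevrange(0, 9)    # aggregated feed, newest first
```

Storing only a short identifier per entry, as in the member strings here, matches the quoted point about keeping minimal data in each feed entry.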

The original post then provides the code in Ruby and PHP.

Original title and link: Activity Feeds with Redis (NoSQL databases © myNoSQL)

Social Analytics on MongoDB

Just in case you needed yet another example of building analytics systems on top of MongoDB: Patrick Stokes’[1] presentation about how Buddy Media is implementing their entire platform analytics engine on MongoDB:

Here are some other MongoDB analytics solutions:

  1. Patrick Stokes: Buddy Media’s Chief Product Officer  

Original title and link: Social Analytics on MongoDB (NoSQL databases © myNoSQL)

Fast, asynchronous analytics with MongoDB

We needed to do simple analytics on OpenGovernment, but not of the Google Analytics variety. We needed each object in the system to have view count aggregates that we could show in real time on the page, and we needed to be able to pull top ten lists and stuff.

Alert: new market trend identified: no day without a new analytics system built with MongoDB. Google Analytics and other products in the market stand no chance.

Original title and link: Fast, asynchronous analytics with MongoDB (NoSQL databases © myNoSQL)


Rewriting the Redis Twitter Clone

The Redis Twitter clone app is showing its age:

I’m looking at the Twitter Clone and noticed a N + 1 -like “get” in the code […] The above code seems rather suboptimal, if my understanding is correct.

At least three better approaches have been suggested, so who is up for experimenting with Redis and rewriting this app to use the latest Redis features?

  1. Use pipelining to get all the posts in one server roundtrip (won’t change the code much and will be much faster)
  2. Use SORT … GET semantics to get all the post data at once from the list of ids (should be somewhat faster than 1)
  3. Use MGET to get all the post data at once.
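
To show why the “N + 1” pattern hurts, here is a rough sketch that counts simulated server roundtrips for a loop of single GETs versus one MGET. It is not the Twitter clone’s code: `CountingStore`, both `timeline_*` functions, and the `post:<id>` key layout are assumptions made for illustration, and the store is a plain in-memory dictionary.

```python
class CountingStore:
    """In-memory key-value store that counts simulated server roundtrips."""

    def __init__(self, data):
        self._data = dict(data)
        self.roundtrips = 0

    def get(self, key):
        # Like GET: one roundtrip per key.
        self.roundtrips += 1
        return self._data.get(key)

    def mget(self, keys):
        # Like MGET: one roundtrip no matter how many keys are requested.
        self.roundtrips += 1
        return [self._data.get(k) for k in keys]


def timeline_n_plus_one(store, post_ids):
    # One request per post id: N roundtrips on top of fetching the id list.
    return [store.get(f"post:{pid}") for pid in post_ids]


def timeline_mget(store, post_ids):
    # All posts in a single request.
    if not post_ids:
        return []
    return store.mget([f"post:{pid}" for pid in post_ids])


posts = {f"post:{i}": f"message {i}" for i in range(1, 4)}

slow = CountingStore(posts)
slow_result = timeline_n_plus_one(slow, [3, 2, 1])

fast = CountingStore(posts)
fast_result = timeline_mget(fast, [3, 2, 1])
```

Both functions return the same posts; only the roundtrip counters differ, and the gap grows linearly with the length of the timeline.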

Original title and link: Rewriting the Redis Twitter Clone (NoSQL databases © myNoSQL)


Convore Usage of Redis Pub/Sub

Eric Florenzano describing the architecture of the newly launched Convore website:

Now a task is sent to Celery (by way of Redis) notifying it that this new message has been received. This Celery task now increments the unread count for everyone who has access to the topic that the message was posted in, and then it publishes to a Redis pub/sub for the group that the message was posted to. Finally, the task scans through the message, looking for any users that were mentioned in the message, and writes entries to the database for every mention.

On the other end of that pub/sub are the many open http requests that our users have initiated, which are waiting for any new messages or information. Those all simultaneously return the new message information, at which point they reconnect again, waiting for the next message to arrive.
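
The flow described above can be sketched with a toy model: a pub/sub bus on one side, and on the other a hub holding the open (long-polling) requests, which all complete at once when a message is published. This is not Convore’s code. `PubSub`, `LongPollHub`, the channel name, and the dictionary used as a “response” are hypothetical, and everything runs in one process without Redis, Celery, or HTTP.

```python
from collections import defaultdict


class PubSub:
    """In-memory stand-in for PUBLISH/SUBSCRIBE: messages go to current subscribers only."""

    def __init__(self):
        self._handlers = defaultdict(list)  # channel -> handler callables

    def subscribe(self, channel, handler):
        self._handlers[channel].append(handler)

    def publish(self, channel, message):
        handlers = list(self._handlers[channel])
        for handler in handlers:
            handler(message)
        return len(handlers)  # number of subscribers that received the message


class LongPollHub:
    """Holds open requests for one channel and completes them all on the next message."""

    def __init__(self, pubsub, channel):
        self._pending = []  # responses of requests that are still waiting
        pubsub.subscribe(channel, self._deliver)

    def wait(self, user):
        # Models an open HTTP request: the response is filled in later.
        response = {"user": user, "messages": []}
        self._pending.append(response)
        return response

    def _deliver(self, message):
        # Every waiting request returns at once; clients must reconnect for more.
        pending, self._pending = self._pending, []
        for response in pending:
            response["messages"].append(message)


bus = PubSub()
hub = LongPollHub(bus, "group:python")
ann, bob = hub.wait("ann"), hub.wait("bob")
bus.publish("group:python", "hello")          # both open requests complete
ann_again = hub.wait("ann")                   # only ann reconnects
bus.publish("group:python", "second message")
```

A client that has not reconnected when the second message is published misses it, which is why this kind of design also needs something durable, such as the unread counts mentioned in the quote.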

Interestingly, Convore’s architecture is that of a fairly classic web application, applied here to group instant messaging.

Original title and link: Convore Usage of Redis Pub/Sub (NoSQL databases © myNoSQL)


Redis at Digg: Story View Counts

Digg just rolled out a new feature, cumulative page event counters (page views plus clicks), that uses Redis as its underlying solution.

Clickstream information is extracted in real time from logs, and then Redis’s support for incrementing values comes into play. In case you are wondering how these counters deal with concurrent updates, keep in mind that Redis is a single-threaded engine, so all operations are executed sequentially.
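
The point about sequential execution can be demonstrated with a toy model: many client threads submit increment commands to a queue, and a single worker applies them one at a time, so no update is lost and no locking around the counter is needed. This is an assumption-laden sketch and not Digg’s setup. `run_counter_server`, `simulate`, and the `story:42:views` key are hypothetical, and the queue merely stands in for a server processing commands in order.

```python
import queue
import threading


def run_counter_server(commands, counters):
    """Single worker loop: applies one command at a time, in arrival order."""
    while True:
        key = commands.get()
        if key is None:  # sentinel: shut down
            break
        counters[key] = counters.get(key, 0) + 1  # the INCR-style update


def simulate(clients=8, hits_per_client=1000, key="story:42:views"):
    """Fire increments from several threads and return the final counter value."""
    commands = queue.Queue()
    counters = {}
    server = threading.Thread(target=run_counter_server, args=(commands, counters))
    server.start()

    def client():
        for _ in range(hits_per_client):
            commands.put(key)

    workers = [threading.Thread(target=client) for _ in range(clients)]
    for worker in workers:
        worker.start()
    for worker in workers:
        worker.join()

    commands.put(None)  # all clients are done: stop the server
    server.join()
    return counters.get(key, 0)
```

Because only the worker thread ever touches `counters`, the final value always equals `clients * hits_per_client`, regardless of how the client threads interleave.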

In Digg’s own words: “Redis rocks!”

Original title and link: Redis at Digg: Story View Counts (NoSQL databases © myNoSQL)


Hadoop and Membase Case Study: AOL Advertising Architecture

Combining Hadoop and Membase to solve these challenges:

  1. How to analyze billions of user-related events, presented as a mix of structured and unstructured data, to infer demographic, psychographic and behavioral characteristics that are encapsulated into hundreds of millions of “cookie profiles”
  2. How to make hundreds of millions of cookie profiles available to their ad targeting platform with sub-millisecond, random read latency
  3. How to keep the user profiles fresh and current

AOL Advertising Hadoop Membase Case Study

In a much simplified form:

  • crunch (nb: read it as pre-process and prepare) tons of data with Hadoop
  • feed the results into a low latency, high throughput key-value store for serving them online
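
As a very rough illustration of the two steps above, the sketch below aggregates raw events into per-cookie profiles in a batch step and then loads them into a key-value mapping for constant-time lookups. It is not AOL’s pipeline: plain Python stands in for both Hadoop and Membase, and `build_profiles`, `publish_profiles`, `lookup`, the `profile:<cookie>` key layout, and the sample events are all made up.

```python
from collections import Counter, defaultdict


def build_profiles(events, top_n=2):
    """Batch step (stand-in for the Hadoop jobs): reduce raw events to compact profiles."""
    interests = defaultdict(Counter)
    for cookie_id, category in events:
        interests[cookie_id][category] += 1
    # Keep only the most frequent categories per cookie.
    return {
        cookie_id: [category for category, _ in counts.most_common(top_n)]
        for cookie_id, counts in interests.items()
    }


def publish_profiles(store, profiles):
    """Load step: write every profile under its own key in the serving store."""
    for cookie_id, profile in profiles.items():
        store[f"profile:{cookie_id}"] = profile


def lookup(store, cookie_id):
    """Serving step: a single key lookup per ad request, no computation."""
    return store.get(f"profile:{cookie_id}", [])


events = [
    ("c1", "sports"), ("c1", "sports"), ("c1", "sports"),
    ("c1", "cars"), ("c1", "cars"), ("c1", "travel"),
    ("c2", "travel"),
]
serving_store = {}  # stand-in for the key-value store
publish_profiles(serving_store, build_profiles(events))
```

Rerunning the batch step on fresh events and publishing again overwrites the old entries, which is the simplest way to keep profiles current in this model.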

Original title and link: Hadoop and Membase Case Study: AOL Advertising Architecture (NoSQL databases © myNoSQL)


Leaderboards using Redis: A How-To Guide

At the studio I had discussed with colleagues the possibility of using Redis, an advanced key-value storage engine, for leaderboards. In less than an hour, I had the set of Redis commands using their sorted set data type (a set of data that is sorted based on an associated “score”) to perform operations on leaderboards such as:

  • Retrieving general information about a leaderboard such as total members or total pages
  • Adding or removing members from a leaderboard
  • Retrieving information about a member in the leaderboard such as their rank or score
  • Updating score information for a member in the leaderboard
  • Retrieving an arbitrary page of leaders from the leaderboard
  • Retrieving the leaders around a given member in a leaderboard, also known as an “Around Me” leaderboard
  • Retrieving information for an arbitrary set of members in a leaderboard, e.g. How do my friends compare against me?
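
Most of the operations listed above map naturally onto a sorted set. The following is a minimal in-memory sketch, not the code from the how-to guide: `Leaderboard` and its method names are hypothetical, and the comments name the Redis sorted set command each method roughly corresponds to. Unlike Redis, this model re-sorts on every query, so it only illustrates the semantics.

```python
import math


class Leaderboard:
    """In-memory model of a leaderboard kept in a sorted set (highest score first)."""

    def __init__(self, page_size=25):
        self.page_size = page_size
        self._scores = {}  # member -> score

    def rank_member(self, member, score):   # ZADD (also updates an existing score)
        self._scores[member] = score

    def remove_member(self, member):        # ZREM
        self._scores.pop(member, None)

    def total_members(self):                # ZCARD
        return len(self._scores)

    def total_pages(self):
        return math.ceil(len(self._scores) / self.page_size)

    def _ordered(self):
        # Highest score first; ties fall back to reverse member order.
        return sorted(self._scores, key=lambda m: (self._scores[m], m), reverse=True)

    def rank_for(self, member):             # ZREVRANK, shifted to start at 1
        return self._ordered().index(member) + 1

    def score_for(self, member):            # ZSCORE
        return self._scores[member]

    def leaders(self, page):                # ZREVRANGE over one page
        start = (page - 1) * self.page_size
        return self._ordered()[start:start + self.page_size]

    def around_me(self, member, radius=2):  # ZREVRANK, then ZREVRANGE around it
        ordered = self._ordered()
        index = ordered.index(member)
        return ordered[max(0, index - radius):index + radius + 1]


board = Leaderboard(page_size=2)
for name, points in [("ann", 50), ("bob", 70), ("cat", 90), ("dan", 30), ("eve", 60)]:
    board.rank_member(name, points)
```

Comparing friends, the last item in the list, would amount to calling `rank_for` and `score_for` for each friend.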

I’m not a gamer myself, but I can see this as an exercise in using Redis and experimenting with its data types and the special operations they support.

Original title and link: Leaderboards using Redis: A How-To Guide (NoSQL databases © myNoSQL)


Powered by CouchApp: MapChat

Nice little CouchApp: MapChat:

MapChat: CouchApp

Why CouchApp/CouchDB? I guess the answer in this case is development simplicity, as there’s no need for an additional web server or anything like that. Just CouchDB and a bunch of JavaScript.

Original title and link: Powered by CouchApp: MapChat (NoSQL databases © myNoSQL)