Martin Schneider (Basho) tries to answer the question in the title:
Riak can be a data store for a purpose-built enterprise app, a caching layer for an Internet app, or part of the distributed fabric and DNA of a global app. Those are of course highly arbitrary and vague examples, but they show how flexible Riak is as a platform.
“Can be” is not quite equivalent to being the right solution, and even less so to being the best one. And Martin’s answer to this is:
For super scalable enterprise and global apps — those where the data inside is inherently valuable and dependability of the system to capture, process and store data/writes is imperative — well I see Riak outperforming any perceived competitor in the space in providing value here.
But even for these scenarios, there’s competition from solutions like Cassandra, HBase, and Hypertable — the whole spectrum of scalable storage solutions based on Google BigTable and Amazon Dynamo being covered: HBase (a BigTable implementation), Cassandra (a solution using the BigTable data model and the Dynamo distributed model), and Riak (a solution based mainly on the Amazon Dynamo paper).
While Riak presents itself as the cleanest Dynamo-based solution, I would venture to say that both Cassandra and HBase come to the table with some interesting characteristics that cannot be ignored:
- Strong communities and community-driven development processes — both HBase and Cassandra are top-level Apache Software Foundation projects
- Excellent integration with Hadoop, the leading batch-processing solution. DataStax, the company offering services for Cassandra, went the extra mile of creating a custom Hadoop solution, Brisk, which makes this integration even better.
Bottom line, I don’t think we can declare a winner in this space and I believe all three solutions will stay around for a while competing for every scenario requiring dependability of the system to capture, process and store data.
One brief note about architecture: since it’s impractical to simply query the activity of 500 friends, there are two general approaches for building scalable news feeds:
- Fan-out on read (run these queries at read time, when a user requests their feed, and cache the results)
- Fan-out on write (write a follower-specific copy of every activity, so that when a given user asks for their feed you can retrieve it in one simple query)
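Stripped of any particular datastore, the trade-off between the two approaches can be sketched in plain Python. The in-memory dictionaries here are hypothetical stand-ins for whatever storage actually backs the feed:

```python
def feed_fan_out_on_read(follows, activities_of, user, n=20):
    # Read-time merge: query each followed user's recent activities
    # and sort them. Read cost grows with the number of follows.
    merged = []
    for friend in follows.get(user, []):
        merged.extend(activities_of.get(friend, []))
    return sorted(merged, key=lambda a: a["ts"], reverse=True)[:n]


def feed_fan_out_on_write(feeds, followers_of, author, activity):
    # Write-time copy: push the activity into every follower's
    # precomputed feed. A read is then a single lookup in `feeds`,
    # at the cost of one write per follower.
    for follower in followers_of.get(author, []):
        feeds.setdefault(follower, []).append(activity)
```

The first function pays at read time, the second at write time; which one wins depends on the read/write ratio and the follower counts involved.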
And why Redis:
First off, why Redis? It’s fast, our data model allows us to store minimal data in each feed entry, and Redis’ data-types are pretty well suited for an activity feed. Lists might seem like an obvious choice and could work for a basic feed implementation, but we ended up using sorted sets for two reasons:
- If you use a timestamp as the score parameter, you can really easily query for feed items based on time
- You can easily get the union of multiple sorted sets in order to generate an aggregated “friend feed”
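The two points above map directly onto `ZADD` with a timestamp score and `ZUNIONSTORE`. A minimal sketch of that mapping, assuming a redis-py client passed in as `r` and hypothetical key names like `feed:{user_id}` (not taken from the post):

```python
import json
import time


def push_activity(r, user_id, activity, ts=None):
    # Fan-out on write: append one activity to one user's sorted set.
    # The timestamp is the score, so the feed stays time-ordered.
    ts = ts if ts is not None else time.time()
    r.zadd(f"feed:{user_id}", {json.dumps(activity): ts})
    # Cap the feed so it doesn't grow without bound (keep newest 1000).
    r.zremrangebyrank(f"feed:{user_id}", 0, -1001)


def get_feed(r, user_id, count=20):
    # Newest-first page of a single user's feed.
    return [json.loads(a) for a in r.zrevrange(f"feed:{user_id}", 0, count - 1)]


def friend_feed(r, user_id, friend_ids, count=20):
    # Aggregated "friend feed": ZUNIONSTORE merges several sorted
    # sets and keeps the timestamp scores, so the result is still
    # ordered by time.
    dest = f"feed:agg:{user_id}"
    r.zunionstore(dest, [f"feed:{f}" for f in friend_ids])
    r.expire(dest, 60)  # treat the aggregate as a short-lived cache
    return [json.loads(a) for a in r.zrevrange(dest, 0, count - 1)]
```

With a timestamp score, `ZREVRANGEBYSCORE` would additionally give "everything between these two times" for free, which is the first reason quoted above.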
Just in case you needed yet another example of building analytics systems on top of MongoDB: Patrick Stokes’ presentation about how Buddy Media is implementing their entire platform analytics engine on MongoDB:
Here are some other MongoDB analytics solutions:
- Fast, asynchronous analytics with MongoDB
- MongoDB and site analytics
- MongoDB use case: site analytics, a recurring scenario
- Scalable event analytics with MongoDB and Ruby on Rails
- Tracking page views with MongoDB
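The pattern most of these analytics posts rely on is an atomic upsert with `$inc`: one document per page per day, with counters bumped in place on every hit. A sketch with PyMongo, using hypothetical field names (`url`, `day`, `hourly`) that are illustrative, not taken from any of the posts above:

```python
from datetime import datetime, timezone


def track_page_view(stats, url, when=None):
    # Record one page view as atomic counter increments.
    # `stats` is assumed to be a PyMongo collection; one document
    # per (url, day), with per-hour counters in a subdocument so a
    # day's traffic comes back in a single query.
    when = when or datetime.now(timezone.utc)
    day = when.strftime("%Y-%m-%d")
    stats.update_one(
        {"url": url, "day": day},
        {"$inc": {"total": 1, f"hourly.{when.hour}": 1}},
        upsert=True,  # create the document on the first view
    )
```

Because `$inc` is atomic and the upsert creates the document on first sight, the write path needs no read-modify-write cycle, which is what makes this pattern cheap enough to run on every request.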
Patrick Stokes: Buddy Media’s Chief Product Officer
Nice little CouchApp: MapChat: