


usecase: All content tagged as usecase in NoSQL databases and polyglot persistence

MongoDB Arrays for Social Likes and Follows

In social sites, you generally want to like something (comment, post, page, etc.), or be a friend with someone. Now, MongoDB has arrays which can be indexed and which, we found out, are perfect for this.

While most of these scenarios will work just fine, the one that can get a bit more complicated is handling highly concurrent likes/counters.
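One way to keep likes correct under concurrency is to let the server apply the whole change in a single atomic update. The following is a minimal sketch, assuming a hypothetical collection whose documents carry a `likers` array and a `like_count` counter (both names are illustrative, not from the original post). It only builds the filter and update documents; handing them to a driver's update call is left out.

```python
def build_like_update(post_id, user_id):
    """Return (filter, update) documents for an idempotent 'like'.

    The field names `likers` and `like_count` are assumptions made
    for illustration only.
    """
    # Match the post only if this user is not in the array yet, so a
    # repeated or concurrent duplicate request matches nothing.
    query = {"_id": post_id, "likers": {"$ne": user_id}}
    update = {
        "$addToSet": {"likers": user_id},  # array stays free of duplicates
        "$inc": {"like_count": 1},         # counter moves with the array
    }
    return query, update


query, update = build_like_update("post-1", "user-42")
```

Because the filter and both operators are applied to one document in one operation, the counter cannot drift from the array even when many users like the same post at once.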

Original title and link: MongoDB Arrays for Social Likes and Follows (NoSQL databases © myNoSQL)


CouchDB: Flexible Forms and Data

Flexible forms and data with CouchDB (☞ here and ☞ here)

[…] at some point i got in need of some function to serialize forms into deep json objects so that i can push whatever the form has, directly to couchdb.

The first time I read about something similar was in a NYTimes ☞ article about using MongoDB for storing both forms and their data:

Displaying a photo submission form now requires a single lookup. The form is stored as a document in a top-level collection, and the set of custom fields become embedded documents within that.
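Serializing a form into a deep JSON object, as described in the first quote, could look roughly like the sketch below. It assumes form field names use a dot to express nesting (for example `author.name`); that naming convention and the function name `nest_form_fields` are hypothetical, not taken from the linked posts.

```python
import json


def nest_form_fields(flat_fields, sep="."):
    """Turn {'author.name': 'Ana'} into {'author': {'name': 'Ana'}}."""
    nested = {}
    for dotted_name, value in flat_fields.items():
        parts = dotted_name.split(sep)
        node = nested
        for part in parts[:-1]:
            # Create intermediate objects on demand.
            node = node.setdefault(part, {})
        node[parts[-1]] = value
    return nested


form = {
    "title": "Sunset",
    "author.name": "Ana",
    "author.email": "ana@example.com",
}
document = nest_form_fields(form)
print(json.dumps(document, sort_keys=True))
```

The resulting dictionary can be stored as a single document. The sketch does not handle a field that is used both as a plain value and as a prefix (such as `author` and `author.name` together).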

Original title and link: CouchDB: Flexible Forms and Data (NoSQL databases © myNoSQL)

MongoDB at RServe

Why is schema-free important for RServe? Because we plan to support many business types. Different business types usually come with different reservation data. We want RServe users to be able to embed custom data in their reservation data.

Basically, RServe is using the schema-less MongoDB as a form of multi-tenancy.

Another reason for using document databases is to (try to) avoid the complexity of such schemas:

[Image: schema complexity]

Original title and link: MongoDB at RServe (NoSQL databases © myNoSQL)


MongoDB Use Case: Archiving

Document-oriented databases, with their flexible schemas, provide a nice solution. We can have older documents which vary a bit from the newer ones in the archive. The lack of homogeneity over time may mean that querying the archive is a little harder. However, keeping the data is potentially much easier.

I think this is pushing the schema migration issue from data to code, which might actually be a good idea.
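Pushing schema migration from data to code usually means the reader normalizes old documents when it loads them. Here is a minimal sketch of that idea, assuming a hypothetical `schema_version` field and two made-up document layouts; nothing here comes from the original article.

```python
def upgrade_on_read(doc):
    """Normalize an archived document to the newest shape at read time.

    The `schema_version` field and both layouts are hypothetical.
    """
    version = doc.get("schema_version", 1)
    upgraded = dict(doc)  # leave the stored document untouched
    if version < 2:
        # Assumed v1 layout: one `name` string; v2 splits it in two.
        first, _, last = upgraded.pop("name", "").partition(" ")
        upgraded["first_name"] = first
        upgraded["last_name"] = last
        upgraded["schema_version"] = 2
    return upgraded


archive = [
    {"name": "Ada Lovelace"},
    {"schema_version": 2, "first_name": "Alan", "last_name": "Turing"},
]
normalized = [upgrade_on_read(d) for d in archive]
```

The archive keeps documents exactly as they were written, and the cost of heterogeneity is paid once in the read path rather than in a bulk migration.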

Original title and link: MongoDB Use Case: Archiving (NoSQL databases © myNoSQL)


Hadoop Usecase: Fighting Spam in Big Data

A much more serious Hadoop use case:

Worldwide spam volumes this year are forecast to rise by 30% to 40% compared with 2009. Spam recently reached a record 92% of total email. Spammers have turned their attention to social media sites as well. In 2008, there were few Facebook phishing messages; Facebook is now the second most phished organization online. Even though Twitter has managed to recently bring its spam rate down to as low as 1%, the absolute volume of spam is still massive given its tens of millions of users. Dealing with spam introduces a number of Big Data challenges. The sheer size and scale of the data is enormous. In addition, spam in social media involves the need to understand very complex patterns of behavior as well as to identify new types of spam.


Make sure you check these 10 problems that can use Hadoop.

Original title and link: Hadoop Usecase: Fighting Spam in Big Data (NoSQL databases © myNoSQL)

Document databases: 11 Document-oriented Applications

From Zef Hemel:

Some examples of document-oriented applications:

  • CRM
  • Contact Address/Phone Book
  • Forum/Discussion
  • Bug Tracking
  • Document Collaboration/Wiki
  • Customer Call Tracking
  • Expense Reporting
  • To-Dos
  • Time Sheets
  • E-mail
  • Help/Reference Desk

Looking at this list I’m like, what application is not document-oriented?

A partial answer to the last question is simple: all those that require highly connected data.

Original title and link: Document databases: 11 Document-oriented Applications (NoSQL databases © myNoSQL)


Flume Cookbook: Flume and Apache Logs

Part of the ☞ Flume cookbook:

In this post, we present a recipe that describes the common use case of using a Flume node to collect Apache 2 web server logs in order to deliver them to HDFS.

In case you want to (initially) skip Flume’s user guide, you could start with this intro to Flume and then read how Flume and Scribe compare.

Original title and link: Flume Cookbook: Flume and Apache Logs (NoSQL databases © myNoSQL)


Redis at GitHub

From InfoQ’s Werner Schuster interview with Scott Chacon:

Q: You mentioned using Redis. How do you use that?
A: We use Redis for exception handling and for our queue. We tried a lot of Ruby-based queuing mechanisms. Chris wrote an abstraction to the queuing mechanism. We used to use BJ and DJ, and in the super early days we tried out Amazon SQS and a lot of queuing mechanisms, and they all fell over at one point or another with the amount of traffic that we were doing on them and the types of queries that we were trying to get from them. Eventually we moved to a Redis-based one that Chris also wrote, called Resque.

That’s open source, you can get that on GitHub. A couple of other companies use it, but it’s Redis-backed. We use the Redis list and stuff to queue up jobs and to pull the jobs out of that, and it’s been really solid. If you are using DJ or something and it’s not working quite well for you, then you might want to check out Resque.

GitHub is also using Redis for configuration management. And using Redis for queues is already a well-known use case.
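The idea of using a Redis list to queue up jobs and pull them out again can be sketched as follows. This is not Resque's implementation or key layout: the queue name, the JSON job format, and the `InMemoryLists` class are assumptions for illustration. The class is a small stand-in that mimics the Redis `RPUSH` and `LPOP` list commands so the example runs without a server.

```python
import json
from collections import defaultdict, deque


class InMemoryLists:
    """Tiny stand-in exposing the two list calls the sketch needs."""

    def __init__(self):
        self._lists = defaultdict(deque)

    def rpush(self, key, value):
        # Append to the tail and return the new length, like RPUSH.
        self._lists[key].append(value)
        return len(self._lists[key])

    def lpop(self, key):
        # Remove and return the head, or None when empty, like LPOP.
        items = self._lists[key]
        return items.popleft() if items else None


def enqueue(client, queue, job_name, *args):
    # Jobs are serialized to JSON and appended to the tail of the list.
    client.rpush(queue, json.dumps({"job": job_name, "args": list(args)}))


def dequeue(client, queue):
    # Workers take from the head, which gives first-in, first-out order.
    raw = client.lpop(queue)
    return json.loads(raw) if raw is not None else None


client = InMemoryLists()
enqueue(client, "queue:default", "send_email", 42)
enqueue(client, "queue:default", "resize_image", "a.png")
first = dequeue(client, "queue:default")
```

With a real Redis client exposing the same two calls, several workers could pop from the same list, since each pop hands a given job to exactly one caller.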

Original title and link: Redis at GitHub (NoSQL databases © myNoSQL)


Hadoop: 10 Problems That Can Use Hadoop

Mike Pearce summarizing a presentation about problems where Hadoop can be a good fit:

  1. Modeling True Risk
  2. Customer Churn Analysis
  3. Recommendation engines
  4. Ad Targeting
  5. Point of Sale Transaction Analysis
  6. Analyzing Network Data to Predict Failure
  7. Threat Analysis/Fraud Detection
  8. Trade Surveillance
  9. Search Quality
  10. Data “Sandbox”

As you can see, most of these boil down to “Aggregate Data, Score Data, Present Score As Rank”, which, at its simplest, is what Hadoop can do.
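The “aggregate, score, rank” pattern can be illustrated with a single-process sketch in the map/reduce style. It does not use Hadoop's API; the event records and the scoring rule (a plain count per item) are made up for the example.

```python
from collections import Counter


def map_phase(records):
    # Emit a (key, 1) pair for every event.
    for record in records:
        yield record["item"], 1


def reduce_phase(pairs):
    # Aggregate: sum the values per key to get a score.
    totals = Counter()
    for key, value in pairs:
        totals[key] += value
    return totals


def rank(totals):
    # Highest score first; ties are broken by key for stable output.
    return sorted(totals.items(), key=lambda kv: (-kv[1], kv[0]))


events = [
    {"item": "a"}, {"item": "b"}, {"item": "a"},
    {"item": "c"}, {"item": "b"}, {"item": "a"},
]
ranking = rank(reduce_phase(map_phase(events)))
```

In a real cluster the map and reduce steps would run in parallel over partitions of the data; the shape of the computation stays the same.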

If you need more ideas, just check the research published on the dating site OkCupid ☞ blog.

Original title and link for this post: Hadoop: 10 Problems That Can Use Hadoop (published on the NoSQL blog: myNoSQL)


MongoDB Use Case: Site Analytics, A Reoccurring Scenario

Remember Hummingbird, the MongoDB-based real-time web traffic visualization tool? And Eventbrite’s use of MongoDB for page view tracking? And Yottaa’s scalable event analytics backed by MongoDB? This is how you’d describe why MongoDB is a good fit for this scenario:

I want to track a bunch of data for certain kinds of views and then display custom analytics. The data collected includes a combination of request environment and internal statistics correlated with request parameters. I did not want to write this to a traditional database for every request because:

  1. the data is adjunct to the functionality,
  2. it involves a select+insert or select+update for each request and
  3. writes are expensive. Furthermore, the write is not critical enough to hold up the request, and definitely not worth adding a queue infrastructure.
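The select+insert or select+update round trip mentioned in the list can be replaced by a single upsert that increments counters in place. Below is a minimal sketch that only builds the filter and update documents; the field names (`path`, `day`, `views`, `hourly`) are hypothetical and this is not the quoted author's code. It assumes the update is sent with the driver's upsert option enabled, so a missing document is created on first use.

```python
from datetime import datetime, timezone


def build_pageview_upsert(path, when):
    """Return (filter, update) for a one-round-trip counter increment.

    The document layout and field names are assumptions for illustration.
    """
    day = when.strftime("%Y-%m-%d")
    # One document per page per day.
    query = {"path": path, "day": day}
    # $inc creates a missing field and starts it at the given amount.
    update = {"$inc": {"views": 1, "hourly." + str(when.hour): 1}}
    return query, update


query, update = build_pageview_upsert(
    "/pricing", datetime(2010, 8, 1, 14, 30, tzinfo=timezone.utc)
)
```

No prior read is needed, which is what keeps the tracking write cheap enough to issue on every request.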

Original title and link for this post: MongoDB Use Case: Site Analytics, A Reoccurring Scenario (published on the NoSQL blog: myNoSQL)


Extensive Riak Benchmarking at Mozilla Test Pilot

Mozilla has previously published their detailed plan and their extensive investigation into Cassandra, HBase, and Riak, which led to choosing Riak. This time they are publishing extensive Riak benchmark results (against both Riak 0.10 and Riak 0.11 running Bitcask). They are using Riak benchmarking code that is included in the list of correct NoSQL benchmarks and performance evaluation solutions. The results, their analysis, and their interpretation are all fascinating.

Our goal in running these studies was, simply put, no surprises. That meant we needed to run studies that profiled:

  1. Latency
  2. Stability, especially for long running tests
  3. Performance when we introduced variable object sizes
  4. Performance when we introduced pre-commit hooks to evaluate incoming data

I guess Mozilla Test Pilot is one of Riak’s most interesting case studies.

Original title and link for this post: Extensive Riak Benchmarking at Mozilla Test Pilot (published on the NoSQL blog: myNoSQL)

Too Much Redis?

Ben Curtis ☞ thinks that using Redis for managing friend lists, as described in the ☞ EngineYard post, is overly complicated:

Yesterday I read a post over at the EngineYard blog about a use case for Redis (in the name of being a polyglot, trying new things, etc.), and I just had to scratch my head. I love Redis — it rocks my world — but that example was too much for me. If you just want to store a set of ids somewhere to avoid normalization headaches, introducing Redis is overkill… just do it in MySQL!

He goes on to propose a MySQL solution in which friend IDs are serialized as a comma-separated list. Frankly speaking, I do see quite a few advantages Redis has compared to this one:

  1. Redis knows how to handle sets
    1. you don’t have to deal with de-duplication
    2. (most probably) the storage is optimized
  2. with manual serialization you’ll have to deal with all concurrency issues occurring when updating these lists
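The two points above can be made concrete with a small sketch. It models the comma-separated column as a plain string and the Redis set as a Python set; no database is involved, and the helper name `add_friend_csv` is made up. The lost update shown here is what happens when two writers read the same value and write back without coordination; a MySQL solution could avoid it with transactions or row locks, at the cost of extra code.

```python
def add_friend_csv(serialized, friend_id):
    """Add an id to a comma-separated list, as stored in one text column."""
    ids = serialized.split(",") if serialized else []
    if str(friend_id) not in ids:  # de-duplication is the caller's job
        ids.append(str(friend_id))
    return ",".join(ids)


# Two requests read the same stored value before either writes back.
stored = "1,2"
write_a = add_friend_csv(stored, 3)
write_b = add_friend_csv(stored, 4)
stored = write_b  # last writer wins, so friend 3 is lost
lost_update = "3" not in stored.split(",")

# With a server-side set, each add is one operation on the stored value
# (the Redis SADD command), modeled here with a Python set.
friends = {"1", "2"}
friends.add("3")
friends.add("4")
friends.add("3")  # a duplicate add is simply ignored
```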

So what is the advantage of Ben’s suggested solution?

Original title and link for this post: Too Much Redis? (published on the NoSQL blog: myNoSQL)