


NoSQL hybrid: All content tagged as NoSQL hybrid in NoSQL databases and polyglot persistence

Big Data and Emerging NoSQL Databases Shift to Hybrid Database Environments

IT decision-makers need to become familiar with the strengths and weaknesses of non-relational systems so they can make informed decisions as to their possible place in the IT infrastructure. The “one size fits all” RDBMS has made database technology decisions relatively easy; in a hybrid future, picking the right database tool may become more complex.

It usually takes a while for everyone to accept new technologies and understand their benefits. Sometimes it’s two years, sometimes it’s more. However, some of the NoSQL databases are here to stay. And Hadoop is completing the slope of enlightenment.

Original title and link: Big Data and Emerging NoSQL Databases Shift to Hybrid Database Environments (NoSQL database©myNoSQL)


wooga’s architecture: Facebook Games on MySQL and Redis

Tim Lossen’s presentation about the evolution of wooga’s architecture (Facebook games) from using sharded MySQL to a polyglot persistence solution based on master/slave Redis and master/slave MySQL, with pluses and minuses:


Jarvis Architecture Using MongoDB for Asset Information

Jarvis architecture[1] powered by NoSQL:

[Figure: Jarvis architecture with MongoDB]

We chose a hybrid approach for data. We store user and organization data in MySQL but asset information is in MongoDB. This lets us have a traditional schema for organizations having many users and users that have many organizations. We chose a document database for our asset information because the “schema” we had in mind would be a hot mess in a relational database. The freedom of a Mongo enables some really great features which we’ll be unveiling over time.
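A minimal sketch of what such a split can look like, not the Jarvis code itself. It assumes `sqlite3` as a stand-in for MySQL and a plain list of dicts as a stand-in for a MongoDB collection; the table names, the asset fields, and the `assets_for_user` helper are all hypothetical.

```python
import sqlite3

# Relational side (sqlite3 stands in for MySQL): users, organizations,
# and a join table for the many-to-many relationship between them.
db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE orgs  (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE memberships (
        user_id INTEGER NOT NULL REFERENCES users(id),
        org_id  INTEGER NOT NULL REFERENCES orgs(id),
        PRIMARY KEY (user_id, org_id)
    );
    INSERT INTO users VALUES (1, 'alice'), (2, 'bob');
    INSERT INTO orgs  VALUES (10, 'acme'), (20, 'globex');
    INSERT INTO memberships VALUES (1, 10), (1, 20), (2, 10);
""")

# Document side (a list of dicts stands in for a MongoDB collection):
# each asset carries whatever fields make sense for its type.
assets = [
    {"_id": "a1", "org_id": 10, "type": "image", "width": 800, "height": 600},
    {"_id": "a2", "org_id": 10, "type": "video", "duration_s": 42, "tags": ["promo"]},
    {"_id": "a3", "org_id": 20, "type": "pdf", "pages": 12},
]


def assets_for_user(user_id):
    """Resolve memberships relationally, then fetch matching asset documents."""
    rows = db.execute(
        "SELECT org_id FROM memberships WHERE user_id = ?", (user_id,)
    ).fetchall()
    org_ids = {org_id for (org_id,) in rows}
    return [asset for asset in assets if asset["org_id"] in org_ids]
```

The join between the two stores happens in application code: the relational side answers "which organizations does this user belong to?", and the document side returns assets whose shape varies by type.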

  1. Jarvis: open source Perl-based web asset management system  



Paper: Netflix’s Transition to High-Availability Storage Systems

A while ago, Sid Anand[1] wrote a series of posts on the challenges of a hybrid Oracle and Amazon SimpleDB solution. These have now become a paper that offers a much better organized and more detailed view of Netflix’s transition to a hybrid Oracle and Amazon Web Services (SimpleDB, S3) architecture.

Go read the ☞ paper if one of these applies:

  • interested in Amazon SimpleDB and SimpleDB best practices
  • interested in running an on-premise and cloud hybrid architecture
  • interested in architecting a multi data source system

  1. Siddharth “Sid” Anand, Netflix cloud engineer, @r39132


Redis Usecase: API Access Logger

Nice combination of Redis and MySQL:

Redis has to keep all stored objects in memory, so just putting all data in there and forgetting about it was out of the question. We decided to only keep a few days of data in Redis and archive the results to MySQL. Daily API usage stats would be served directly by Redis, archived results on date ranges would be fetched from MySQL.

Note also what correct Redis data modeling means: usage of Redis data structures combined with smart keys (nb smart in the sense of keys carrying additional meta-information).
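The hot/archive split and the "smart key" idea can be sketched roughly as follows. This is an illustration under assumptions, not the original logger: a dict stands in for Redis, `sqlite3` stands in for MySQL, and the key layout `hits:<api_key>:<day>`, the `KEEP_DAYS` retention window, and all function names are hypothetical. It also assumes API keys contain no `:` character.

```python
import sqlite3
from datetime import date, timedelta

KEEP_DAYS = 3  # assumed retention window for the in-memory store

hot = {}  # stands in for Redis: smart key -> hit counter

# sqlite3 stands in for the MySQL archive.
archive = sqlite3.connect(":memory:")
archive.execute(
    "CREATE TABLE api_usage ("
    "api_key TEXT, day TEXT, hits INTEGER, PRIMARY KEY (api_key, day))"
)


def hit_key(api_key, day):
    """Build a key that carries its own metadata: who and when."""
    return f"hits:{api_key}:{day.isoformat()}"


def record_hit(api_key, day):
    key = hit_key(api_key, day)
    hot[key] = hot.get(key, 0) + 1  # in Redis this would be INCR key


def archive_old(today):
    """Move counters older than KEEP_DAYS from the hot store to the archive."""
    cutoff = today - timedelta(days=KEEP_DAYS)
    for key in list(hot):
        _, api_key, day = key.split(":")
        if date.fromisoformat(day) < cutoff:
            archive.execute(
                "INSERT INTO api_usage VALUES (?, ?, ?)",
                (api_key, day, hot.pop(key)),
            )


def archived_hits(api_key, start, end):
    """Date-range totals are answered by the relational archive."""
    row = archive.execute(
        "SELECT COALESCE(SUM(hits), 0) FROM api_usage "
        "WHERE api_key = ? AND day BETWEEN ? AND ?",
        (api_key, start.isoformat(), end.isoformat()),
    ).fetchone()
    return row[0]
```

Because the day is part of the key, finding out which counters are due for archiving needs no secondary index: the key itself says how old the data is.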


Tekpub: Using both MongoDB and MySQL

You shouldn’t be afraid to use both NoSQL and RDBMS in your projects if they help you address real problems:

We split out the duties of our persistence story cleanly into two camps: reports of things we needed to know to make decisions for the business and data users needed to use our site. Ironically these two different ways of storing data have guided us to do what’s natural: put the application data into a high-read, high-availability environment (MongoDb) - put the historical, reporting data into a system that is built to answer questions: a relational data store.

The high-read stuff (account info, productions and episode info) is perfect for a “right now” kind of thing like MongoDb. The “what happened yesterday” stuff is perfect for a relational system.

We don’t want to run reports on our live server. You don’t know how long they will take - nor what indexing they will work over (limiting the site’s perf). Bad. Bad-bad.

Much better case study than this one!

This post is part of the MongoDB Case Studies series.


Presentation: Blending NoSQL and SQL at Confoo

Earlier today I wrote about the steps involved in migrating from MySQL to NoSQL. Still, I do feel that in many cases NoSQL and RDBMS will live together under the same project umbrella. Michael Bleigh covers this topic in his presentation, Blending NoSQL and SQL at Confoo:

MySQL and MongoDB Sitting In a Boat

An interesting post from the Lunar Logic guys about using MySQL and MongoDB for their Kanban product, how they got there, and the tools they are using.

As a personal note, I wondered how this system would be characterized in terms of CAP. It should be quite clear that we cannot speak about consistency across the two systems, as MongoDB doesn’t really support transactions (you can check these notes on MongoDB for more details). So, if their system were using master-master MySQL replication and replica pairs for MongoDB, and the internal tools knew how to work with this setup, we could probably say that we have an AP system. But if any of these preconditions is not fulfilled, I’d say both A and P are lost.


Another NoSQL Friendly RDBMS, Plus Some Pros and Cons

Aside from pointing out just another NoSQL-friendly RDBMS post (these two plus the FriendFeed post were written quite a long time ago), I thought it would be interesting to include here what the guys over at the MySQL Performance Blog consider good situations for using this technique, along with its downsides:

Schema-less RDBMS Pros

  • If the application really is schema-less and has a lot of optional parameters that do not appear in every record, serializing the data in one column can be a better idea than having many extra columns that are NULL.
  • when you update the text/blob, a large percentage of the data is actually modified.
  • Another potential pro for this technique is that ALTER TABLE commands are no longer required

Schema-less RDBMS Cons

  • the first serious downside is write amplification. If you are constantly making small updates to one piece of data in a very large blob, the effort MySQL has to go to is greatly increased.
  • this pattern tends to force you to read/write larger amounts of data at once
  • there is a clear loss in functionality. You can no longer easily perform aggregation functions on the data (MIN, MAX, AVG)
  • It can become difficult to apply even the simplest constraints on the data
  • (a smaller issue) is that data will not be stored in the most efficient form
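To make two of these cons concrete, here is a small sketch of the serialized-column pattern. It assumes `sqlite3` as a stand-in for MySQL and JSON as the serialization format; the `entities` table and the helper names are hypothetical.

```python
import json
import sqlite3

# One opaque text column holds every attribute of an entity.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE entities (id INTEGER PRIMARY KEY, body TEXT NOT NULL)")


def save(entity_id, attrs):
    db.execute(
        "INSERT OR REPLACE INTO entities VALUES (?, ?)",
        (entity_id, json.dumps(attrs)),
    )


def load(entity_id):
    row = db.execute(
        "SELECT body FROM entities WHERE id = ?", (entity_id,)
    ).fetchone()
    return json.loads(row[0])


def update_field(entity_id, field, value):
    """Changing one field means reading and rewriting the whole blob."""
    attrs = load(entity_id)
    attrs[field] = value
    save(entity_id, attrs)


def average(field):
    """AVG() cannot see inside the blob, so aggregation moves to the app."""
    values = []
    for (body,) in db.execute("SELECT body FROM entities"):
        attrs = json.loads(body)
        if field in attrs:
            values.append(attrs[field])
    return sum(values) / len(values) if values else None
```

Optional attributes cost nothing here (no NULL columns, no ALTER TABLE), but `update_field` shows the write amplification and `average` shows the lost aggregation support described above.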


Putting your NoSQL data to work

The fact that you are storing your data in a NoSQL solution doesn’t mean that you are done with it. You’ll still have to put it to work, transform and move it, or do some data warehousing[1]. And the lack of SQL should not stop you from doing any of these.

One solution available in many NoSQL stores is MapReduce; as an example, you can see how to translate SQL to MongoDB MapReduce.
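As a rough illustration of the idea, the sketch below expresses the equivalent of `SELECT status, COUNT(*), SUM(amount) FROM orders GROUP BY status` as a map step and a reduce step. It is plain Python, not MongoDB's own MapReduce interface; the `orders` data, `map_fn`, `reduce_fn`, and the tiny `map_reduce` driver are all hypothetical.

```python
from collections import defaultdict

# Hypothetical documents, as they might sit in a document store.
orders = [
    {"status": "paid", "amount": 10},
    {"status": "paid", "amount": 5},
    {"status": "open", "amount": 7},
]


def map_fn(doc):
    """Emit one (key, value) pair per document: the GROUP BY column is the key."""
    yield doc["status"], {"count": 1, "total": doc["amount"]}


def reduce_fn(key, values):
    """Fold all values emitted for one key into a single aggregate."""
    return {
        "count": sum(v["count"] for v in values),
        "total": sum(v["total"] for v in values),
    }


def map_reduce(docs, map_fn, reduce_fn):
    """Single-process driver: group emitted values by key, then reduce each group."""
    groups = defaultdict(list)
    for doc in docs:
        for key, value in map_fn(doc):
            groups[key].append(value)
    return {key: reduce_fn(key, values) for key, values in groups.items()}


result = map_reduce(orders, map_fn, reduce_fn)
```

The mapping is fairly mechanical: the GROUP BY column becomes the emitted key, and each aggregate function becomes a field that the reduce step folds.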

But MapReduce is not the only option available and I’d like to quickly introduce you to a couple of alternative solutions.


Working with HBase can be quite verbose at times, and while Java is not very good at creating DSLs, sometimes even a more fluent API is useful. This is exactly what HBase-dsl brings you:

However I found myself writing tons of code to perform some fairly simple tasks. So I set out to simplify my HBase code and ended up writing a Java HBase DSL. It’s still fairly rough around the edges but it does allow the use of standard Java types and it’s extensible.

[...]"test").
    col("col1", "hello world!");

String value = hBase.fetch("test").


    value("col1", String.class);


HBql’s goal is to bring a more SQLish interface to HBase for those missing SQL. You can take a look at ☞ HBql statements to get a better feeling of what it looks like.


Hive is a data warehouse infrastructure for Hadoop that proposes a SQL-like query language to enable easy data ETL.


Pig is a platform for analyzing large data sets built on Hadoop. I have found a great article ☞ comparing Pig Latin over Hadoop to SQL over a relational database:

  1. Pig Latin is procedural, where SQL is declarative.
  2. Pig Latin allows pipeline developers to decide where to checkpoint data in the pipeline.
  3. Pig Latin allows the developer to select specific operator implementations directly rather than relying on the optimizer.
  4. Pig Latin supports splits in the pipeline.
  5. Pig Latin allows developers to insert their own code almost anywhere in the data pipeline.
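The procedural-versus-declarative distinction in the list above can be sketched in Python, with the caveat that this is neither Pig Latin nor a Hadoop job. The `visits` data, the `is_engaged` predicate, and the 3-second threshold are hypothetical, and `sqlite3` plays the role of the relational database.

```python
import sqlite3

# Hypothetical page-visit records: (visitor, url, seconds on page).
visits = [
    ("alice", "/home", 3),
    ("bob", "/home", 12),
    ("alice", "/docs", 20),
    ("carol", "/docs", 1),
]

# Declarative: one statement says what is wanted; the engine picks the plan.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE visits (visitor TEXT, url TEXT, seconds INTEGER)")
db.executemany("INSERT INTO visits VALUES (?, ?, ?)", visits)
sql_result = db.execute(
    "SELECT url, COUNT(*) FROM visits WHERE seconds >= 3 "
    "GROUP BY url ORDER BY url"
).fetchall()


# Procedural: each step is named, ordered by the developer, and inspectable.
def is_engaged(row):
    """User-supplied code inserted into the pipeline."""
    return row[2] >= 3


engaged = [row for row in visits if is_engaged(row)]      # filter step
bounced = [row for row in visits if not is_engaged(row)]  # split: second branch
grouped = {}
for visitor, url, seconds in engaged:                     # group step
    grouped.setdefault(url, []).append(visitor)
pipeline_result = sorted((url, len(v)) for url, v in grouped.items())
```

Both routes produce the same counts, but only the pipeline exposes intermediate results such as `engaged` and `bounced`, which is where checkpoints, splits, and custom code would attach.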

But don’t think that HBase and Hadoop are the only ones getting such tools. In the graph database world, there is ☞ Gremlin: a graph-based programming language meant to ease graph query, analysis, and manipulation.

I think that sooner rather than later we will see more such solutions appearing in the NoSQL environment.


CouchDB Usecase: skynny_board - a CouchDB based scrum board application

The guys from ☞ have released the code of their scrum board app built using Ruby, Rack, Rails, Sinatra, Sammy, CouchDB and MySQL (nb I’m wondering what CouchDB Ruby library they are using).

So, we have another NoSQL hybrid to play with. Source code available on ☞ GitHub.