NoSQL hybrid: All content tagged as NoSQL hybrid in NoSQL databases and polyglot persistence
I have covered some hybrid solutions before, most of them involving “tweaked” traditional databases that get rid of unnecessary constraints, but this is so far the only NoSQL hybrid solution I’ve read about that combines a NoSQL store with an RDBMS. Sid Anand (@r39132), Netflix cloud engineer, has a series of articles covering the challenges the team faced while working on this Oracle/SimpleDB hybrid NoSQL solution:
The challenges can be summarized in several parts:
- Pulling data out of Oracle Efficiently
- Solving the Oracle-SimpleDB Eventual Consistency Problem
- Defining the SimpleDB-Oracle translation
After reading the articles I still have some unanswered questions:
- The first phase of the data migration is still unclear. My understanding is that a secondary process goes over the existing records and “updates” them so that the triggers are activated.
- How does the SimpleDB-to-Oracle synchronization work?
- Part 3, on the feature mismatch between SimpleDB and Oracle, does not cover all of the aspects it lists:
  - Stored Procedures
  - Constraints (e.g. integrity, foreign key, unique, etc.)
  - Sequences used as Primary Keys
  - Tables without Primary Keys or Unique Keys or both
  - Relationships between tables
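To make the first question above more concrete, here is a minimal sketch of how a “touch every row” backfill could push pre-existing records through a change-capture trigger. It is only my reading of the idea, not the Netflix implementation: SQLite (from the Python standard library) stands in for Oracle, and the `movies` table, the `change_log` table and the function names are all hypothetical.

```python
import sqlite3


def make_db():
    """Build an in-memory database with rows that predate the trigger."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE movies (id INTEGER PRIMARY KEY, title TEXT);

        -- Hypothetical change log that a sync process would drain later.
        CREATE TABLE change_log (
            seq      INTEGER PRIMARY KEY AUTOINCREMENT,
            movie_id INTEGER,
            title    TEXT
        );

        -- These rows exist before the trigger, so they are not captured yet.
        INSERT INTO movies (id, title) VALUES (1, 'Alpha');
        INSERT INTO movies (id, title) VALUES (2, 'Beta');

        CREATE TRIGGER movies_after_update AFTER UPDATE ON movies
        BEGIN
            INSERT INTO change_log (movie_id, title)
            VALUES (NEW.id, NEW.title);
        END;
        """
    )
    return conn


def backfill(conn):
    """'Update' every row to its current value so the trigger fires for it."""
    with conn:  # commit on success, roll back on error
        conn.execute("UPDATE movies SET title = title")


def pending_changes(conn):
    """Return the captured changes in the order they were logged."""
    rows = conn.execute(
        "SELECT movie_id, title FROM change_log ORDER BY seq"
    )
    return rows.fetchall()
```

After `backfill(make_db())` the change log holds one entry per existing row, which is the state a separate process would need in order to copy the old data to the other store.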
The part I found most interesting was the one about the “simple” algorithm used for ensuring eventual consistency. And from the same piece, something to note:
Without the anticipated Amazon API, we cannot build an eventually-consistent Hybrid system optimized for AP (i.e. from CAP theorem). We would have had to rely on dual-writes, defeating our goal to be highly-available.
-  ☞ Introducing the Oracle-SimpleDB Hybrid
-  ☞ Part1: Pulling data out of Oracle Efficiently
-  ☞ Part2: Solving the eventual consistency problem
-  ☞ Part 3: Defining the SimpleDB-Oracle Translation
-  The Beginning of an Interesting Friendship: MapReduce and RDBMS
-  Drizzle Replication: Opening the Doors to Hybrid Solutions
-  Bringing NoSQL to the people: Now Django
In the spirit of “reconciliation” at the end of this NoSQL vs SQL year, I thought it would be interesting to mention that the list of traditional RDBMS (and not only RDBMS) adding support for in-database MapReduce is getting longer by the day. Here is what I could gather so far (note that not all of the systems mentioned are RDBMS and that some of the linked articles are PR announcements, so they may contain inaccurate data):
- Sybase IQ: ☞ here, ☞ here and ☞ here
- IBM M2: ☞ IBM’s M2 corrals massive data sets with Hadoop
- Oracle: ☞ here
- Aster Data: ☞ here
- Greenplum: ☞ here
- Vertica: ☞ here
- Teradata and Netezza: ☞ here
I assume that is what Jeff Davis meant when writing in his post ☞ NoSQL can be fast, but what if SQL were fast and flexible?:
A unified database management system that integrates NoSQL processing models with a traditional SQL system is the real answer here; and streaming is one way to accomplish that. This integration allows a wide range of data processing strategies to work together – traditional tables offer recovery of streaming data, for instance — rather than forcing you to choose a single processing model.
In other words, the language and logical model should be separate from the processing model. And isn’t that what the relational model is all about?
While some may say that MapReduce adoption is just another validation of the NoSQL movement, I confess I find it quite a natural reaction. My concern is that MapReduce integration currently comes in too many different forms, as Daniel Abadi also remarks:
Teradata, Microsoft, Sybase, and, to an extent, Netezza, all seem to believe that providing a library of preoptimized functions distributed with the software is the way to go.
The other school of thought is adopted by vendors that allow customers more freedom to implement their own functions, but constrain the language in which this code is written (such as MapReduce or LINQ) to facilitate the automatic parallelization of this code inside the DBMS.
And until we see some consolidation in this space, hybrid NoSQL solutions may remain the way to go.
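To illustrate the second school of thought from the quote above, here is a small sketch of why constraining user code to a map function and a reduce function makes automatic parallelization straightforward. It is a toy written for this post, not any vendor's API: `map_reduce`, `map_sale`, `total_sales` and the `SALES` rows are all made up, and a thread pool stands in for the DBMS workers.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def map_reduce(rows, mapper, reducer, partitions=4):
    """Run mapper over row partitions in parallel, then reduce per key."""
    # Split the rows into independent partitions (round-robin).
    chunks = [rows[i::partitions] for i in range(partitions)]

    def run_map(chunk):
        pairs = []
        for row in chunk:
            pairs.extend(mapper(row))  # mapper yields (key, value) pairs
        return pairs

    # Mappers only see one row at a time, so partitions can run anywhere.
    with ThreadPoolExecutor(max_workers=partitions) as pool:
        mapped = list(pool.map(run_map, chunks))

    # Shuffle step: group every emitted value under its key.
    groups = defaultdict(list)
    for pairs in mapped:
        for key, value in pairs:
            groups[key].append(value)

    # Reducers only see the values of one key, so they are independent too.
    return {key: reducer(key, values) for key, values in groups.items()}


def map_sale(row):
    """User-supplied map function: emit (region, amount) for one row."""
    yield row["region"], row["amount"]


def total_sales(region, amounts):
    """User-supplied reduce function: total per region."""
    return sum(amounts)


SALES = [
    {"region": "EU", "amount": 10},
    {"region": "US", "amount": 5},
    {"region": "EU", "amount": 20},
]
```

Calling `map_reduce(SALES, map_sale, total_sales)` gives the per-region totals. The user never writes any threading code, which is the trade the constrained language buys.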
This is probably the last post for 2009, so I’d like to use this opportunity to thank all MyNoSQL readers for their contributions and to wish you all a great 2010!
I’d also like to share with you my wish to make MyNoSQL the place to read and learn about NoSQL in 2010 and my hope that you all will be with me in this endeavor.
Django is one of the most popular Python frameworks, the one that Google picked to integrate with their Google App Engine PaaS. Thanks to a GSoC project, Django has now added ☞ support for multiple databases, and that includes NoSQL stores:
Multiple TYPES of databases. This is the one I’m most excited about. This is going to enable people to use some of the NoSQL databases […]
The multi-database support is currently only in the development trunk (documentation can be found ☞ here), so it might take a while until a Django release includes it, and I’m not sure this feature will be backported. But this is definitely another validation for the NoSQL world.
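For a feel of what multiple databases look like in practice, here is a rough sketch based on the router interface as it appears in released Django versions (the `db_for_read`, `db_for_write`, `allow_relation` and `allow_migrate` hooks, plus the `DATABASES` setting). The aliases, the `analytics` app label and the `AppLabelRouter` class are assumptions of mine. Both entries use Django's built-in SQLite backend, since plugging in a NoSQL store would require a third-party backend. The router is a plain class, so the sketch runs without Django installed.

```python
from types import SimpleNamespace

# Would live in settings.py; both aliases are hypothetical.
DATABASES = {
    "default": {"ENGINE": "django.db.backends.sqlite3", "NAME": "app.db"},
    "archive": {"ENGINE": "django.db.backends.sqlite3", "NAME": "archive.db"},
}


class AppLabelRouter:
    """Send every model of a given app to a given database alias."""

    # Hypothetical mapping: app label -> database alias.
    route = {"analytics": "archive"}

    def db_for_read(self, model, **hints):
        # Returning None means "no opinion", so Django falls back to default.
        return self.route.get(model._meta.app_label)

    def db_for_write(self, model, **hints):
        return self.route.get(model._meta.app_label)

    def allow_relation(self, obj1, obj2, **hints):
        # Only allow relations between objects stored in the same database.
        db1 = self.route.get(obj1._meta.app_label, "default")
        db2 = self.route.get(obj2._meta.app_label, "default")
        return db1 == db2

    def allow_migrate(self, db, app_label, model_name=None, **hints):
        target = self.route.get(app_label)
        if target is None:
            return None  # no opinion about apps we do not route
        return db == target


# Stand-ins for model classes, so the sketch can be tried without Django.
event_model = SimpleNamespace(_meta=SimpleNamespace(app_label="analytics"))
user_model = SimpleNamespace(_meta=SimpleNamespace(app_label="accounts"))
```

In a real project the router would be listed in the `DATABASES_ROUTERS`-style setting named `DATABASE_ROUTERS`, and a single query can also pick its database explicitly with `Model.objects.using("archive")`.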
As far as I know, even before this announcement there were some efforts to integrate NoSQL solutions with Django; the one I know about is ☞ Neo4j for Django:
The way that the integration between Django and Neo4j is implemented is in the Model layer. Since Neo4j does not have a SQL engine it would not have been efficient or practical to implement the support as a database layer for Django. Google did their implementation in the same way when they integrated BigTable with Django for App Engine. This means that there will be some minor modifications needed in your code compared to using PostgreSQL or MySQL. Just as with BigTable on App Engine, you will have to use a special library for defining your models when working with Neo4j, but the model definition is very similar to Djangos built in ORM.
Now, you might wonder why I believe that getting NoSQL support into (popular) frameworks is an important step. For the last year or so, NoSQL stores have been under scrutiny for their technical value and I’d say that, by now, this phase is almost over. Next came the business validation, and there are good signs that the NoSQL world is seeing some traction there too.
So, what is left? The simple answer is bringing NoSQL to the people. And by this I mean making it easy to adopt one or the other NoSQL solution and having seamless and standardized integration with existing frameworks and tools. Making it easy to switch from classical RDBMS-based solutions to NoSQL solutions, or even to hybrid SQL-NoSQL solutions, is critical for adoption. Ease of adoption will bring us a lot of new use cases and that will make further adoption even easier. And that is the guarantee for a bright NoSQL future.
Update: Even though there were already ways to use NoSQL solutions with Django (take for example this intro to using CouchDB with Django), I think the new integration layer will make things feel more natural. I read that there is already a CouchDB Django database adapter and also a demo of using MongoDB with Django. And I bet more will come, so keep an eye on our list of NoSQL libraries.
I don’t know how many have heard of or used Drizzle, the MySQL fork optimized for cloud and net applications, but there seems to be some activity (from Marcus Eriksson) around creating Drizzle replication to different NoSQL stores: Project Voldemort and memcached, or Cassandra.
Leaving aside the technical details (which are definitely interesting, with the solution using the Erlang AMQP implementation RabbitMQ), I think this replication layer could represent a good basis for SQL-NoSQL hybrid solutions, which is a direction we’ve mentioned before: Introducing the Oracle-SimpleDB Hybrid
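To show the general shape of such a replication layer, here is a minimal sketch of an applier that consumes row-change events and replays them against a key-value store. It does not reflect Marcus Eriksson's actual code: the JSON event format is invented for illustration, `queue.Queue` stands in for a RabbitMQ queue, and a plain dict stands in for Voldemort, memcached or Cassandra.

```python
import json
import queue


def apply_event(store, raw):
    """Replay one serialized row change against the key-value store."""
    event = json.loads(raw)
    # Hypothetical key scheme: "<table>:<primary key>".
    key = f"{event['table']}:{event['pk']}"
    if event["op"] == "delete":
        store.pop(key, None)  # deleting a missing key is harmless
    else:
        # Inserts and updates both overwrite the whole row, so replaying
        # the same event twice leaves the store in the same state.
        store[key] = event["row"]


def drain(events, store):
    """Apply every event currently waiting in the queue, in order."""
    applied = 0
    while True:
        try:
            raw = events.get_nowait()
        except queue.Empty:
            return applied
        apply_event(store, raw)
        applied += 1


def change(op, table, pk, row=None):
    """Helper that serializes a change the way the SQL side might emit it."""
    return json.dumps({"op": op, "table": table, "pk": pk, "row": row})
```

The SQL database stays the system of record and the NoSQL store catches up asynchronously, which is what makes this kind of layer a candidate basis for a hybrid setup.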
It would be interesting to hear other stories from those who are investigating NoSQL hybrid solutions.
And while we are on the subject of MySQL engines, I thought I should also mention this question from Ilya Grigorik (@igrigorik):
anyone try or using TokuDB ? drop in MySQL engine using fractral trees, claims 10-50x over InnoDB, etc.