NoSQL Hybrid: All content tagged as NoSQL Hybrid in NoSQL databases and polyglot persistence
Tim Lossen’s presentation about the evolution of wooga’s architecture (Facebook games) from using sharded MySQL to a polyglot persistence solution based on master/slave Redis and master/slave MySQL, with pluses and minuses:
Original title and link: wooga’s architecture: Facebook Games on MySQL and Redis (NoSQL databases © myNoSQL)
A while ago, Sid Anand wrote a series of posts on the challenges of a hybrid solution: Oracle - Amazon SimpleDB. This has now become a paper which offers a much better organized and more detailed view of Netflix’s transition to a hybrid Oracle - Amazon Web Services (SimpleDB, S3) architecture.
Go read the ☞ paper if one of these applies:
- interested in Amazon SimpleDB and SimpleDB best practices
- interested in running an on-premise and cloud hybrid architecture
- interested in architecting a multi data source system
Original title and link: Paper: Netflix’s Transition to High-Availability Storage Systems (NoSQL databases © myNoSQL)
Aside from pointing to yet another NoSQL-friendly RDBMS post (these two, plus the FriendFeed post, were written quite a long time ago), I thought it would be interesting to include here what the guys over at the MySQL Performance Blog consider good situations for using this technique, along with its downsides:
Schema-less RDBMS Pros
- If the application really is schema-less and has a lot of optional parameters that do not appear in every record, serializing the data in one column can be a better idea than having many extra columns that are NULL.
- when you update the text/blob, a large percentage of the data is actually modified.
- Another potential pro for this technique is that ALTER TABLE commands are no longer required
Schema-less RDBMS Cons
- the first serious downside is write amplification. If you are constantly making small updates to one piece of data in a very large blob, the effort MySQL has to go to is greatly increased.
- this pattern tends to force you to read/write larger amounts of data at once
- there is a clear loss in functionality. You can no longer easily perform aggregation functions on the data (
- It can become difficult to apply even the simplest constraints on the data
- (a smaller issue) is that data will not be stored in the most efficient form
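To make these trade-offs concrete, here is a minimal sketch in plain Java, assuming an in-memory map as a stand-in for a two-column MySQL table (`id`, `body`). The class name, the helper methods, and the toy `key=value;key=value` serialization format are all hypothetical and not taken from the quoted post; a real application would more likely store JSON or a similar format in a TEXT/BLOB column.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.TreeMap;

class SchemalessBlobSketch {
    // Hypothetical stand-in for a table with columns: id (primary key), body (blob).
    static final Map<Integer, String> table = new LinkedHashMap<>();

    // Toy serialization: "key=value;key=value". Values containing ';' or '='
    // would break it; this is only an illustration.
    static String serialize(Map<String, String> attrs) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, String> e : attrs.entrySet()) {
            if (sb.length() > 0) sb.append(';');
            sb.append(e.getKey()).append('=').append(e.getValue());
        }
        return sb.toString();
    }

    static Map<String, String> deserialize(String blob) {
        Map<String, String> attrs = new TreeMap<>();
        if (blob.isEmpty()) return attrs;
        for (String pair : blob.split(";")) {
            int eq = pair.indexOf('=');
            attrs.put(pair.substring(0, eq), pair.substring(eq + 1));
        }
        return attrs;
    }

    // Pro: a new optional attribute needs no ALTER TABLE.
    // Con: changing one attribute rewrites the whole blob (write amplification).
    // Assumes a row with this id already exists.
    static int updateAttribute(int id, String key, String value) {
        Map<String, String> attrs = deserialize(table.get(id));
        attrs.put(key, value);
        String blob = serialize(attrs);
        table.put(id, blob);
        return blob.length(); // characters rewritten for a one-field change
    }

    // Con: no SUM() in the database; every row is decoded in application code.
    static long sum(String key) {
        long total = 0;
        for (String blob : table.values()) {
            String v = deserialize(blob).get(key);
            if (v != null) total += Long.parseLong(v);
        }
        return total;
    }

    public static void main(String[] args) {
        Map<String, String> alice = new TreeMap<>();
        alice.put("name", "alice");
        alice.put("score", "10");
        table.put(1, serialize(alice));

        Map<String, String> bob = new TreeMap<>();
        bob.put("name", "bob"); // no "score": optional attributes are simply absent
        table.put(2, serialize(bob));

        System.out.println("chars rewritten: " + updateAttribute(1, "score", "25"));
        System.out.println("sum(score): " + sum("score"));
    }
}
```

Note how `updateAttribute` has to decode and rewrite the entire value to change a single field, and how `sum` has to scan and decode every row, which is roughly what the write-amplification and lost-aggregation points above describe.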
The fact that you are storing your data in a NoSQL solution doesn’t mean that you are done with it. You’ll still have to put it to work, transform and move it, or do some data warehousing. And the lack of SQL should not stop you from doing any of these.
But MapReduce is not the only option available and I’d like to quickly introduce you to a couple of alternative solutions.
Working with HBase can be quite verbose at times, and while Java is not very good at creating DSLs, sometimes even a more fluent API is useful. This is exactly what HBase-dsl brings you:
However I found myself writing tons of code to perform some fairly simple tasks. So I set out to simplify my HBase code and ended up writing a Java HBase DSL. It’s still fairly rough around the edges but it does allow the use of standard Java types and it’s extensible.
    hBase.save("test").
        row("abcd").
        family("famA").
        col("col1", "hello world!");

    String value = hBase.fetch("test").
        row("abcd").
        family("famA").
        value("col1", String.class);
HBql’s goal is to bring a more SQL-like interface to HBase for those who miss SQL. You can take a look at ☞ HBql statements to get a better feeling of what it looks like.
Hive is a data warehouse infrastructure for Hadoop that proposes a SQL-like query language to enable easy data ETL.
Pig is a platform for analyzing large data sets built on Hadoop. I have found a great article ☞ comparing Pig Latin over Hadoop to SQL over a relational database:
- Pig Latin is procedural, where SQL is declarative.
- Pig Latin allows pipeline developers to decide where to checkpoint data in the pipeline.
- Pig Latin allows the developer to select specific operator implementations directly rather than relying on the optimizer.
- Pig Latin supports splits in the pipeline.
- Pig Latin allows developers to insert their own code almost anywhere in the data pipeline.
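The points above can be illustrated with a rough analogy in plain Java. This is not Pig Latin and uses no Pig or Hadoop API; the `Click` type, the threshold of 100, and the method names are hypothetical. It only shows what "procedural" means here: each step is written out in order, intermediate results are named (so they could be checkpointed), the data flow is split into branches, and custom code is plugged into one of the steps.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;

class ProceduralPipelineSketch {
    // Hypothetical record type: a user name and a click count.
    static final class Click {
        final String user;
        final int count;
        Click(String user, int count) { this.user = user; this.count = count; }
    }

    static List<String> run(List<Click> raw, Function<Click, String> customStep) {
        // Step 1: filter. The developer decides that this happens first.
        List<Click> cleaned = new ArrayList<>();
        for (Click c : raw) {
            if (c.count > 0) cleaned.add(c);
        }
        // 'cleaned' is a named intermediate result; in a real pipeline this is
        // the kind of place where the developer could choose to checkpoint.

        // Step 2: split the flow into two branches.
        List<Click> heavy = new ArrayList<>();
        List<Click> light = new ArrayList<>();
        for (Click c : cleaned) {
            (c.count >= 100 ? heavy : light).add(c);
        }

        // Step 3: apply developer-supplied code to one branch only.
        List<String> out = new ArrayList<>();
        for (Click c : heavy) {
            out.add(customStep.apply(c));
        }
        out.add("light=" + light.size());
        return out;
    }

    public static void main(String[] args) {
        List<Click> raw = Arrays.asList(
            new Click("ann", 150), new Click("bob", 0),
            new Click("cid", 20), new Click("dee", 300));
        System.out.println(run(raw, c -> c.user.toUpperCase()));
    }
}
```

In a declarative SQL query the same result would be described as a single statement, and the order of the steps and the choice of operators would be left to the optimizer.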
But don’t think that HBase and Hadoop are the only ones getting such tools. In the graph databases world, there is Gremlin ☞: a graph-based programming language meant to ease graph query, analysis, and manipulation.
I think that sooner rather than later we will see more such solutions appearing in the NoSQL environment.