



Why I Love and Hate NoSQL and RDBMS Databases

The most sincere, simple, and correct list of pros and cons about NoSQL databases and RDBMS. Hat tip to Kelly Martinez.

Original title and link: Why I Love and Hate NoSQL and RDBMS Databases (NoSQL database©myNoSQL)


Monty Widenius About NoSQL, Big Data, and Obviously MySQL and MariaDB

The interview Dmitry Sotnikov1 had with Monty Widenius was published in so many places that I had a hard time deciding which one to link to. Anyway, there are a couple of comments and corrections that I’d like to suggest:

The whole thing with the “new NoSQL movement” started with a blog post from a Twitter employee that said MySQL was not good enough and they needed “something better,” like Cassandra.

That’s not quite correct. The “NoSQL movement” debuted in 2009, when the guys from organized an event about “open source, distributed, non-relational databases” and invited people from companies like Cloudera, LinkedIn, StumbleUpon, etc. to talk about the solutions they were building to respond to their platforms’ special requirements. But as papers like Bigtable: A Distributed Storage System for Structured Data and Dynamo: Amazon’s Highly Available Key-value Store prove, NoSQL solutions had been in production long before 2009.

I can’t find the original article, but I did find a follow up a bit later where it was said MySQL would be dropped for Cassandra.

I can help find that article, as it was posted on this blog: Cassandra @ Twitter: An Interview with Ryan King

The main reason Twitter had problems with MySQL back then, was that they were using it incorrectly.

I don’t think there are many examples in the history of software of a private platform that received more scaling advice than Twitter. Judging by how many solutions have been suggested, a possible Twitter IPO will be at risk of IP lawsuits.

The current state is that now, three years later, Twitter is still using MySQL as their main storage for tweets. Cassandra was, in the end, not able to replace MySQL.

That’s true. What’s also true is that at that time Cassandra was at version 0.9, and that investing in a new database was considered riskier than investing in more hardware and hiring MySQL experts.

The main reason NoSQL became popular is that, in contrast to SQL, you can start using it without having to design anything. This makes it easier to start with NoSQL, but you pay for this later when you find that you don’t have control of your data (if you are not very careful).

I assume this is how a vendor would present flexible data models as a drawback. It also reflects one of the most dangerous misconceptions about NoSQL: that NoSQL databases require no data modeling. In reality, using a NoSQL database will most of the time require a lot more thinking and analysis of the data models and data access patterns. There are no blueprints, no normal forms, and no ORMs to hide everything away.
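To make the “more modeling, not less” point concrete, here is a minimal Python sketch of query-first modeling in the style of a wide-column store: instead of one normalized table, the application keeps one structure per access pattern and pays for it on the write path. The in-memory dicts stand in for a real database, and all names (`post_tweet`, `tweets_by_user`, `home_timeline`, `read_timeline`) are hypothetical.

```python
from collections import defaultdict

# Hypothetical in-memory stand-ins for denormalized "tables",
# each one designed around exactly one read query.
tweets_by_user = defaultdict(list)  # author -> [(ts, text)]: "show a user's tweets"
home_timeline = defaultdict(list)   # reader -> [(ts, author, text)]: "show my timeline"
followers = defaultdict(set)        # author -> {follower, ...}


def follow(follower: str, author: str) -> None:
    followers[author].add(follower)


def post_tweet(author: str, ts: int, text: str) -> None:
    # The write path does the work a JOIN would do at read time:
    # one write to the author's own row, plus a fan-out copy per follower.
    tweets_by_user[author].append((ts, text))
    for follower in followers[author]:
        home_timeline[follower].append((ts, author, text))


def read_timeline(user: str, limit: int = 10) -> list:
    # Reads are a single lookup, newest first; no join is needed.
    return sorted(home_timeline[user], reverse=True)[:limit]


follow("alice", "bob")
follow("carol", "bob")
post_tweet("bob", 1, "hello")
post_tweet("bob", 2, "again")
print(read_timeline("alice"))  # [(2, 'bob', 'again'), (1, 'bob', 'hello')]
```

Supporting a new query (say, tweets by hashtag) would mean designing a new structure and a new write path, which is exactly the upfront analysis of access patterns that the “no modeling needed” view skips.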

As soon as data can’t fit into memory, SQL generally outperforms NoSQL.

Where’s the proof? According to the data I have, there’s no comparison between, let’s say, Cassandra and MySQL.

For anything else, you have to write a program and it’s very hard to beat a SQL optimizer for complex things, especially things that are automatically generated based on user requests (required for most web sites).

That’s true. Except when:

  1. most people don’t know how to write those SQL queries (search StackOverflow for a random sample of what I mean)
  2. getting everything out of your database requires using vendor-specific solutions
  3. there are those moments when the optimizer decides to change the execution plan in a way that brings down your whole service
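The third point is easy to see even on a small scale: the same SQL text can get a different execution plan whenever the optimizer’s inputs (indexes, statistics) change. A minimal sketch using SQLite from Python’s standard library as a stand-in for a production RDBMS; the `tweets` table and `idx_tweets_user` index are hypothetical, and the exact wording of SQLite’s plan rows varies between versions.

```python
import sqlite3


def plan(conn, sql, params=()):
    # EXPLAIN QUERY PLAN returns one row per step; the last column is a
    # human-readable description such as "SCAN tweets" or "SEARCH tweets USING INDEX ...".
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql, params)]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tweets (id INTEGER PRIMARY KEY, user_id INTEGER, body TEXT)")
conn.executemany(
    "INSERT INTO tweets (user_id, body) VALUES (?, ?)",
    [(i % 100, f"tweet {i}") for i in range(1000)],
)

query = "SELECT body FROM tweets WHERE user_id = ?"
before = plan(conn, query, (42,))  # no usable index: a full table scan
conn.execute("CREATE INDEX idx_tweets_user ON tweets (user_id)")
after = plan(conn, query, (42,))   # same SQL text, now an index search

print(before)
print(after)
```

Nothing in the query changed; only the optimizer’s view of the schema did. That is convenient when the new plan is better and painful when a statistics refresh picks a worse one in production.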

The problem with Hadoop is that there is no known business model around it that ensures that the investors will get back 10X money that they expect. Because of that, I have a hard time understanding how Cloudera can survive in the long run.


Everything else in the interview is spot on.

  1. Dmitry Sotnikov: COO at Jelastic 

Original title and link: Monty Widenius About NoSQL, Big Data, and Obviously MySQL and MariaDB (NoSQL database©myNoSQL)


The Future of NoSQL Databases: Hybrid Tools for OLTP and OLAP

John L. Myers has an interesting hypothesis for the future of NoSQL databases based on their capability of handling “unstructured” data:

I think the future of NoSQL platforms is going to reside in the ability of those systems to apply different operational or analytical schemas to multi-structured data sets rather than letting the data reside in a schema-free format. Merely storing multi-structured data sets will not be enough to have a NoSQL platform meet business objectives. The true business value will be in the ability to apply the structures of a particular schema for analysis or for operational workloads in real-time or near real-time.

What Myers suggests here is that storing unstructured data allows an application to define different “schemas” to repurpose the way data is used. In theory this sounds quite interesting. If done dynamically, this could define a system that could provide both OLTP and OLAP features.

The structure of the data has a very important influence on how data access is implemented, and simply adding structure metadata would not let the system keep performing optimally across scenarios or workloads. Put differently, OLTP and OLAP systems require data to be organized (and stored) differently in order to handle their different access patterns and workloads. Switching from one to the other while maintaining the characteristics of the system (reliability, performance, stability, etc.) seems to lead to a level of complexity that would be very difficult for a single system to handle.
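The classic illustration of “organized differently” is row versus column layout. A minimal Python sketch, with hypothetical names and in-memory structures standing in for on-disk pages: the same records stored two ways, each cheap for one workload and awkward for the other.

```python
from dataclasses import dataclass


@dataclass
class Order:
    order_id: int
    customer: str
    amount: float


orders = [Order(1, "ann", 10.0), Order(2, "bob", 25.5), Order(3, "ann", 4.5)]

# Row-oriented layout (OLTP-friendly): each record is kept together,
# so fetching or updating one order touches one entry.
row_store = {o.order_id: (o.customer, o.amount) for o in orders}

# Column-oriented layout (OLAP-friendly): each attribute is kept together,
# so an aggregate over one attribute reads only that column.
column_store = {
    "order_id": [o.order_id for o in orders],
    "customer": [o.customer for o in orders],
    "amount": [o.amount for o in orders],
}


def get_order(order_id):
    # Point lookup: a single access in the row store.
    return row_store[order_id]


def total_amount():
    # Aggregate: a scan over a single column in the column store.
    return sum(column_store["amount"])


def get_order_from_columns(order_id):
    # The same point lookup against the column store must first locate the
    # position, then touch every column to reassemble the record.
    i = column_store["order_id"].index(order_id)
    return (column_store["customer"][i], column_store["amount"][i])
```

Attaching a different “schema” as metadata does not change which of these layouts the bytes are actually in, which is the crux of the objection above.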

Original title and link: The Future of NoSQL Databases: Hybrid Tools for OLTP and OLAP (NoSQL database©myNoSQL)


System Level and Functional Requirements for the Backend Database of a User Engagement Platform

A very good and practical analysis of the requirements a user engagement platform places on its backend database, from both the system-level and the functional points of view. The ideal case is also spelled out, but I don’t think there’s a single product out there that could do all of this:

So, today’s and tomorrow’s engagement services should accommodate, heavy write loads, heavy read loads, heavy aggregate(counter), modify and read loads. What becomes apparent if we look at user engagement services in this way is that aggregation needs to be a first class function of engagement services that is near real time, scalable and highly available.
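One common way to make write-heavy aggregation (likes, views, and similar counters) scale is to shard the counter itself. The quoted post does not prescribe this technique; the following is a minimal single-process Python sketch with a hypothetical `ShardedCounter` class, where the shards stand in for rows or nodes in a real store.

```python
import random
import threading


class ShardedCounter:
    """Hypothetical sketch: spread increments over N shards so concurrent
    writers rarely contend on the same lock; reads sum all shards."""

    def __init__(self, shards: int = 8):
        self._counts = [0] * shards
        self._locks = [threading.Lock() for _ in range(shards)]

    def increment(self, amount: int = 1) -> None:
        i = random.randrange(len(self._counts))  # pick a shard at random
        with self._locks[i]:
            self._counts[i] += amount

    def value(self) -> int:
        # Each shard is read under its own lock, one at a time, so while
        # writes are in flight the total is approximate, not an atomic snapshot.
        total = 0
        for i, lock in enumerate(self._locks):
            with lock:
                total += self._counts[i]
        return total


likes = ShardedCounter()


def worker():
    for _ in range(1000):
        likes.increment()


threads = [threading.Thread(target=worker) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(likes.value())  # prints 4000
```

The trade-off is the one the quote hints at: writes get cheaper and more available, while reads do more work and are only “near real time” unless writes are paused.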

Original title and link: System Level and Functional Requirements for the Backend Database of a User Engagement Platform (NoSQL database©myNoSQL)


A CEO (Confusing) Perspective on Hadoop, Big Data, and NoSQL Databases Market

This article authored by the CEO of MapR left me scratching my head and wondering if I’m watching the same market:

Hadoop has distanced itself from MongoDB, Cassandra, Couchbase, and the plethora of NoSQL options to become the safe choice.

In what scenarios does Hadoop compete directly with Cassandra, Couchbase, or MongoDB?

Support for Cassandra has dropped off, with Facebook reducing it’s investment in the technology and the realization that an eventual consistency model is appropriate for only a limited set of use cases. MongoDB’s growth has flattened despite having a friendly programming environment due to lack of scalability.

What real market data supports the above statements?

One application that is particularly well suited for HBase is BLOB stores, which require large databases with rapid retrieval. BLOBS — binary large objects — are typically images, audio clips or other multimedia objects, and storing BLOBs in a database enables a variety of innovative applications.

Is HBase as a BLOB store really a common scenario? What makes HBase a better solution for BLOB storage than, say, distributed file systems?

Hadoop will be used more in real-time and lightweight OLTP applications

What definitions of real-time, lightweight, and OLTP are used to make the above statement even remotely correct?

Someone volunteering to help me understand?

Original title and link: A CEO (Confusing) Perspective on Hadoop, Big Data, and NoSQL Databases Market (NoSQL database©myNoSQL)


2013 Predictions for NoSQL Databases

In case you feel you haven’t seen enough predictions about NoSQL databases for 2013, John L. Myers has a couple. Unfortunately they’re all (very) vague.

Here is my list of predictions for the NoSQL “industry” for 2013 in no particular or alphabetical order … But rather inspired by the soundtrack of my college days that incidentally coincides with the rise of the last “great” data management paradigm – SQL in the 1980s.

Original title and link: 2013 Predictions for NoSQL Databases (NoSQL database©myNoSQL)


Why Use a NoSQL Database, and Why Not?

A very nice little piece of research by Adam Fowler:

I just conducted a review of the first 70 results from Google on the question “Why use a NoSQL database?”. In this post I show you the results in the for camp, and the against camp.

The results are exactly what you’d expect, but you’ll need to check Fowler’s post to compare the final list with yours.

Original title and link: Why Use a NoSQL Database, and Why Not? (NoSQL database©myNoSQL)


What Is the Spring Data Project?

Short answer: another sign that the Spring framework wants to do everything everywhere. A mammoth1.

Version 1.0 was released in 2004 as a lightweight alternative to Enterprise Java Beans (EJB). Since, then Spring has expanded into many other areas of enterprise development, such as enterprise integration (Spring Integration), batch processing (Spring Batch), web development (Spring MVC, Spring Webflow), security (Spring Security). Spring continues to push the envelope for mobile applications (Spring Mobile), social media (Spring Social), rich web applications (Spring MVC, s2js Javascript libraries), and NoSQL data access(Spring Data).


The complete pipeline can be implemented using Spring for Apache Hadoop along with Spring Integration and Spring Batch. However, Hadoop has its own set of challenges which the Spring for Apache Hadoop project is designed to address. Like all Spring projects, it leverages the Spring Framework to provide a consistent structure and simplify writing Hadoop applications. For example, Hadoop applications rely heavily on command shell tools. So applications end up being a hodge-podge of Perl, Python, Ruby, and bash scripts. Spring for Apache Hadoop, provides a dedicated XML namespace for configuring Hadoop jobs with embedded scripting features and support for Hive and Pig.

  1. There’s a business reason for doing this, though: when you have tons of clients, you want to make sure they don’t have a chance to step outside. Is this new year resolution a heresy: I plan to use vastly less Spring this year

Original title and link: What Is the Spring Data Project? (NoSQL database©myNoSQL)


Three Analyst Predictions for 2013: Hadoop, SAP, and MySQL vs NoSQL

The season of predictions is here. Chris Kanaracus in an all-bold post, quoting analysts:

Jon Reed: “Expect SAP to purchase an up-and-coming “big data” product or vendor, and perhaps several, including at least one that specializes in integration with the Hadoop framework for large-scale data processing”.

I’m still scratching my head trying to come up with the long list of products or vendors specializing in Hadoop integration that SAP could acquire.

Curt Monash: “Expect plenty of additional adoption for Hadoop. Everybody has the ‘big bit bucket’ use case, largely because of machine-generated data. Even today’s technology is plenty good enough for that purpose, and hence justifies initial Hadoop adoption.”


What I hope to see is that, besides the companies putting together the building blocks to make Hadoop friendly enough (real work) and the companies claiming integration with Hadoop (not that fantastic work), there’ll be some companies that take the Hadoop stack and build tools whose immediate impact on the business can be measured. Basically, vertical solutions applying the Hadoop stack to specific markets, segments, and scenarios.

The main challenge of “Big Data” these days is not that there isn’t value behind it; it’s the measurability of this value. What each company looking into Big Data tries to answer is: what value does Big Data carry in my case? This is a well-founded question, as not every company has an infinite budget, unlimited time, and a magic resource pool.

Curt Monash: “Usually when the topic of alternative databases comes up, the incumbent is often Oracle or IBM DB2. But in 2013, MySQL could be playing the latter role. NoSQL and NewSQL products often are developed as MySQL alternatives.”

Until now, NoSQL companies have understood that the competition is not with each other. The huge market that relational databases cover has enough potential to welcome a few solid NoSQL solutions, and there’s no long-term need to fight over the few people who have already paid attention to them.

Make your bets.

Original title and link: Three Analyst Predictions for 2013: Hadoop, SAP, and MySQL vs NoSQL (NoSQL database©myNoSQL)

NoSQL Is for Moms-N-Pops Websites

Nikita Ivanov (GridGain):

Look, 90% of NoSQL usage comes from the same crowd as a typical memcached users: non-critical, “moms-n-pops” websites. 90% of IMDG/IMCG usage comes from mission critical systems.

Different customers => different requirements => different products

Unfortunately, neither my mom nor my dad is behind non-critical websites like Amazon, Facebook, Google, or LinkedIn. What about yours?

Original title and link: NoSQL Is for Moms-N-Pops Websites (NoSQL database©myNoSQL)


Migrating Between Two Different Types of NoSQL Databases

Teacher asking a student:

After the Presentation the team leader asked me how it is, to migrate the db’s under various types. […] Can I migrate from a, key-value store db, like dynamo, to a, document store db, like mongoDB?

I’m not sure whether this would have been reflected in the final grade, but I would have asked how many times the teacher had (really had) to migrate data between multiple relational databases, and how many times it worked automatically. If allowed, I’d have followed up with a very brief dialogue about the complexity of migrating applications to different programming languages (even when they use the same programming paradigm) and brought up examples of important differences in how data structures are accessed and mutated. In the end, I might have failed the exam, though.

On a more serious note, there are so many aspects to migrating data that it is very difficult to give a good short answer to this question. A sign of this problem’s complexity is the wide range of companies and products trying to solve ETL.
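A small example of why there is no generic key-value to document converter: the structure a document store needs usually lives only in the application’s key-naming conventions. A minimal Python sketch, assuming a hypothetical export where keys follow an `<entity>:<id>:<field>` scheme; neither the scheme nor `to_documents` comes from Dynamo, MongoDB, or the original question.

```python
from collections import defaultdict

# Hypothetical key-value export: the application encoded structure in the keys
# as "<entity>:<id>:<field>". Nothing in the store itself records this convention.
kv_export = {
    "user:1:name": "Ann",
    "user:1:email": "ann@example.com",
    "user:2:name": "Bob",
    "session:abc": "user:1",
}


def to_documents(kv, prefix="user"):
    """Group '<prefix>:<id>:<field>' keys into one document per id.
    Returns (documents, skipped_keys): keys that don't follow the assumed
    convention are reported rather than silently dropped."""
    docs = defaultdict(dict)
    skipped = []
    for key, value in kv.items():
        parts = key.split(":")
        if len(parts) == 3 and parts[0] == prefix:
            _, entity_id, field = parts
            docs[entity_id]["_id"] = entity_id
            docs[entity_id][field] = value
        else:
            skipped.append(key)
    return dict(docs), skipped


documents, skipped = to_documents(kv_export)
print(documents)
print(skipped)
```

Even this toy case leaves open questions a tool cannot answer on its own: whether ids should stay strings, whether `session:abc` should become its own collection or be embedded in the user, and whether the `user:1` value is a reference to resolve.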

Original title and link: Migrating Between Two Different Types of NoSQL Databases (NoSQL database©myNoSQL)


A Short History of NoSQL, SQL, NoSQL

This post was written by Konstantin Osipov, one of the authors of the Tarantool key-value store, and was posted on the NoSQL Google group.

Way back in the 1960s databases didn’t separate data representation and data access.

To navigate in an index, a database user had to know the physical structure of the index.

The obvious deficiencies of this approach led to the separation of the data model from the data representation. The relational model is one way to do it, and still the most popular one.

One of the best-known deficiencies of the relational model is the so-called object-relational impedance mismatch: there is more than one way to map objects to relations, and none of them fits all access patterns well.
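The “more than one way, none fits all” part can be shown with a single object that holds a list. A minimal Python sketch using the standard library’s `sqlite3`: the hypothetical `Post` class is mapped once as normalized tables and once as a JSON-serialized column, and each mapping is convenient for a different access pattern.

```python
import json
import sqlite3
from dataclasses import dataclass, field


@dataclass
class Post:
    post_id: int
    title: str
    tags: list = field(default_factory=list)


post = Post(1, "Why I love NoSQL", ["nosql", "rdbms"])
conn = sqlite3.connect(":memory:")

# Mapping A (normalized): the list becomes a child table.
conn.execute("CREATE TABLE posts (post_id INTEGER PRIMARY KEY, title TEXT)")
conn.execute("CREATE TABLE post_tags (post_id INTEGER REFERENCES posts, tag TEXT)")
conn.execute("INSERT INTO posts VALUES (?, ?)", (post.post_id, post.title))
conn.executemany("INSERT INTO post_tags VALUES (?, ?)",
                 [(post.post_id, t) for t in post.tags])

# Mapping B (serialized): the list is stored as a JSON string in one column.
conn.execute("CREATE TABLE posts_json (post_id INTEGER PRIMARY KEY, title TEXT, tags TEXT)")
conn.execute("INSERT INTO posts_json VALUES (?, ?, ?)",
             (post.post_id, post.title, json.dumps(post.tags)))


def load_a(post_id):
    # Rebuilding the object takes two queries (or a join) plus reassembly.
    title = conn.execute("SELECT title FROM posts WHERE post_id = ?",
                         (post_id,)).fetchone()[0]
    tags = [r[0] for r in conn.execute(
        "SELECT tag FROM post_tags WHERE post_id = ? ORDER BY rowid", (post_id,))]
    return Post(post_id, title, tags)


def load_b(post_id):
    # Rebuilding the object is one row read plus deserialization.
    title, tags = conn.execute("SELECT title, tags FROM posts_json WHERE post_id = ?",
                               (post_id,)).fetchone()
    return Post(post_id, title, json.loads(tags))


def posts_tagged_a(tag):
    # "Which posts have this tag?" is a plain WHERE clause the database can evaluate (and index)...
    return [r[0] for r in conn.execute("SELECT post_id FROM post_tags WHERE tag = ?", (tag,))]


def posts_tagged_b(tag):
    # ...but with mapping B every row must be fetched and decoded in the application.
    return [pid for pid, tags in conn.execute("SELECT post_id, tags FROM posts_json")
            if tag in json.loads(tags)]
```

Mapping A favors queries across objects, mapping B favors loading whole objects; picking one fixes which access pattern pays the price.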

It also has a number of advantages: simplicity, ease of analytical processing, and, let’s not forget, performance: by normalizing data, a user is forced to tell the DBMS more about data constraints, distribution, and future access patterns.

This makes building efficient and to-the-point data representation structures easier.

Unfortunately, past generations of database management systems did not address one of the main architectural drawbacks that plague the relational model: rigidity of schema changes. Very few mainstream DBMSs allow changing the structure of a relational database quickly, without downtime or a significant performance penalty. This is not a drawback of the relational model itself, but of its implementations.

It should also be kept in mind that in many cases a relational model is overkill, and a simple key-value mapping is sufficient.

And of course no single model can fit all needs (e.g. graph databases are built around the notion of nodes and edges, yet good luck trying to quickly calculate a CUBE over a bunch of nodes in a graph database).

Unfortunately, the world of NoSQL, when it comes to the data model, often simply takes us back to the 60s: there is minimal abstraction of data access from data representation, and once a certain representation has been chosen, there is no way to change it without rewriting your application (e.g. to fit the new performance profile).
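Without that abstraction in the database, applications often approximate it themselves. A minimal Python sketch of a generic repository-style pattern (not something Osipov’s post or any particular NoSQL product prescribes): the application codes against a hypothetical `UserStore` interface, and two different representations can be swapped without touching `app_logic`.

```python
from typing import Dict, List, Optional, Protocol, Set, Tuple


class UserStore(Protocol):
    """Hypothetical access interface the application codes against."""
    def put(self, user_id: str, name: str, city: str) -> None: ...
    def get(self, user_id: str) -> Optional[Tuple[str, str]]: ...
    def ids_in_city(self, city: str) -> List[str]: ...


class HashStore:
    """Representation 1: a plain key -> record map. Cheap get, full scan by city."""

    def __init__(self) -> None:
        self._rows: Dict[str, Tuple[str, str]] = {}

    def put(self, user_id: str, name: str, city: str) -> None:
        self._rows[user_id] = (name, city)

    def get(self, user_id: str) -> Optional[Tuple[str, str]]:
        return self._rows.get(user_id)

    def ids_in_city(self, city: str) -> List[str]:
        return sorted(uid for uid, (_, c) in self._rows.items() if c == city)


class IndexedStore(HashStore):
    """Representation 2: the same records plus a secondary index on city."""

    def __init__(self) -> None:
        super().__init__()
        self._by_city: Dict[str, Set[str]] = {}

    def put(self, user_id: str, name: str, city: str) -> None:
        old = self._rows.get(user_id)
        if old is not None:
            self._by_city[old[1]].discard(user_id)  # keep the index consistent
        super().put(user_id, name, city)
        self._by_city.setdefault(city, set()).add(user_id)

    def ids_in_city(self, city: str) -> List[str]:
        return sorted(self._by_city.get(city, set()))


def app_logic(store: UserStore) -> List[str]:
    # Application code depends only on the interface, not on the layout.
    store.put("u1", "Ann", "Oslo")
    store.put("u2", "Bob", "Rome")
    store.put("u1", "Ann", "Rome")  # Ann moves
    return store.ids_in_city("Rome")
```

Switching from `HashStore` to `IndexedStore` changes the performance profile of `ids_in_city` while `app_logic` stays the same; the complaint above is that many NoSQL systems leave this layer entirely to the application.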

Scalability is an answer, but a silly one: throwing more hardware at a problem is not always economical.

Original title and link: A Short History of NoSQL, SQL, NoSQL (NoSQL database©myNoSQL)