There have been a couple of articles lately about NoSQL vs SQL that seem to have caught a lot of attention. I finally had the time to go through them and jot down some of my thoughts.
Michael Driscoll, in ☞ SQL is dead. Long live SQL!, identifies three aspects of the NoSQL environment:
- A dislike for SQL’s syntax, which is ill-fitted to programming patterns.
- A rejection of the strong typing of relational schemas.
- A critique of performance, which in turn relates to how concurrency and the partitioning of computation are handled.
These are quite similar to the NoSQL-ness criteria I wrote about.
Now, I don’t think the NoSQL world has anything against SQL as a language; the criticism reaches it only by transitivity, through the systems behind it. The software engineering world has long discussed the object-relational paradigm mismatch and has come up with a set of patterns to overcome it (Active Record, ORM, etc.).
Michael builds his pro-SQL case on the following argument, with which I agree:
But SQL lives on for a deeper reason: it is a simple yet powerful language for set operations. SQL captures the essential patterns of data manipulation, such as:
- intersections (JOINs)
- filters (WHEREs)
- reductions or aggregations (GROUP BYs)
Considering that most NoSQL systems are moving the “intersection” operation to a different level (either to the storage level, through denormalization, or to the programmatic level), the two operations left are “filtering” and “reductions”, which sound extremely close to the basic principles of MapReduce. The interesting fact is that MapReduce was designed to allow parallelization while SQL was not (it is also known that imperative code is more difficult to parallelize). And I am not aware of any RDBMS that has implemented parallel queries, in the sense of distributing their execution.
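To make the filter/reduction parallel concrete, here is a minimal sketch, assuming Python's standard `sqlite3` module and a small hypothetical `purchases` dataset. It computes the same result twice: once as a SQL `WHERE` plus `GROUP BY`, and once as a hand-rolled map phase (filter and emit key/value pairs per partition) followed by a reduce phase (group by key and sum). The thread pool only stands in for separate machines; it is an illustration of the shape of MapReduce, not of any particular framework's API.

```python
import sqlite3
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor

# Hypothetical sample data: (customer, country, amount) purchase records.
ROWS = [
    ("alice", "US", 30),
    ("bob", "DE", 5),
    ("carol", "US", 12),
    ("dave", "FR", 40),
    ("erin", "DE", 25),
    ("frank", "US", 8),
]


def sql_totals(rows, min_amount):
    """Filter (WHERE) plus reduction (GROUP BY), expressed declaratively in SQL."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE purchases (customer TEXT, country TEXT, amount INTEGER)")
    conn.executemany("INSERT INTO purchases VALUES (?, ?, ?)", rows)
    result = conn.execute(
        "SELECT country, SUM(amount) FROM purchases "
        "WHERE amount >= ? GROUP BY country",
        (min_amount,),
    ).fetchall()
    conn.close()
    return dict(result)


def map_phase(chunk, min_amount):
    """Map: apply the filter and emit (key, value) pairs for one partition."""
    return [(country, amount) for _customer, country, amount in chunk if amount >= min_amount]


def reduce_phase(pairs):
    """Reduce: group the emitted pairs by key and sum the values."""
    totals = defaultdict(int)
    for country, amount in pairs:
        totals[country] += amount
    return dict(totals)


def mapreduce_totals(rows, min_amount, partitions=3):
    # Split the input into independent partitions; each map task sees only
    # its own partition, which is what makes the map phase parallelizable.
    chunks = [rows[i::partitions] for i in range(partitions)]
    with ThreadPoolExecutor(max_workers=partitions) as pool:
        mapped = list(pool.map(map_phase, chunks, [min_amount] * partitions))
    # "Shuffle": gather every emitted pair before the reduce step.
    all_pairs = [pair for part in mapped for pair in part]
    return reduce_phase(all_pairs)


if __name__ == "__main__":
    # Both approaches should agree on the per-country totals.
    assert sql_totals(ROWS, 10) == mapreduce_totals(ROWS, 10)
```

The SQL version states *what* to compute and leaves execution to the engine, while the MapReduce version makes the partitioning explicit, which is exactly the property that lets the map tasks run on separate nodes.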
Leaving this aside, I tend to agree with his conclusion (and I think that solutions like Yahoo!’s Pig and Facebook’s Hive show that people may still prefer something simpler than raw MapReduce):
I can’t imagine the programmer pain and suffering that went into building one, unified, global database. But once it’s there, I’d much prefer to access it with SQL statements than MapReduce code.
On the other hand, I tend to disagree with the points Curt Monash is making in his article ☞ The legit part of the NoSQL idea:
Relational database management systems were invented to let you use one set of data in multiple ways
RDBMS are more mature than most competing technologies
Unfortunately, ☞ Ben Scofield’s NoSQL Misconceptions article doesn’t cover any of these, so I’ll try to address them myself.
Firstly, I think it is a mistake to assume that a technology’s maturity makes it the right tool for the job. While I agree that “not all of us are Google” (as Justin Sheehy of Riak put it) and I hate the “not invented here” syndrome, I do think that, as an industry, we should always try to use and provide the best tool for each job.
Secondly, I disagree that it is easy to use relational databases to get multiple perspectives on the same set of data. I think the very existence of data warehouses and BI tools proves my point: they are expensive and difficult to maintain and use. Or, as Michael put it in the quote above: “I can’t imagine the programmer pain and suffering that went into building one, unified, global database”.
Last, but not least, going back to Ben’s article, I completely disagree with his “I can do NoSQL just as well in a relational database” argument, even though I have written about this approach before in the post ☞ A Schema-less relational database and I do think that there are scenarios that can benefit from such a solution.
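As a rough illustration of what “schema-less on top of a relational database” tends to look like in practice, here is a minimal sketch, assuming Python’s standard `sqlite3` and `json` modules. It is not the design from the post mentioned above: the `entities` and `index_user` tables, the `user_id` attribute, and the `put`/`get`/`find_by_user` helpers are all hypothetical. Documents are stored as opaque JSON bodies, so new fields need no `ALTER TABLE`, but every queryable attribute needs an index table that the application itself must keep in sync.

```python
import json
import sqlite3
import uuid


def open_store():
    conn = sqlite3.connect(":memory:")
    # One generic table: every entity is an opaque JSON document keyed by id.
    conn.execute("CREATE TABLE entities (id TEXT PRIMARY KEY, body TEXT NOT NULL)")
    # Hypothetical index table for the one attribute we want to query on;
    # the database knows nothing about it, the application maintains it.
    conn.execute(
        "CREATE TABLE index_user (user_id TEXT, entity_id TEXT, "
        "PRIMARY KEY (user_id, entity_id))"
    )
    return conn


def put(conn, doc):
    """Store a document, assigning an id if missing, and refresh its index rows."""
    entity_id = doc.get("id") or uuid.uuid4().hex
    doc = dict(doc, id=entity_id)
    with conn:  # commit the entity and its index rows in one transaction
        conn.execute(
            "INSERT OR REPLACE INTO entities (id, body) VALUES (?, ?)",
            (entity_id, json.dumps(doc)),
        )
        conn.execute("DELETE FROM index_user WHERE entity_id = ?", (entity_id,))
        if "user_id" in doc:
            conn.execute(
                "INSERT INTO index_user (user_id, entity_id) VALUES (?, ?)",
                (doc["user_id"], entity_id),
            )
    return entity_id


def get(conn, entity_id):
    """Fetch a document by id, or None if it does not exist."""
    row = conn.execute("SELECT body FROM entities WHERE id = ?", (entity_id,)).fetchone()
    return json.loads(row[0]) if row else None


def find_by_user(conn, user_id):
    """Query through the hand-maintained index table, ordered by entity id."""
    rows = conn.execute(
        "SELECT e.body FROM entities e JOIN index_user i ON i.entity_id = e.id "
        "WHERE i.user_id = ? ORDER BY e.id",
        (user_id,),
    ).fetchall()
    return [json.loads(body) for (body,) in rows]
```

Documents with entirely different fields can live side by side in `entities`; the price is that indexing, and keeping indexes consistent, moves from the database into application code.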