performance: All content tagged as performance in NoSQL databases and polyglot persistence
Tuesday, 15 December 2009
Thoughts on NoSQL vs SQL Articles
There have been a couple of articles lately about NoSQL vs SQL that seemed to have caught a lot of attention. I finally had the time to go through them and jot down some of my thoughts.
Michael Driscoll in ☞ sql is dead. long live sql! identifies three aspects of the NoSQL environment:
- A dislike for SQL’s syntax, which is ill-fitted to programming patterns.
- A rejection of the strong typing of relational schemas
- A critique of performance, which in turn relates to how concurrency and partitioning of computation is handled
These are quite similar to the NoSQL-ness criteria I wrote about.
Now, I don’t think there is anything in the NoSQL world against SQL as a language; the objection is rather, by transitivity, to the systems behind it. The software engineering world has long discussed the object-relational paradigm mismatch and has come up with a set of patterns to overcome it (active record, ORM, etc.).
Michael builds his pro-SQL case on the following argument, with which I do agree:
But SQL lives on for a deeper reason: it is a simple yet powerful language for set operations. SQL captures the essential patterns of data manipulation, such as:
- intersections (JOINs)
- filters (WHEREs)
- reductions or aggregations (GROUP BYs)
Considering that most NoSQL systems move the “intersection” operation to a different level (either to the storage level, through denormalization, or to the application level), the two operations left are “filtering” and “reductions”, which sound extremely close to the basic principles of MapReduce. The interesting fact is that MapReduce was designed to allow parallelization while SQL was not (it is also known that imperative code is more difficult to parallelize). And I am not aware of any RDBMS that has implemented parallel queries, in the sense of distributing the execution.
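To make the correspondence concrete, here is a minimal single-process sketch of a filter plus an aggregation written in map/reduce style. The `orders` rows, the field names, and the function names are all hypothetical and chosen only for illustration; a real MapReduce framework would distribute the map and reduce calls across machines, which this sketch does not attempt. It corresponds roughly to `SELECT customer, SUM(amount) FROM orders WHERE status = 'paid' GROUP BY customer`.

```python
from collections import defaultdict

# Hypothetical sample rows; not taken from any of the articles discussed.
orders = [
    {"customer": "alice", "status": "paid", "amount": 30},
    {"customer": "bob", "status": "paid", "amount": 20},
    {"customer": "alice", "status": "open", "amount": 15},
    {"customer": "alice", "status": "paid", "amount": 5},
]


def map_phase(row):
    """Emit (key, value) pairs; the WHERE-style filter lives here."""
    if row["status"] == "paid":
        yield row["customer"], row["amount"]


def shuffle(pairs):
    """Group emitted values by key, as a framework would between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Aggregate one group; plays the role of SUM(...) under GROUP BY."""
    return key, sum(values)


def total_paid_by_customer(rows):
    pairs = (pair for row in rows for pair in map_phase(row))
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


totals = total_paid_by_customer(orders)
```

Each call to `map_phase` looks at a single row and each call to `reduce_phase` looks at a single key, so neither depends on the others; that independence is what leaves room for parallel execution.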
So leaving this aside, I tend to agree with his conclusion (and I think that solutions like Yahoo!’s Pig and Facebook’s Hive show that people might still prefer something simpler than raw MapReduce):
I can’t imagine the programmer pain and suffering that went into building one, unified, global database. But once it’s there, I’d much prefer to access it with SQL statements than MapReduce code.
On the other hand, I tend to disagree with the points Curt Monash is making in his article ☞ The legit part of the NoSQL idea:
Relational database management systems were invented to let you use one set of data in multiple ways
[…]
RDBMS are more mature than most competing technologies
Unfortunately ☞ Ben Scofield’s NoSQL Misconceptions article doesn’t cover any of these, so I’ll try to address them myself.
Firstly, I think it is a mistake to consider that the maturity of a technology makes it the right tool for the job. While I do agree that “not all of us are Google” (as Justin Sheehy of Riak put it) and I do hate the “not invented here” syndrome, I do think that as an industry we should always try to use and provide the best tools for the job.
Secondly, I disagree with the claim that it is easy to use relational databases to get multiple perspectives on the same set of data. I think data warehouses and BI tools prove my point by their very existence: they are expensive and difficult to maintain and use. And as in Michael’s quote above: “I can’t imagine the programmer pain and suffering that went into building one, unified, global database”.
Last, but not least, going back to Ben’s article, I completely disagree with his “I can do NoSQL just as well in a relational database” argument. I have written about this approach before in the post ☞ A Schema-less relational database and I do think that there are scenarios that can benefit from such a solution.
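For readers who have not seen the approach, this is a rough sketch of what “doing NoSQL in a relational database” can look like: a single table that maps a key to an opaque JSON document. It assumes SQLite through Python’s standard `sqlite3` module; the `docs` table, its columns, and the `put`/`get` helpers are hypothetical and are not taken from Ben’s article or from the schema-less post mentioned above.

```python
import json
import sqlite3

# In-memory database so the sketch leaves nothing behind.
conn = sqlite3.connect(":memory:")

# One generic table: the relational schema knows only a key and a blob of text.
conn.execute("CREATE TABLE docs (id TEXT PRIMARY KEY, body TEXT NOT NULL)")


def put(doc_id, doc):
    """Store (or overwrite) a document, serialized as JSON."""
    conn.execute(
        "INSERT OR REPLACE INTO docs (id, body) VALUES (?, ?)",
        (doc_id, json.dumps(doc)),
    )


def get(doc_id):
    """Fetch a document by key, or None if the key is absent."""
    row = conn.execute(
        "SELECT body FROM docs WHERE id = ?", (doc_id,)
    ).fetchone()
    return json.loads(row[0]) if row else None


# Documents under the same table do not have to share fields.
put("user:1", {"name": "Ada", "tags": ["admin"]})
put("user:2", {"name": "Lin", "email": "lin@example.org"})
```

Lookups by key work well in this layout, but the database can no longer see inside `body`, so filtering or joining on document fields falls back to application code or to extra index tables maintained by hand.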
Resources
- ☞ sql is dead. long live sql! (Michael E. Driscoll)
- ☞ The legit part of the NoSQL idea (Curt Monash)
- ☞ NoSQL Misconceptions (Ben Scofield)
- What Makes It NoSQL?
- The “NoSQL” dispute: A performance argument
- Notes on Scaling out with Riak and Riak Search Podcast
Monday, 14 December 2009
Memcached-in-the-Cloud by Gear6
Memcached is used as a reference in the NoSQL world for its API and also for performance comparisons. Some NoSQL KV stores offer a Memcached-compatible API and some even support the same protocol.
Startup Gear6 today launched the availability of its memcached appliance on Amazon’s Web Services platform, bringing a widely used distributed memory caching system for web companies to the cloud.
What seems to be missing from the announcement is any mention of automatic Memcached scaling. Wouldn’t that be an interesting feature?
via: http://gigaom.com/2009/12/08/gear6-brings-memcached-to-amazons-cloud/
The “NoSQL” dispute: A performance argument
In summary, blinding performance depends on removing overhead. Such overhead has nothing to do with SQL, but instead revolves around traditional implementations of ACID transactions, multi-threading, and disk management. To go wildly faster, one must remove all four sources of overhead, discussed above. This is possible in either a SQL context or some other context. …
But as far as I know there is no easy way to tweak any of these “features” of existing RDBMS.
via: http://sillybits.wordpress.com/2009/12/10/the-nosql-dispute-a-performance-argument/