debate: All content tagged as debate in NoSQL databases and polyglot persistence
An interesting post on the NOSQL Group about what it takes for a storage system to be considered NoSQL:
- SQL-the-language vs. alternate query languages
- A tabular model for data as opposed to one that is not (e.g. key-value, object, graph, …)
- ACID vs. non-ACID
- Centralized vs. distributed/decentralized
If we agree with the author, Johannes Ernst, we might be tempted to conclude, as he does:
It’s interesting to observe that any “NoSQL” product could be “NoSQL” in any number of these dimensions. […]
Which would also explain why so many “NoSQL” products are so dissimilar to each other.
So, what makes it NoSQL?
Most of these new NOSQL systems scale without additional effort.
This simply is not true. Many of them only “scale” using consistent hashing in the client (e.g. redis, tokyo?), which means that you’re still responsible for figuring out how to rebalance shards when the time comes. That’s extra effort.
Many of the popular NoSQL dbs don’t partition at all. Couch certainly doesn’t. Mongo’s “auto-sharding” is still in alpha, and I’m not aware of any major deployments of it.
Cassandra can partition data automatically, but as of the current released version, you can’t remove capacity.
NoSQL != automatic scalability.
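To make the "extra effort" in the comment above concrete, here is a minimal sketch of client-side consistent hashing. It is not how any particular Redis or Tokyo client does it: the `HashRing` class, the `cache-*` node names, the use of MD5 and the 100 virtual nodes per server are all assumptions made for illustration. The point it shows is that the ring only tells the client where a key should live after a node is added. Moving the affected data is still up to you.

```python
import bisect
import hashlib


def _hash(value: str) -> int:
    """Map a string to a point on the ring (a 128-bit integer from MD5)."""
    return int(hashlib.md5(value.encode("utf-8")).hexdigest(), 16)


class HashRing:
    """Hypothetical client-side consistent hashing ring with virtual nodes."""

    def __init__(self, nodes=(), vnodes=100):
        self.vnodes = vnodes
        self._points = []  # sorted ring positions
        self._owner = {}   # ring position -> node name
        for node in nodes:
            self.add_node(node)

    def add_node(self, node: str) -> None:
        # Each server is placed on the ring many times to even out the load.
        for i in range(self.vnodes):
            point = _hash(f"{node}#{i}")
            self._owner[point] = node
            bisect.insort(self._points, point)

    def node_for(self, key: str) -> str:
        # First ring position clockwise from the key's hash, wrapping around.
        idx = bisect.bisect(self._points, _hash(key)) % len(self._points)
        return self._owner[self._points[idx]]


def keys_to_move(keys, before: HashRing, after: HashRing):
    """Keys whose owner changes between two ring layouts."""
    return [k for k in keys if before.node_for(k) != after.node_for(k)]


# Made-up key set and server names.
keys = [f"user:{i}" for i in range(10_000)]
old_ring = HashRing(["cache-a", "cache-b", "cache-c"])
new_ring = HashRing(["cache-a", "cache-b", "cache-c", "cache-d"])

moved = keys_to_move(keys, old_ring, new_ring)
print(f"{len(moved)} of {len(keys)} keys now map to a different server")
```

Consistent hashing keeps the moved set small (only keys that land on the new server change owner), but nothing in the client copies those keys over. Until someone migrates them, reads for the moved keys go to a server that does not have the data.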
Lately there seem to be quite a few articles reviving an idea that is not so new anymore, RAM is the new disk, and some are connecting it to the NoSQL vs. SQL debates.
Jim Gray from Microsoft published “Tape is Dead. Disk is Tape. Flash is Disk. RAM Locality is King” (embedded below) in December 2006. There is a nice roundup of the opinions on this subject in this InfoQ article: ☞ RAM is the new disk
[M]emory is several orders of magnitude faster than disk for random access to data (even the highest-end disk storage subsystems struggle to reach 1,000 seeks/second). Second, with data-center networks getting faster, it’s not only cheaper to access memory than disk, it’s cheaper to access another computer’s memory through the network. As I write, Sun’s Infiniband product line includes a switch with 9 fully-interconnected non-blocking ports each running at 30Gbit/sec; yow! The Voltaire product pictured above has even more ports; the mind boggles. (If you want the absolute last word on this kind of ultra-high-performance networking, check out Andreas Bechtolsheim’s Stanford lecture.)
Tim Bray in ☞ On Grids
Coming back to the present, Nati Shalom of GigaSpaces has published an article, ☞ Why Existing Databases (RAC) are So Breakable!, in which he writes:
Memory can be more reliable than disk
Many people assume that memory is an unreliable data storage.
That assumption holds true if your data “lives” on a single machine; in this case if the machine fails or crashes your application crashes. But what if you distribute the data across a cluster of nodes and maintain more than one copy of the data over the network? In this case, if a node crashes the data is not gone; it lives elsewhere and can be continuously served from one of its replicas.
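The replication argument in the quote above can be sketched in a few lines. This is a toy model, not the GigaSpaces implementation: `Node`, `ReplicatedStore`, the CRC32-based placement and the choice of two replicas are all assumptions made for illustration. Each "machine" keeps its data only in a Python dict, and a crash wipes that dict.

```python
import zlib


class Node:
    """Hypothetical machine that holds data only in RAM (a plain dict)."""

    def __init__(self, name):
        self.name = name
        self.alive = True
        self.memory = {}

    def crash(self):
        # A crash loses everything this node held in memory.
        self.alive = False
        self.memory.clear()


class ReplicatedStore:
    """Toy in-memory store that writes every key to several nodes."""

    def __init__(self, nodes, replicas=2):
        if replicas > len(nodes):
            raise ValueError("need at least as many nodes as replicas")
        self.nodes = nodes
        self.replicas = replicas

    def _replica_set(self, key):
        # Deterministic placement: a start node plus its successors.
        start = zlib.crc32(key.encode("utf-8")) % len(self.nodes)
        return [self.nodes[(start + i) % len(self.nodes)]
                for i in range(self.replicas)]

    def put(self, key, value):
        for node in self._replica_set(key):
            if node.alive:
                node.memory[key] = value

    def get(self, key):
        # Serve the read from the first live replica that has the key.
        for node in self._replica_set(key):
            if node.alive and key in node.memory:
                return node.memory[key]
        raise KeyError(key)


cluster = [Node("n1"), Node("n2"), Node("n3")]
store = ReplicatedStore(cluster, replicas=2)
store.put("session:42", "alice")

# Crash the first replica of the key; the second copy still answers.
store._replica_set("session:42")[0].crash()
print(store.get("session:42"))
```

The sketch leaves out everything that makes this hard in practice (re-replicating after a failure, concurrent writes, network partitions), and the data is still lost if every replica of a key goes down at the same time.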
The article links to various research papers with real data about disk and RAM reliability:
- ☞ KHB: Real-world disk failure rates: surprises, surprises, and more surprises by Valerie Aurora (Henson)
- ☞ Disk failures in the real world: What does an MTTF of 1,000,000 hours mean to you? by Bianca Schroeder and Garth A. Gibson
- ☞ Failure Trends in a Large Disk Drive Population (PDF) - Google, Inc., February 2007
- ☞ DRAM Errors in the Wild: A Large-Scale Field Study (PDF)
- ☞ The Case for RAMClouds: Scalable High-Performance Storage Entirely in DRAM (PDF)
Then there is Ilya Grigorik’s article ☞ Future of RDBMS is RAM Clouds & SSD in which he writes:
However, while the new storage engines are exciting to see, it is also important to recognize that relational databases still have a bright future ahead - RDBMS systems are headed into main memory, which changes the playing field all together. […] Memory is fast, disks are slow. Nothing is stopping relational systems from taking advantage of main memory or SSD storage.
I do think it is wrong to say that only RDBMSs can benefit from the reliability and speed of RAM. Maybe the NoSQL solutions being built nowadays are adapting faster, while the long-established, massive RDBMSs will take a bit longer. But at the end of the day there seems to be broad agreement that RAM is the new disk, and sooner or later all systems will be rethought to take advantage of this.
Jim Gray: Tape is Dead. Disk is Tape. Flash is Disk. RAM Locality is King