Oracle Database or Hadoop? And What Led to NoSQL Databases

In a follow-up post to SQL or Hadoop: What Tools Should I Use to Process My Data?, Gwen Shapira explains why, even though many of the things that fit Hadoop better could also be done with Oracle, doing so is not a good idea:

But, do you really want to use Oracle to store millions of emails and scanned documents?[1] I have few customers who do it, and I think it causes more problems than it solves. After you stored them, do you really want to use your network and storage bandwidth so the application servers will keep reading the data from the database? Big data is… big. It is best not to move it around too much and run the processing on the servers that store the data. After all, the code takes fewer packets than the data. But, Oracle makes cores very expensive. Are you sure you want to use them to run processing-intensive data mining algorithms?

Then there’s the issue of actually programming the processing code. If your big data is in Oracle and you want to process it efficiently, PL/SQL is pretty much the only option. […]

All these are very solid arguments.
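
To put rough numbers on the "code takes fewer packets than the data" argument, here is a back-of-the-envelope sketch in Python. Every figure in it (archive size, job size, result size, link speed) is a made-up assumption for illustration, not a measurement from Gwen's post or from any real system, and it ignores everything except raw transfer time.

```python
# Back-of-the-envelope comparison of "move the data" vs. "move the code".
# All sizes and the bandwidth below are hypothetical assumptions.

GB = 10 ** 9
MB = 10 ** 6


def transfer_seconds(num_bytes: int, bandwidth_bytes_per_sec: int) -> float:
    """Time needed to push num_bytes over a link of the given bandwidth."""
    return num_bytes / bandwidth_bytes_per_sec


data_size = 500 * GB      # assumed archive of emails and scanned documents
code_size = 5 * MB        # assumed size of the processing job
result_size = 10 * MB     # assumed size of the aggregated result
bandwidth = 125 * MB      # about a 1 Gbit/s link, in bytes per second

# Option A: application servers read all the data out of the database.
move_data = transfer_seconds(data_size, bandwidth)

# Option B: ship the job to the nodes holding the data, return only the result.
move_code = transfer_seconds(code_size + result_size, bandwidth)

print(f"move data to code: {move_data:,.0f} s")
print(f"move code to data: {move_code:.2f} s")
```

Under these assumptions the first option spends over an hour on the wire, while the second spends a fraction of a second. Real systems add scheduling, disk, and CPU costs on both sides, so the gap is only indicative.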

Generalizing Gwen’s point a bit, I would say this is exactly the history of relational databases and what made them successful. By providing decent solutions, up to a point, to a wide range of problems, and by covering more scenarios than the alternative storage solutions that existed at the time, relational databases became the de facto storage for the last 30 years[2]. But in recent years, more and more problems have crossed the boundary of what could be considered a decent solution, creating the need for specialized alternatives that are better than good enough. And thus NoSQL databases.

  1. Interestingly, when presented with a Hadoop and Solr solution for archiving emails, I also wondered whether that was the best solution.

  2. This is a bit of an oversimplification to make the point: relational databases also had other obvious technical advantages over some of the alternative solutions.

Original title and link: Oracle Database or Hadoop? And What Led to NoSQL Databases (NoSQL database©myNoSQL)