


Hadoop and Oracle Parallel Processing

At the end of last year, in a “reconciliation” attempt, I wrote that more and more RDBMSes — even if not all of them — are looking into integrating MapReduce into their toolsets.

Oracle seems to be at the forefront of this initiative, as it looks like ☞ its database is becoming more and more aware of systems like Hadoop. The linked article presents two ways Oracle can pull data out of HDFS: either by accessing it directly through the FUSE driver, or by having Hadoop push data into Oracle queues that are then accessible from table functions. A commenter on the post suggested a third option that sounds even more interesting: using Oracle’s Java support to access the Hadoop API.

Diagram from Oracle Blogs

While the presented solutions are only about pulling data from Hadoop and processing it in parallel using Oracle’s parallel processing support, I do think that sooner rather than later we will see solutions that use Hadoop to process data made accessible directly by Oracle.

Here is just a thought on how this would work:

  • use some special Oracle functions to pull data from tables and push it into Hadoop-accessible queues
  • Hadoop (with streaming support) would pull the data out of those queues and process it internally
  • when processing is done, Hadoop can push the results back into Oracle-accessible queues (as in the solutions above).
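The middle step above is a natural fit for Hadoop streaming, where the mapper is any executable that reads rows on stdin and writes key–value pairs on stdout. A minimal sketch — the tab-separated row format and the column meanings are assumptions for illustration only:

```python
import sys

def map_line(line):
    """Turn one Oracle-exported row into a key-value pair for Hadoop.

    Assumes each input line is a tab-separated (customer_id, amount)
    pair pushed from an Oracle queue -- a hypothetical format chosen
    just for this sketch.
    """
    customer_id, amount = line.rstrip("\n").split("\t")
    # Hadoop streaming expects "key<TAB>value" on stdout; a reducer
    # could then aggregate per key before results go back to Oracle.
    return "%s\t%s" % (customer_id, float(amount))

if __name__ == "__main__":
    for line in sys.stdin:
        print(map_line(line))
```

The symmetric last step would be a reducer (or a final job) writing its output back into a queue that an Oracle table function drains, closing the round trip.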

Isn’t that an interesting future?

Update: in light of Google’s newly granted MapReduce patent, I guess it will be a bit more difficult to blame anyone for not incorporating Hadoop or integrating more closely with it. What do you think?