Oracle seems to be at the forefront of this initiative, as its database is becoming more and more aware of systems like Hadoop. The linked article presents two ways in which Oracle can pull data out of HDFS: either by accessing it directly through the FUSE driver, or by having Hadoop push data into Oracle queues that are then accessible from table functions. A commenter on the post has suggested a third option that sounds even more interesting: using Oracle's Java support to access the Hadoop API.
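The FUSE-driver option boils down to one property: once HDFS is mounted through FUSE, its files look like ordinary local files, so any consumer (Oracle external tables included) can read them with plain file I/O. A minimal sketch of that idea, assuming a hypothetical mount point `/mnt/hdfs` (the actual path depends on how `hadoop-fuse-dfs` is mounted):

```python
import os

# Hypothetical mount point where hadoop-fuse-dfs exposes HDFS as a
# regular filesystem; the real path depends on your mount setup.
HDFS_MOUNT = "/mnt/hdfs"

def read_hdfs_lines(relative_path, mount=HDFS_MOUNT):
    """Read lines from an HDFS file via the FUSE mount.

    Because FUSE makes HDFS look like a local filesystem, ordinary
    open()/read() calls work -- the same property a database relies
    on when treating HDFS files as external data sources.
    """
    full_path = os.path.join(mount, relative_path.lstrip("/"))
    with open(full_path) as f:
        return [line.rstrip("\n") for line in f]
```

This is only an illustration of the access pattern, not Oracle's actual implementation.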
Diagram from Oracle Blogs
While the presented solutions are only about pulling data from Hadoop and processing it in parallel using Oracle's parallel processing support, I do think that sooner rather than later we will see solutions that use Hadoop to process data made accessible directly by Oracle.
Here is just a thought on how this would work:
- use some special Oracle functions to pull data from tables and push it into Hadoop-accessible queues
- Hadoop (with streaming support) pulls the data from the queues and processes it internally
- when processing is done, Hadoop pushes the results back into Oracle-accessible queues (as per the above solutions).
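The middle step could be sketched as a Hadoop streaming job. Streaming jobs are just scripts reading stdin and writing tab-separated pairs to stdout; the example below assumes the queue delivers rows as `key,value` lines and does a simple per-key sum, standing in for whatever the real processing would be:

```python
import sys
from itertools import groupby

def mapper(lines):
    """Parse 'key,value' lines into (key, int_value) pairs.

    In a real streaming job this script would read stdin and write
    tab-separated pairs to stdout for Hadoop's shuffle phase.
    """
    for line in lines:
        key, value = line.strip().split(",", 1)
        yield key, int(value)

def reducer(pairs):
    """Sum values per key -- the processed result that would be
    pushed back into an Oracle-accessible queue."""
    for key, group in groupby(sorted(pairs), key=lambda kv: kv[0]):
        yield key, sum(v for _, v in group)

if __name__ == "__main__":
    # Mapper and reducer normally run as separate processes connected
    # by Hadoop's sort/shuffle; chaining them directly here only
    # illustrates the data flow.
    for key, total in reducer(mapper(sys.stdin)):
        print(f"{key}\t{total}")
```

The per-key sum is just a placeholder; the point is that any stdin/stdout program can sit between the Oracle-fed input queue and the result queue.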
Isn’t that an interesting future?
Update: in light of the newly granted MapReduce patent (Google), I guess it will be a bit more difficult to blame anyone for not incorporating Hadoop or integrating more closely with it. What do you think?