Hadoop: All content tagged as Hadoop in NoSQL databases and polyglot persistence
I’m catching up with the news these days and this rumor about Hortonworks from Curt Monash’s post sounds pretty big:
There’s a widespread belief that Hortonworks is being shopped. Numerous folks — including me — believe the rumor of an Intel offer for $700 million. Higher figures and alternate buyers aren’t as widely believed.
First of all, I don’t know anything about this—and just to be clear that means I really don’t know anything. But if it turns out to be true:
- it’s huge news for the Hadoop market
- it’s big news for the open source world, as I think it would represent the 2nd largest acquisition of a pure open source company after MySQL, achieved in a fifth of the time
- this could make things simpler or much more complicated for Cloudera, depending on how the acquirer decides to operate the business
- this could be good news or pretty bad news for the Hadoop community and ecosystem, considering the contributions Hortonworks has made over time
If someone decides to drop me an “anonymous” email I promise I won’t hear anything.
Original title and link: Rumors about a Hortonworks Acquisition ( ©myNoSQL)
Two links for those interested in seeing what an automation API for Hadoop looks like:
At first glance, both APIs support the same range of resources/endpoints.
Cloudera Manager comes in two editions, free and enterprise, with some of the automation features (service monitoring and management, security) available only in the latter. I’m not sure whether all the endpoints are available through the free edition of Cloudera Manager.
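To give a feel for what these automation APIs look like, here is a minimal sketch of talking to Ambari’s REST API from Python. The hostname, cluster name, and credentials are placeholders; the `/api/v1/clusters/...` path layout and the `X-Requested-By` header follow Ambari’s documented conventions (Cloudera Manager exposes a very similar `/api/v1/` resource tree on its own port).

```python
import base64
import urllib.request

def ambari_request(host, cluster, resource, user="admin", password="admin"):
    """Build an authenticated GET request against the Ambari REST API.

    host/cluster/credentials are hypothetical placeholders; the endpoint
    path follows Ambari's /api/v1/clusters/{cluster}/{resource} layout.
    """
    url = f"http://{host}:8080/api/v1/clusters/{cluster}/{resource}"
    req = urllib.request.Request(url)
    # Ambari uses HTTP Basic auth (default admin/admin on a fresh install)
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    req.add_header("Authorization", f"Basic {token}")
    # Ambari requires this header on modifying calls (CSRF protection)
    req.add_header("X-Requested-By", "ambari")
    return req

# List the services of a hypothetical cluster. The request is only built
# here, not sent -- sending it would need a live Ambari server:
req = ambari_request("ambari.example.com", "mycluster", "services")
print(req.full_url)
# → http://ambari.example.com:8080/api/v1/clusters/mycluster/services
# with urllib.request.urlopen(req) as resp:
#     services = resp.read()
```

The same pattern, with the port and path adjusted, would apply to the Cloudera Manager API.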
Original title and link: Hadoop Cluster Automation APIs: Ambari and Cloudera Manager ( ©myNoSQL)
Simply put, Hadoop becomes the staging area for “raw data streams” while the EDW stores data from “operational systems”. Hadoop then analyzes the raw data and shares the results with the EDW. […] The paper then positions Hadoop as an active archive. I like this idea very much. Hadoop can store archived data that is only accessed once a month or once a quarter or less often, and that data can be processed directly by Hadoop programs or shared with the EDW data using facilities such as Teradata’s SQL-H, or Greenplum’s External Hadoop tables (not by HAWQ, though… see here), or by other federation engines connected to HANA, SQL Server, Oracle, etc.
It’s an interesting positioning of Hadoop. And it’s very similar to the approach Linux took to penetrate the walls of the enterprise; once inside, it slowly replaced pretty much everything.
In the early days (and we are still in those days), the EDW vendors could still believe this story: Hadoop is complicated, meant for batch processing, and lacks the tools and refinements built over years into the EDW.
But the story is starting to change. Fast. Hadoop is becoming more of a platform (YARN), it gets support for (almost) real-time querying (Impala, Project Stinger, HAWQ, just to name a few), and Hadoop leaders are signing partnerships with challengers and incumbents of the big data market at a rate that I don’t think I’ve seen before.
In the end, guess which will become the pillar of the big data platforms: the solution storing all the data, or the tools able to process limited amounts of that data, admittedly very fast and with fine control?
✚ The Cloudera-Teradata paper titled “Hadoop and the Data Warehouse: When to Use Which” can be found here.
Original title and link: Hadoop and the EDW ( ©myNoSQL)