Hortonworks: All content tagged as Hortonworks in NoSQL databases and polyglot persistence
On the other hand, to address the question in the title—would custom distributions clarify Hadoop versions—I think that while custom distributions might be helpful for experimenting or getting started with Hadoop, long term they’ll actually lead to more segmentation in the market and bigger maintenance and upgrade costs for end users.
There are just a few companies with a track record of maintaining and distributing open source projects—in the Hadoop space these are Cloudera and Hortonworks (nb Hortonworks is supporting the Apache Hadoop distribution). So if a vendor tries to sell you a Hadoop package ask them about their history managing open source distributions.
Original title and link: Apache Hadoop 1.0 Doesn’t Clear Up Trunks and Branches Questions. Do Distributions? ( ©myNoSQL)
Just a quick roundup of the latest releases and announcements.
Hortonworks Data Platform (HDP) version 2
HDP v2 will include:
- NextGen MapReduce architecture
- HDFS NameNode HA
- HDFS Federation
- up-to-date HCatalog, HBase, Hive, Pig
According to the announcement:
In order to avoid confusion, let me explain the two versions of HDP:
- HDP v1 is based upon Apache Hadoop 1.0 (which comes from the 0.20.205 branch). It the most stable, production-ready version of Hadoop that is currently found in many large enterprise deployments. HDP v1 is currently available as a private technology preview. A public technology preview will be made available later this quarter.
- HDP v2 is based upon Apache Hadoop 0.23, which includes the next generation advancements mentioned above. It’s an important step forward in terms of scalability, performance, high availability and data integrity. A technology preview will also be made publicly available later in Q1.
SolrCloud Completes Phase 2
Mark Miller about the completion of phase 2:
The second phase of SolrCloud has been in full swing for a couple of months now and it looks like we are going to be able to commit this work to trunk very soon! In Phase1 we built on top of Solr’s distributed search capabilities and added cluster state, central config, and built-in read side fault tolerance. Phase 2 is even more ambitious and focuses on the write side. We are talking full-blown fault tolerance for reads and writes, near real-time support, real-time GET, true single node durability, optimistic locking, cluster elasticity, improvements to the Phase 1 features, and more.
Not there yet, but it’s coming.
DataStax Community Server 1.0.7
A new release of DataStax’s distribution of Cassandra incorporating Cassandra 1.0.7
Don’t let the version number trick you. This is an important release for HBase featuring:
- new (self-migrating) file format
- AWS improvements: EBS support, building a HA cluster
I’m leaving you with Andrew Purtell’s slides about HBase Coprocessors:
Just a quick recap:
- Cloudera: Oracle, Dell, NetApp
- Hortonworks: Microsoft
- MapR: EMC (integration with Greenplum HD)
Amazon doesn’t partner with anyone for their Amazon Elastic Map Reduce. And IBM is walking alone with the software-only InfoSphere BigInsights.
Original title and link: Partnerships in the Hadoop Market ( ©myNoSQL)
There’s a series of events lately that makes me think Microsoft is nowhere near accepting defeat in the cloud services area. As regards Microsoft’s Project Isotop, things are much simpler than ZDNet article make them sound: Microsoft is working on integrating Hadoop and its toolchain with their own products (SQL Server Analysis Services, PowerPivot).
A picture worth more than the 626 words.
I bet the details of integration are fascinating and far from being simple, but the article is not focusing on those ↩
Original title and link: Project Isotope Will Bring Together Hadoop Toolchain With Microsoft’s Data Products ( ©myNoSQL)
Filtering and augmenting a Q&A on Quora:
- Cloudera: Hadoop distribution, Cloudera Enterprise, Services, Training
- Hortonworks: Apache Hadoop major contributions, Services, Training
- MapR: Hadoop distribution, Services, Training
- HPCC Systems: massive parallel-processing computing platform
- HStreaming: real-time data processing and analytics capabilities on top of Hadoop
- DataStax: DataStax Enterprise, Apache Cassandra based platform accepting real-time input from online applications, while offering analytic operations, powered by Hadoop
- Zettaset: Enterprise Data Analytics Suite built on Hadoop
- Hadapt: analytic platform based on Apache Hadoop and relational DBMS technology
I’ve left aside names like IBM, EMC, Informatica, which are doing a lot of integration work.
Original title and link: 8 Most Interesting Companies for Hadoop’s Future ( ©myNoSQL)
Eric Baldeschweiler in a recent briefing—transcript by Bert Latamore over Wikibon:
We’re really committed to building out Apache Hadoop and doing it in the Open Source community, so what really differentiates us is being really committed, besides shipping 100% pure Apache Hadoop code, which nobody else does, to taking a very partnering ecosystem-centric approach.[…] We’re the only ones committed to shipping Apache Hadoop code. We’ve been the drivers behind every major release of Apache Hadoop since its inception. Other companies are packaging and distributing Hadoop, but when they do that they add lots of their own custom stuff, both as patches to the Apache Hadoop distribution and also as independent products. A lot of that work is going into Apache, and since we committed to the Open Source model we’ve seen a lot more third party code going into Apache, which is obviously a win for the community. But to date no other company is actually taking releases from Apache & supporting them. They create their own versions that are slightly different from what comes from Apache, and try to build a business around that.
The political message from both Cloudera and Hortonworks is “we compete as businesses, but collaborate for the good of Hadoop“. But behind the curtains, they both prepare the big guns.
Original title and link: Hadoop Market: Hortonworks’ Positioning ( ©myNoSQL)
Hortonworks Data Platform, powered by Apache Hadoop — As we began to interact with enterprises and ecosystem partners, the one constant was the need for a base distribution of Apache Hadoop that is 100% open source and that contains the essential components used with every Hadoop installation. A distribution was needed to provide an easy to install, tightly integrated and well tested set of servers and tools. As we interacted with potential partners, we also heard the message loud and clear that they wanted open and secure APIs to easily integrate and extend Hadoop. We believe we have succeeded on both fronts. The Hortonworks Data Platform is such an open source distribution. It is powered by Apache Hadoop and includes the essential Hadoop components, plus some that make it more manageable, open and extensible. Our distribution is based on Hadoop 0.20.205, the first Apache Hadoop release that supports security and HBase. It also includes some new APIs, such as WebHDFS and those in Ambari and HCatalog, which will make it easy for our partners to integrate their products with Apache Hadoop. For those new to Ambari, it is an open source Apache project that will bring improved installation and management to Hadoop. HCatalog is a metadata management service for simplifying the sharing of data between Hadoop and other data systems. We are releasing Hortonworks Data Platform initially as a limited technology preview with plans to open it up to the public in early 2012.
The fight is on–even if for now the tone is still polite. And if we are adding to the mix MapR and LexisNexis’ HPCC, not to mention the armies of marketers and sales coming from Oracle, IBM, EMC, NetApp, etc. this actually smells like war.
Edward Ribeiro apty commented: “This reminds me of Linux distros war circa 2001”.
The emphasis in the text is mine to underline the most important aspects of the announcement. ↩
Original title and link: Hortonworks Data Platform: Hortonworks’ Hadoop Distribution ( ©myNoSQL)
This is ugly and should never happen to an open source project.
Still Joe Brockmeier (RWW) describes this as a superb win-win situation:
It might seem unhealthy for companies to be clamoring for credit in open source projects, but it’s a sign of health for projects. If companies position themselves to be top contributors, and care about their standing, the projects win. Users win too. Developers in the ecosystem also win – since it’s far easier to hire existing contributors than trying to push outsiders in to a project.
But there’s just a minor thing missing. Who gets the cheese?
Original title and link: Mine Is Bigger Than Yours: Hadoop Code Contributions ( ©myNoSQL)