HortonWorks: All content tagged as HortonWorks in NoSQL databases and polyglot persistence
Monday, 29 April 2013
Project Savanna: Hadoop and OpenStack
Timothy Prickett Morgan for The Register about Project Savanna, a collaboration between Mirantis, Hortonworks, and Red Hat:
Batman and Robin. Peanut butter and chocolate. OpenStack and Hadoop. These are things that go together, with the latter pairing being something that commercial OpenStack distie Mirantis, commercial Hadoop distie Hortonworks, and commercial KVM and Linux distie (and soon to be OpenStack commercializer) Red Hat are putting together under a new OpenStack effort dubbed Project Savanna.
Hadoop is at the age where everyone tries to package it and claim they’ll be the Red Hat of the Hadoop ecosystem. I can’t quite dot the i’s and cross the t’s, but my gut feeling is that right now all of these look more like the attempts to bring Linux to the desktop.
We know how successful these have been so far.
Original title and link: Project Savanna: Hadoop and OpenStack (©myNoSQL)
via: http://www.theregister.co.uk/2013/04/18/project_savanna_hadoop_on_openstack/
Thursday, 25 April 2013
Project Falcon: Tackling Hadoop Data Lifecycle Management
Venkatesh Seetharam announcing a new Apache incubating project in the Hadoop ecosystem open sourced by InMobi and Hortonworks:
Today we are excited to see another example of the power of community at work as we highlight the newly approved Apache Software Foundation incubator project named Falcon. This incubation project was initiated by the team at InMobi together with engineers from Hortonworks. Falcon is useful to anyone building apps on Hadoop as it simplifies data management through the introduction of a data lifecycle management framework.
I think this diagram describes Project Falcon best:
✚ Was there any other project addressing this space?
Original title and link: Project Falcon: Tackling Hadoop Data Lifecycle Management (©myNoSQL)
Saturday, 6 April 2013
Hadoop Now, Next and Beyond - Keynote by Eric Baldeschwieler
Eric Baldeschwieler’s keynote from Hadoop Summit has been published on YouTube. It’s mainly about the goals and effort behind Hadoop 2.0 and the new tools in the Hadoop ecosystem meant to simplify different aspects of a Hadoop deployment (HCatalog, Ambari, Tez, Stinger Initiative).
✚ Datanami has published a summary of the keynote here
Original title and link: Hadoop Now, Next and Beyond - Keynote by Eric Baldeschwieler (©myNoSQL)
Thursday, 4 April 2013
Halo 4: A Success Case Study of HDInsight, Microsoft's Hadoop on Azure
Apart from a few too many business buzzwords, this is a nice story of using HDInsight, the Hadoop solution for Windows developed by Microsoft and Hortonworks:
Behind the scenes, a powerful new Microsoft technology platform called HDInsight was capturing data from the cloud and feeding daily game statistics to the tournament’s operator, Virgin Gaming. Virgin not only used the data to update online leaderboards each day; it also relied on the data to detect cheaters, removing them from the boards to ensure that the right gamers got the chance to win.
But this new technology didn’t just support the Infinity Challenge. From day one, the Xbox 360 game has been using the Hadoop open source framework to gain deep insights into players. The Halo 4 development team at 343 Industries is taking these insights and updating the game almost weekly, using direct player feedback to tweak the game. In the process, the game’s multiplayer ecosystem continues to evolve with the community as the title matures in the marketplace.
Original title and link: Halo 4: A Success Case Study of HDInsight, Microsoft’s Hadoop on Azure (©myNoSQL)
Monday, 4 March 2013
How Many Hadoops?
The short answer is there is only one Apache Hadoop distribution.
The long answer is that there are many distributions that include Apache Hadoop or are claiming compatibility with Apache Hadoop.
The oldest and probably most popular: Cloudera’s Distribution of Hadoop (CDH)
The 100% open source: Hortonworks Data Platform.
The proprietary: MapR.
The blue one: IBM InfoSphere BigInsights.
The latest: WANdisco Hadoop WDD, Intel Distribution of Hadoop and Pivotal HD from EMC Greenplum.
There’s also the version Facebook’s running on their cluster which includes Facebook Corona: a different approach to job scheduling and resource management.
But this list is not complete as it doesn’t include appliances featuring Hadoop. In this category we have:
- Oracle’s Big Data appliance featuring Cloudera’s Distribution of Hadoop
- NetApp’s Hadooplers
- EMC Greenplum DCA
- Teradata Aster Discovery Platform featuring the Hortonworks Data Platform
- Data Direct Networks (DDN)
I hope I didn’t miss any important ones [1]. As a conclusion for this list, my question is: who is actually benefiting from all these distributions?
[1] I left aside Hadoop-as-a-Service for now.
Original title and link: How Many Hadoops? (©myNoSQL)
Wednesday, 20 February 2013
Hortonworks: The Fastest Path to Innovation: Community Driven Open Source
Shaun Connolly for the Hortonworks blog:
we believe the fastest way to innovate is to do our work within the open source community, introduce enterprise feature requirements into that public domain, and to work diligently to progress existing open source projects and incubate new projects to meet those needs.
In support of our approach, this week we’ve announced the submission of two new incubation projects to the Apache Software foundation together with the launch of the “Stinger Initiative”, all aimed at enhancing the security and performance of Hadoop applications.
I’m forced, but extremely happy to take back what I said.
- Stinger: an initiative to speed up Apache Hive for interactive queries. Read about it here
- Knox Gateway: a solution for authentication and security in Hadoop. More details here
- Tez framework: a new Hadoop YARN-based runtime for improved latency and throughput. Details here
Hortonworks believes in open source.
Original title and link: Hortonworks: The Fastest Path to Innovation: Community Driven Open Source (©myNoSQL)
via: http://hortonworks.com/blog/hortonworks-community-leadership/
Tuesday, 19 February 2013
Hortonworks and Community Driven Hadoop
First, “We Believe… in community driven Enterprise Apache Hadoop” and then the next day “Announcing Apache Hadoop 2.0.3 Release and Roadmap”. These two posts, published within 2 days on Hortonworks’s blog, don’t entirely support each other. At least not without somewhat different wording and a link to the announcement sent to the Hadoop mailing list.
Original title and link: Hortonworks and Community Driven Hadoop (©myNoSQL)
Monday, 18 February 2013
VMware Sues Hortonworks
Stay calm. Hadoop is safe.
The Register:
VMware has taken Hortonworks to court along with four ex-VMers who now work at the startup - and among them is VMWare’s former global sales chief.
Original title and link: VMware Sues Hortonworks (©myNoSQL)
via: http://www.theregister.co.uk/2013/02/15/vmware_legal_action_hortonworks/
Friday, 1 February 2013
Joyent Solution for Hadoop Is About Speed
As with Riak’s hosting on Engine Yard, I’ve been wondering what the Joyent Solution for Hadoop is about. John Rath writes for DataCenterKnowledge:
Software product development services company Altoros Systems said that Hadoop clusters on Joyent Cloud produced a nearly 3X faster disk I/O response time versus identically-sized infrastructure. Through the use of the Joyent operating system virtualization and CPU bursting technology, Joyent says it is able to extract better response times and deliver results to data scientists and analysts faster.
Original title and link: Joyent Solution for Hadoop Is About Speed (©myNoSQL)
via: http://www.datacenterknowledge.com/archives/2013/01/24/joyent-enters-big-data-hadoop-solution/
Wednesday, 30 January 2013
Hadoop in 2013: What Hortonworks Will Focus On
Shaun Connolly summarizing a recent webinar about where Hortonworks’s work on Hadoop will focus in 2013:
[…] Interactive Query, Business Continuity (DR, Snapshots, etc.), Secure Access, as well as ongoing investments in Data Integration, Management (i.e. Ambari), and Online Data (i.e. HBase).
[…] Rather than abandon the Apache Hive community, Hortonworks is focused on working in the community to optimize Hive’s ability to serve big data exploration and interactive query in support of important BI use cases. Moreover, we are focused on enabling Hive to take advantage of YARN in Apache Hadoop 2.0, which will help ensure fast query workloads don’t compete for resources with the other jobs running in the cluster. Enabling Hadoop to predictably support enterprise workloads that span Batch, Interactive, and Online use cases is an important area of focus for us.
Basically this says that Hortonworks sees YARN and Hive as the answer to online or real-time interactive querying of Hadoop data. Cloudera’s take on this is different.
Original title and link: Hadoop in 2013: What Hortonworks Will Focus On (©myNoSQL)
via: http://hortonworks.com/blog/the-road-ahead-for-hortonworks-and-hadoop/
Tuesday, 29 January 2013
Hortonworks Joins OpenStack Foundation
Hortonworks, a leading contributor to Apache Hadoop, today announced it has joined the OpenStack Foundation, which promotes the development, distribution and adoption of the OpenStack cloud operating system. By contributing to the OpenStack ecosystem, Hortonworks is supporting the open source community and facilitating adoption of 100-percent open source Apache Hadoop-based solutions in the cloud. Now customers will be able to access an enterprise-ready Hortonworks Data Platform built for the cloud that alleviates the time and complexities of manually deploying a big data solution.
What took so long? Cloudera has been part of OpenStack since 2010.
Original title and link: Hortonworks Joins OpenStack Foundation (©myNoSQL)
via: http://hortonworks.com/about-us/news/hortonworks-joins-openstack-foundation/
Thursday, 24 January 2013
Hadoop in the Cloud: Skytap and Joyent
Besides the well-established Amazon Elastic MapReduce and Windows Azure HDInsight, there are two new Hadoop-in-the-cloud services:
- Skytap which offers Cloudera CDH4 Enterprise experimentation clusters of up to 50 nodes
- Joyent Solution for Hadoop which is offered in partnership with Hortonworks. I hesitated for a bit to mention Joyent considering the page says “Sign up now to talk to a Joyent Solutions Architect” which is anything but a cloud service.
Original title and link: Hadoop in the Cloud: Skytap and Joyent (©myNoSQL)
