
Hortonworks: All content tagged as Hortonworks in NoSQL databases and polyglot persistence

Joyent Solution for Hadoop Is About Speed

As with Riak’s hosting on Engine Yard, I’ve been wondering what the Joyent Solution for Hadoop is about. John Rath writes for DataCenterKnowledge:

Software product development services company Altoros Systems said that Hadoop clusters on Joyent Cloud produced a nearly 3X faster disk I/O response time versus identically-sized infrastructure. Through the use of the Joyent operating system virtualization and CPU bursting technology, Joyent says it is able to extract better response times and deliver results to data scientists and analysts faster.

Original title and link: Joyent Solution for Hadoop Is About Speed (NoSQL database©myNoSQL)

via: http://www.datacenterknowledge.com/archives/2013/01/24/joyent-enters-big-data-hadoop-solution/


Hadoop in 2013: What Hortonworks Will Focus On

Shaun Connolly summarizes a recent webinar about where Hortonworks’ work on Hadoop will focus in 2013:

[…] Interactive Query, Business Continuity (DR, Snapshots, etc.), Secure Access, as well as ongoing investments in Data Integration, Management (i.e. Ambari), and Online Data (i.e. HBase).
[…] Rather than abandon the Apache Hive community, Hortonworks is focused on working in the community to optimize Hive’s ability to serve big data exploration and interactive query in support of important BI use cases. Moreover, we are focused on enabling Hive to take advantage of YARN in Apache Hadoop 2.0, which will help ensure fast query workloads don’t compete for resources with the other jobs running in the cluster. Enabling Hadoop to predictably support enterprise workloads that span Batch, Interactive, and Online use cases is an important area of focus for us.

Basically this says that Hortonworks sees YARN and Hive as the answer to online or real-time interactive querying of Hadoop data. Cloudera’s take on this is different.

Original title and link: Hadoop in 2013: What Hortonworks Will Focus On (NoSQL database©myNoSQL)

via: http://hortonworks.com/blog/the-road-ahead-for-hortonworks-and-hadoop/


Hortonworks Joins OpenStack Foundation

Hortonworks, a leading contributor to Apache Hadoop, today announced it has joined the OpenStack Foundation, which promotes the development, distribution and adoption of the OpenStack cloud operating system. By contributing to the OpenStack ecosystem, Hortonworks is supporting the open source community and facilitating adoption of 100-percent open source Apache Hadoop-based solutions in the cloud. Now customers will be able to access an enterprise-ready Hortonworks Data Platform built for the cloud that alleviates the time and complexities of manually deploying a big data solution.

What took so long? Cloudera has been part of OpenStack since 2010.

Original title and link: Hortonworks Joins OpenStack Foundation (NoSQL database©myNoSQL)

via: http://hortonworks.com/about-us/news/hortonworks-joins-openstack-foundation/


Hadoop in the Cloud: Skytap and Joyent

Besides the well-established Amazon Elastic MapReduce and Windows Azure HDInsight, there are two new Hadoop-in-the-cloud services:

  • Skytap, which offers Cloudera CDH4 Enterprise experimentation clusters of up to 50 nodes
  • Joyent Solution for Hadoop which is offered in partnership with Hortonworks. I hesitated for a bit to mention Joyent considering the page says “Sign up now to talk to a Joyent Solutions Architect” which is anything but a cloud service.

Original title and link: Hadoop in the Cloud: Skytap and Joyent (NoSQL database©myNoSQL)


Hadoop Business Ecosystem as of January 2013

As I was hoping and expecting, Datameer updated the chart visualizing Hadoop’s business side ecosystem:

[Image: hadoop_ecosystem_full2]

It shouldn’t be a surprise to anyone that the most connected companies in the Hadoop space are Cloudera and Hortonworks. They outrank the IT industry mammoths: IBM, HP, Microsoft, Oracle, SAP, etc.

Original title and link: Hadoop Business Ecosystem as of January 2013 (NoSQL database©myNoSQL)

via: http://www.datameer.com/blog/perspectives/hadoop-ecosystem-as-of-january-2013-now-an-app.html


HBase Roadmap

Devaraj Das’s post on the Hortonworks blog details the current and future work on HBase:

  1. Reliability and High Availability (all data always available, and recovery from failures is quick)
  2. Autonomous operation (minimum operator intervention)
  3. Wire compatibility (to support rolling upgrades across a couple of versions at least)
  4. Cross data-center replication (for disaster recovery)
  5. Snapshots and backups (be able to take periodic snapshots of certain/all tables and be able to restore them at a later point if required)
  6. Monitoring and Diagnostics (which regionserver is hot or what caused an outage)

Future:

  1. Better and improved clients (asynchronous clients, and, in multiple languages)
  2. Cell-level security (access control for every cell in a table)
  3. Multi-tenancy (HBase becomes a viable shared platform for multiple applications using it)
  4. Secondary indexing functionality

Current work=reliability. Future work=usability.

Original title and link: HBase Roadmap (NoSQL database©myNoSQL)

via: http://hortonworks.com/blog/hbase-futures/


Pig Performance and Optimization Analysis

Although Pig is designed as a data flow language, it supports all the functionalities required by TPC-H; thus it makes sense to use TPC-H to benchmark Pig’s performance. Below is the final result.

[Image: tpc-h 100gb]

Original title and link: Pig Performance and Optimization Analysis (NoSQL database©myNoSQL)

via: http://hortonworks.com/blog/pig-performance-and-optimization-analysis/


HttpFS: Another Hadoop File System Over HTTP

Just a new HTTP interface for the Hadoop file system. The main differences between HttpFS and WebHDFS are that this one is created by Cloudera, not Hortonworks (on top of their previous Hoop library) and:

HttpFs is a proxy so, unlike WebHDFS, it does not require clients be able to access every machine in the cluster. This allows clients to access a cluster that is behind a firewall via the WebHDFS REST API.

Question is: if they are API compatible and both open source, why not unify them?
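To make the "API compatible" point concrete, here is a minimal sketch of how a client might build the same WebHDFS-style request for either endpoint. The host names, the user name, and the helper `webhdfs_url` are hypothetical; the ports (50070 for the NameNode, 14000 for HttpFS) are commonly used defaults and are assumed here, not taken from the post.

```python
from urllib.parse import urlencode, quote


def webhdfs_url(host, port, path, op, **params):
    """Build a WebHDFS-style REST URL: /webhdfs/v1/<path>?op=<OP>&...

    Hypothetical helper for illustration; it only formats the URL and
    does not contact any cluster.
    """
    query = urlencode({"op": op, **params})
    return f"http://{host}:{port}/webhdfs/v1/{quote(path.lstrip('/'))}?{query}"


# Direct WebHDFS: the client talks to the NameNode and may then be
# redirected to individual DataNodes, so it needs to reach every machine.
direct = webhdfs_url("namenode.example.com", 50070, "/user/alice", "LISTSTATUS")

# Via HttpFS: the same request shape, aimed at a single gateway host
# that proxies the traffic into the cluster.
proxied = webhdfs_url(
    "httpfs.example.com", 14000, "/user/alice", "LISTSTATUS",
    **{"user.name": "alice"},
)

print(direct)
print(proxied)
```

Only the host and port differ between the two URLs, which is why a firewall can expose the HttpFS gateway alone while clients keep using the WebHDFS request format.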

Original title and link: HttpFS: Another Hadoop File System Over HTTP (NoSQL database©myNoSQL)

via: http://www.cloudera.com/blog/2012/08/httpfs-for-cdh3-the-hadoop-filesystem-over-http/


Hortonworks at 1 Year: Promises and Achievements

I normally wouldn’t link to a pat-on-the-back post, but Hortonworks’ presence in the Hadoop market has accelerated its evolution and adoption. Plus the promises Hortonworks made 1 year ago represent a very good list of the shortcomings new adopters of Hadoop are still facing:

  • make Apache Hadoop easier to install, manage, and use
  • make Apache Hadoop more robust
  • make Apache Hadoop easier to integrate and extend
  • deliver an ever-increasing array of services aimed at improving the Hadoop experience and support in the growing needs of enterprises, systems integrators and technology vendors

Original title and link: Hortonworks at 1 Year: Promises and Achievements (NoSQL database©myNoSQL)

via: http://hortonworks.com/blog/happy-birthday-hortonworks/


The Hadoop Ecosystem Relationships

Excellent infographic about the relationships in the Hadoop market created with Datameer:

[Image: Hadoop-Ecosystem-Infographic1]

A while ago I created a Google Spreadsheet in which I tried to track all these relationships, but going through PR announcements wasn’t really my thing. Now there’s a CSV file with all this data.

Original title and link: The Hadoop Ecosystem Relationships (NoSQL database©myNoSQL)

via: http://www.cloudera.com/blog/2012/07/the-hadoop-ecosystem-visualized-in-datameer/


Pricing for Hadoop Support: Cloudera, Hortonworks, MapR

Found the following bits in a post on The Register by Timothy Prickett Morgan:

While Cloudera and MapR are charging $4,000 per node for their enterprise-class Hadoop distributions (including their proprietary extensions and tech support), Hortonworks doesn’t have any proprietary extensions and is living off of the support contracts for the HDP 1.0 stack. […] Hortonworks is not providing its full list price, but for a starter ten-node cluster, you can get a standard support contract for $12,000 per year.

Hortonworks’s pricing looks a bit aggressive, but this could be explained by the fact that Hortonworks Data Platform 1.0 was made available only this week.
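A quick back-of-the-envelope comparison of the figures quoted above, as a rough sketch. It assumes both prices are annual and that the $4,000 per-node price scales linearly to ten nodes; neither assumption is confirmed by the quote.

```python
# Figures from the quoted article; treating both as annual prices
# and scaling linearly per node are assumptions made for illustration.
PER_NODE_CLOUDERA_MAPR = 4_000          # USD per node
HORTONWORKS_TEN_NODE_CONTRACT = 12_000  # USD for a ten-node starter cluster
NODES = 10

hortonworks_per_node = HORTONWORKS_TEN_NODE_CONTRACT / NODES
others_ten_nodes = PER_NODE_CLOUDERA_MAPR * NODES
ratio = others_ten_nodes / HORTONWORKS_TEN_NODE_CONTRACT

print(f"Hortonworks per node: ${hortonworks_per_node:,.0f}")
print(f"Cloudera/MapR for {NODES} nodes: ${others_ten_nodes:,}")
print(f"Price ratio: {ratio:.1f}x")
```

Under those assumptions the Hortonworks contract works out to $1,200 per node, roughly a third of the per-node price quoted for the other two distributions.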

For running Hadoop in the cloud, there’s also Amazon Elastic MapReduce, whose pricing was always clear. And Amazon has recently announced support for the MapR Hadoop distribution on Elastic MapReduce.

Original title and link: Pricing for Hadoop Support: Cloudera, Hortonworks, MapR (NoSQL database©myNoSQL)


Hortonworks Data Platform 1.0

Hortonworks has announced the 1.0 release of the Hortonworks Data Platform prior to the Hadoop Summit 2012 together with a lot of supporting quotes from companies like Attunity, Dataguise, Datameer, Karmasphere, Kognitio, MarkLogic, Microsoft, NetApp, StackIQ, Syncsort, Talend, 10gen, Teradata, and VMware.

Some info points:

  1. Hortonworks Data Platform is a platform meant to simplify the installation, integration, management, and use of Apache Hadoop

    [Image: hdp-diagram]

    1. HDP 1.0 is based on Apache Hadoop 1.0
    2. Apache Ambari is used for installation and provisioning
    3. The same Apache Ambari is behind the Hortonworks Management Console
    4. For Data integration, HDP offers WebHDFS, HCatalog APIs, and Talend Open Studio
    5. Apache HCatalog is the solution offering metadata and table management
  2. Hortonworks Data Platform is 100% open source—I really appreciate Hortonworks’s dedication to the Apache Hadoop project and open source community

  3. HDP comes with 3 levels of support subscriptions, with pricing starting at $12,500/year for a 10-node cluster

One of the most interesting aspects of the Hortonworks Data Platform release is that the high-availability (HA) option for HDP is based on using VMware-powered virtual machines for the NameNode and JobTracker. My first thought about this approach is that it was chosen to strengthen a partnership with VMware. On the other hand, Hadoop 2.0 already contains a new highly available version of the NameNode (Cloudera Hadoop Distribution uses this solution) and VMware has bigger plans for a virtualization-friendly Hadoop environment with Project Serengeti.

You can read a lot of posts about this announcement, but you’ll find all the details in Hortonworks’s John Kreisa’s post here and the PR announcement.

Original title and link: Hortonworks Data Platform 1.0 (NoSQL database©myNoSQL)