
HortonWorks: All content tagged as HortonWorks in NoSQL databases and polyglot persistence

Hadoop in the Cloud: Skytap and Joyent

Besides the well-established Amazon Elastic MapReduce and Windows Azure HDInsight, there are two new Hadoop-in-the-cloud services:

  • Skytap, which offers Cloudera CDH4 Enterprise experimentation clusters of up to 50 nodes
  • Joyent Solution for Hadoop, which is offered in partnership with Hortonworks. I hesitated for a bit to mention Joyent, considering the page says “Sign up now to talk to a Joyent Solutions Architect”, which is anything but a cloud service.

Original title and link: Hadoop in the Cloud: Skytap and Joyent (NoSQL database©myNoSQL)


Hadoop Business Ecosystem as of January 2013

As I was hoping and expecting, Datameer updated the chart visualizing Hadoop’s business side ecosystem:

[Image: hadoop_ecosystem_full2]

It shouldn’t be a surprise to anyone that the most connected companies in the Hadoop space are Cloudera and Hortonworks. They outrank the IT industry mammoths: IBM, HP, Microsoft, Oracle, SAP, etc.


via: http://www.datameer.com/blog/perspectives/hadoop-ecosystem-as-of-january-2013-now-an-app.html


HBase Roadmap

Devaraj Das’s post on the Hortonworks blog details the current and future work on HBase:

  1. Reliability and High Availability (all data always available, and recovery from failures is quick)
  2. Autonomous operation (minimum operator intervention)
  3. Wire compatibility (to support rolling upgrades across a couple of versions at least)
  4. Cross data-center replication (for disaster recovery)
  5. Snapshots and backups (be able to take periodic snapshots of certain/all tables and be able to restore them at a later point if required)
  6. Monitoring and Diagnostics (which regionserver is hot or what caused an outage)

Future:

  1. Better and improved clients (asynchronous clients, and clients in multiple languages)
  2. Cell-level security (access control for every cell in a table)
  3. Multi-tenancy (HBase becomes a viable shared platform for multiple applications using it)
  4. Secondary indexing functionality

Current work=reliability. Future work=usability.


via: http://hortonworks.com/blog/hbase-futures/


Pig Performance and Optimization Analysis

Although Pig is designed as a data flow language, it supports all the functionalities required by TPC-H; thus it makes sense to use TPC-H to benchmark Pig’s performance. Below is the final result.

[Image: tpc-h 100gb]


via: http://hortonworks.com/blog/pig-performance-and-optimization-analysis/


HttpFS: Another Hadoop File System Over HTTP

Just a new HTTP interface for the Hadoop file system. The main differences between HttpFS and WebHDFS are that this one is created by Cloudera, not Hortonworks (on top of their previous Hoop library) and:

HttpFs is a proxy so, unlike WebHDFS, it does not require clients be able to access every machine in the cluster. This allows clients to access a cluster that is behind a firewall via the WebHDFS REST API.

Question is: if they are API compatible and both open source, why not unify them?
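
To make the API compatibility concrete, here is a minimal sketch of how the same WebHDFS-style request could be addressed either to the NameNode directly or to an HttpFS proxy. The host names, the user, and the path are hypothetical; the ports are the usual defaults (50070 for the NameNode HTTP port, 14000 for HttpFS), which a given cluster may well override. It only builds the URLs and sends nothing over the network.

```python
from urllib.parse import quote, urlencode

# Assumed defaults; real clusters may configure different ports.
WEBHDFS_PORT = 50070  # NameNode HTTP port serving WebHDFS
HTTPFS_PORT = 14000   # HttpFS proxy port


def webhdfs_url(host, port, path, op, **params):
    """Build a WebHDFS-style REST URL: http://host:port/webhdfs/v1/<path>?op=..."""
    query = urlencode({"op": op, **params})
    return f"http://{host}:{port}/webhdfs/v1{quote(path)}?{query}"


# Hypothetical hosts: only the host and port differ between the two services.
direct = webhdfs_url("namenode.example.com", WEBHDFS_PORT, "/user/alice",
                     "LISTSTATUS", **{"user.name": "alice"})
proxied = webhdfs_url("gateway.example.com", HTTPFS_PORT, "/user/alice",
                      "LISTSTATUS", **{"user.name": "alice"})

print(direct)
print(proxied)
```

With WebHDFS, reads and writes are redirected to individual DataNodes, so the client must be able to reach them; with the proxy, every request goes through the single gateway host.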


via: http://www.cloudera.com/blog/2012/08/httpfs-for-cdh3-the-hadoop-filesystem-over-http/


Hortonworks at 1 Year: Promises and Achievements

I normally wouldn’t link to a pat-on-the-back post, but Hortonworks’ presence in the Hadoop market has accelerated its evolution and adoption. Plus, the promises Hortonworks made 1 year ago represent a very good list of the shortcomings new adopters of Hadoop are still facing:

  • make Apache Hadoop easier to install, manage, and use
  • make Apache Hadoop more robust
  • make Apache Hadoop easier to integrate and extend
  • deliver an ever-increasing array of services aimed at improving the Hadoop experience and support in the growing needs of enterprises, systems integrators and technology vendors


via: http://hortonworks.com/blog/happy-birthday-hortonworks/


The Hadoop Ecosystem Relationships

Excellent infographic about the relationships in the Hadoop market created with Datameer:

[Image: Hadoop-Ecosystem-Infographic1]

A while ago I created a Google Spreadsheet in which I tried to track all these relationships, but going through PR announcements wasn’t really my thing. Now there’s a CSV file with all this data.


via: http://www.cloudera.com/blog/2012/07/the-hadoop-ecosystem-visualized-in-datameer/


Pricing for Hadoop Support: Cloudera, Hortonworks, MapR

Found the following bits in a post on The Register by Timothy Prickett Morgan:

While Cloudera and MapR are charging $4,000 per node for their enterprise-class Hadoop distributions (including their proprietary extensions and tech support), Hortonworks doesn’t have any proprietary extensions and is living off of the support contracts for the HDP 1.0 stack. […] Hortonworks is not providing its full list price, but for a starter ten-node cluster, you can get a standard support contract for $12,000 per year.

Hortonworks’s pricing looks a bit aggressive, but this could be explained by the fact that Hortonworks Data Platform 1.0 was made available only this week.
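
A quick back-of-the-envelope comparison shows how large the gap is for a ten-node cluster. The figures are the ones quoted above; treating the $4,000 per-node price as an annual figure is an assumption, since the quote does not state the billing period.

```python
# Figures quoted from The Register; the annual basis for the
# per-node price is an assumption, not stated in the quote.
PER_NODE_PRICE = 4_000              # Cloudera / MapR, per node
HORTONWORKS_STARTER_PRICE = 12_000  # standard support, ten-node cluster, per year
NODES = 10

per_node_cluster_cost = PER_NODE_PRICE * NODES
hortonworks_per_node = HORTONWORKS_STARTER_PRICE / NODES
ratio = per_node_cluster_cost / HORTONWORKS_STARTER_PRICE

print(f"Per-node pricing, {NODES} nodes: ${per_node_cluster_cost:,}")
print(f"Hortonworks per node: ${hortonworks_per_node:,.0f}")
print(f"Ratio: {ratio:.1f}x")
```

Under that assumption, the starter contract works out to less than a third of the per-node price.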

For running Hadoop in the cloud, there’s also Amazon Elastic MapReduce, whose pricing was always clear. And Amazon has recently announced support for the MapR Hadoop distribution on Elastic MapReduce.



Hortonworks Data Platform 1.0

Hortonworks has announced the 1.0 release of the Hortonworks Data Platform prior to the Hadoop Summit 2012 together with a lot of supporting quotes from companies like Attunity, Dataguise, Datameer, Karmasphere, Kognitio, MarkLogic, Microsoft, NetApp, StackIQ, Syncsort, Talend, 10gen, Teradata, and VMware.

Some info points:

  1. Hortonworks Data Platform is a platform meant to simplify the installation, integration, management, and use of Apache Hadoop

    [Image: hdp-diagram]

    1. HDP 1.0 is based on Apache Hadoop 1.0
    2. Apache Ambari is used for installation and provisioning
    3. The same Apache Ambari is behind the Hortonworks Management Console
    4. For data integration, HDP offers WebHDFS, HCatalog APIs, and Talend Open Studio
    5. Apache HCatalog is the solution offering metadata and table management
  2. Hortonworks Data Platform is 100% open source—I really appreciate Hortonworks’s dedication to the Apache Hadoop project and open source community

  3. HDP comes with 3 levels of support subscriptions, with pricing starting at $12,500/year for a 10-node cluster

One of the most interesting aspects of the Hortonworks Data Platform release is that the high-availability (HA) option for HDP is based on using VMware-powered virtual machines for the NameNode and JobTracker. My first thought about this approach is that it was chosen to strengthen a partnership with VMware. On the other hand, Hadoop 2.0 already contains a new highly available version of the NameNode (the Cloudera Hadoop distribution uses this solution), and VMware has bigger plans for a virtualization-friendly Hadoop environment with Project Serengeti.

You can read a lot of posts about this announcement, but you’ll find all the details in Hortonworks’s John Kreisa’s post here and the PR announcement.



Looking to Stay Ahead of Hortonworks and MapR in the Hadoop Market, Cloudera Delivers High Availability, Better Security, and Easier System Management

Compare the title, which is the subtitle of the InformationWeek post, with this paragraph which reflects the reality:

Both Cloudera and Hortonworks will be distributing open source software from Apache’s Hadoop 2.3 release, which includes upgrades aimed at high-availability and improved security. The release includes a hot-failover for the NameNode (metadata server) of the Hadoop Distributed File System (HDFS), which has long been a single point of failure.

Cloudera is indeed one of the biggest Hadoop contributors and a company that has done a lot to prove and thus popularize Hadoop through its packaging of open source Hadoop ecosystem components paired with its management tool (Cloudera Manager). But NameNode high availability and security improvements are part of the Apache Hadoop source code.


via: http://www.informationweek.com/news/software/info_management/240001574


Big Data: Transactions Plus Interactions Plus Observations

A Hortonworks post listing the 7 key drivers for the Big Data market from the business, technical, and financial perspective:

[Image: bigdata_diagram]


via: http://hortonworks.com/blog/7-key-drivers-for-the-big-data-market/


Notes on the Hadoop and HBase Markets

Curt Monash shares what he heard from his customers:

  • Over half of Cloudera’s customers (nb 100 subscription customers) use HBase
  • Hortonworks thinks a typical enterprise Hadoop cluster has 20-50 nodes, with 50-100 already being on the large side.
  • There are huge amounts of Elastic MapReduce/Hadoop processing in the Amazon cloud. Some estimates say it’s the majority of all Amazon Web Services processing.


via: http://www.dbms2.com/2012/04/24/notes-on-the-hadoop-and-hbase-markets/