Hortonworks: All content tagged as Hortonworks in NoSQL databases and polyglot persistence
Monday, 21 January 2013
Hadoop Business Ecosystem as of January 2013
As I was hoping and expecting, Datameer updated the chart visualizing Hadoop’s business side ecosystem:
It shouldn’t be a surprise to anyone that the most connected companies in the Hadoop space are Cloudera and Hortonworks. They outrank the IT industry mammoths: IBM, HP, Microsoft, Oracle, SAP, etc.
Original title and link: Hadoop Business Ecosystem as of January 2013 (©myNoSQL)
via: http://www.datameer.com/blog/perspectives/hadoop-ecosystem-as-of-january-2013-now-an-app.html
Monday, 19 November 2012
HBase Roadmap
Devaraj Das’s post on the Hortonworks blog details the current and future work on HBase:
- Reliability and High Availability (all data always available, and recovery from failures is quick)
- Autonomous operation (minimum operator intervention)
- Wire compatibility (to support rolling upgrades across a couple of versions at least)
- Cross data-center replication (for disaster recovery)
- Snapshots and backups (be able to take periodic snapshots of certain/all tables and be able to restore them at a later point if required)
- Monitoring and Diagnostics (which regionserver is hot or what caused an outage)
Future:
- Better and improved clients (asynchronous clients, and, in multiple languages)
- Cell-level security (access control for every cell in a table)
- Multi-tenancy (HBase becomes a viable shared platform for multiple applications using it)
- Secondary indexing functionality
Current work=reliability. Future work=usability.
Original title and link: HBase Roadmap (©myNoSQL)
Tuesday, 4 September 2012
Pig Performance and Optimization Analysis
Although Pig is designed as a data flow language, it supports all the functionalities required by TPC-H; thus it makes sense to use TPC-H to benchmark Pig’s performance. Below is the final result.
Original title and link: Pig Performance and Optimization Analysis (©myNoSQL)
via: http://hortonworks.com/blog/pig-performance-and-optimization-analysis/
Monday, 6 August 2012
HttpFS: Another Hadoop File System Over HTTP
Just a new HTTP interface for the Hadoop file system. The main differences between HttpFS and WebHDFS are that this one is created by Cloudera, not Hortonworks (on top of their previous Hoop library), and:
HttpFs is a proxy so, unlike WebHDFS, it does not require clients be able to access every machine in the cluster. This allows clients to access a cluster that is behind a firewall via the WebHDFS REST API.
Question is: if they are API compatible and both open source, why not unify them?
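To make the "API compatible" point concrete, here is a minimal sketch of how a WebHDFS-style REST URL is put together. It only builds the URLs and sends nothing over the network. The host names are hypothetical, and the ports (50070 for the NameNode's WebHDFS endpoint, 14000 for an HttpFS gateway) are commonly used defaults that I am assuming here; your cluster may differ.

```python
from urllib.parse import quote, urlencode


def webhdfs_url(host, port, path, op, **params):
    """Build a URL of the form http://<host>:<port>/webhdfs/v1/<path>?op=<OP>."""
    query = urlencode({"op": op, **params})
    return f"http://{host}:{port}/webhdfs/v1{quote(path)}?{query}"


# Hypothetical hosts; ports are assumed defaults, not taken from the post.
direct = webhdfs_url("namenode.example.com", 50070, "/user/alice", "LISTSTATUS")
proxied = webhdfs_url(
    "httpfs-gateway.example.com", 14000, "/user/alice", "LISTSTATUS",
    **{"user.name": "alice"},
)

print(direct)
print(proxied)
```

The path and the `op` parameter are the same in both cases; only the endpoint changes. With WebHDFS, data operations are redirected to individual datanodes, which is why clients need to reach every machine, while the HttpFS gateway is the only host the client has to talk to.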
Original title and link: HttpFS: Another Hadoop File System Over HTTP (©myNoSQL)
via: http://www.cloudera.com/blog/2012/08/httpfs-for-cdh3-the-hadoop-filesystem-over-http/
Monday, 16 July 2012
Hortonworks at 1 Year: Promises and Achievements
I normally wouldn’t link to a pat-on-the-back post, but Hortonworks’ presence in the Hadoop market has accelerated its evolution and adoption. Plus, the promises Hortonworks made 1 year ago represent a very good list of the shortcomings new adopters of Hadoop are still facing:
- make Apache Hadoop easier to install, manage, and use
- make Apache Hadoop more robust
- make Apache Hadoop easier to integrate and extend
- deliver an ever-increasing array of services aimed at improving the Hadoop experience and support in the growing needs of enterprises, systems integrators and technology vendors
Original title and link: Hortonworks at 1 Year: Promises and Achievements (©myNoSQL)
via: http://hortonworks.com/blog/happy-birthday-hortonworks/
Wednesday, 11 July 2012
The Hadoop Ecosystem Relationships
Excellent infographic about the relationships in the Hadoop market created with Datameer:
A while ago I created a Google Spreadsheet in which I tried to track all these relationships, but going through PR announcements wasn’t really my thing. Now there’s a CSV file with all this data.
Original title and link: The Hadoop Ecosystem Relationships (©myNoSQL)
via: http://www.cloudera.com/blog/2012/07/the-hadoop-ecosystem-visualized-in-datameer/
Friday, 15 June 2012
Pricing for Hadoop Support: Cloudera, Hortonworks, MapR
Found the following bits in a post on The Register by Timothy Prickett Morgan:
While Cloudera and MapR are charging $4,000 per node for their enterprise-class Hadoop distributions (including their proprietary extensions and tech support), Hortonworks doesn’t have any proprietary extensions and is living off of the support contracts for the HDP 1.0 stack. […] Hortonworks is not providing its full list price, but for a starter ten-node cluster, you can get a standard support contract for $12,500 per year.
Hortonworks’s pricing looks a bit aggressive, but this could be explained by the fact that Hortonworks Data Platform 1.0 was made available only this week.
For running Hadoop in the cloud, there’s also Amazon Elastic MapReduce, whose pricing was always clear. And Amazon has recently announced support for the MapR Hadoop distribution on Elastic MapReduce.
Original title and link: Pricing for Hadoop Support: Cloudera, Hortonworks, MapR (©myNoSQL)
Hortonworks Data Platform 1.0
Hortonworks has announced the 1.0 release of the Hortonworks Data Platform prior to the Hadoop Summit 2012 together with a lot of supporting quotes from companies like Attunity, Dataguise, Datameer, Karmasphere, Kognitio, MarkLogic, Microsoft, NetApp, StackIQ, Syncsort, Talend, 10gen, Teradata, and VMware.
Some info points:
- Hortonworks Data Platform is a platform meant to simplify the installation, integration, management, and use of Apache Hadoop
- HDP 1.0 is based on Apache Hadoop 1.0
- Apache Ambari is used for installation and provisioning
- The same Apache Ambari is behind the Hortonworks Management Console
- For Data integration, HDP offers WebHDFS, HCatalog APIs, and Talend Open Studio
- Apache HCatalog is the solution offering metadata and table management
- Hortonworks Data Platform is 100% open source. I really appreciate Hortonworks’s dedication to the Apache Hadoop project and open source community
- HDP comes with 3 levels of support subscriptions, with pricing starting at $12,500/year for a 10-node cluster
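For a rough sense of scale, the starter price can be broken down per node and set against the $4,000 per node that The Register quotes for Cloudera and MapR in the pricing post above. This is only back-of-the-envelope arithmetic: the two offerings bundle different things (proprietary extensions versus support only), and the variable names are mine.

```python
def per_node_price(total_per_year, nodes):
    """Yearly price per node for a flat-priced cluster contract."""
    return total_per_year / nodes


# HDP starter support contract: $12,500/year for a 10-node cluster.
hdp_starter = per_node_price(12_500, 10)

# Per-node figure quoted for Cloudera and MapR enterprise distributions.
enterprise_per_node = 4_000

ratio = enterprise_per_node / hdp_starter

print(f"HDP starter: ${hdp_starter:,.0f} per node per year")
print(f"Quoted enterprise price is {ratio:.1f}x that figure")
```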
One of the most interesting aspects of the Hortonworks Data Platform release is that the high-availability (HA) option for HDP is based on using VMware-powered virtual machines for the NameNode and JobTracker. My first thought about this approach is that it was chosen to strengthen a partnership with VMware. On the other hand, Hadoop 2.0 already contains a new highly-available version of the NameNode (Cloudera’s Hadoop distribution uses this solution) and VMware has bigger plans for a virtualization-friendly Hadoop environment with project Serengeti.
You can read a lot of posts about this announcement, but you’ll find all the details in Hortonworks’s John Kreisa’s post here and the PR announcement.
Original title and link: Hortonworks Data Platform 1.0 (©myNoSQL)
Thursday, 7 June 2012
Looking to Stay Ahead of Hortonworks and MapR in the Hadoop Market, Cloudera Delivers High Availability, Better Security, and Easier System Management
Compare the title, which is the subtitle of the InformationWeek post, with this paragraph which reflects the reality:
Both Cloudera and Hortonworks will be distributing open source software from Apache’s Hadoop 2.0 release, which includes upgrades aimed at high-availability and improved security. The release includes a hot-failover for the NameNode (metadata server) of the Hadoop Distributed File System (HDFS), which has long been a single point of failure.
Cloudera is indeed one of the biggest Hadoop contributors and a company that has helped a lot in proving and thus popularizing Hadoop through its packaging of open source Hadoop ecosystem components paired with its management tool (Cloudera Manager). But NameNode high availability and security improvements are part of the Apache Hadoop source code.
Original title and link: Looking to Stay Ahead of Hortonworks and MapR in the Hadoop Market, Cloudera Delivers High Availability, Better Security, and Easier System Management (©myNoSQL)
via: http://www.informationweek.com/news/software/info_management/240001574
Thursday, 17 May 2012
Big Data: Transactions Plus Interactions Plus Observations
A Hortonworks post listing the 7 key drivers for the Big Data market from the business, technical, and financial perspective:
Original title and link: Big Data: Transactions Plus Interactions Plus Observations (©myNoSQL)
via: http://hortonworks.com/blog/7-key-drivers-for-the-big-data-market/
Tuesday, 24 April 2012
Notes on the Hadoop and HBase Markets
Curt Monash shares what he heard from his customers:
- Over half of Cloudera’s customers (nb 100 subscription customers) use HBase
- Hortonworks thinks a typical enterprise Hadoop cluster has 20-50 nodes, with 50-100 already being on the large side.
- There are huge amounts of Elastic MapReduce/Hadoop processing in the Amazon cloud. Some estimates say it’s the majority of all Amazon Web Services processing.
Original title and link: Notes on the Hadoop and HBase Markets (©myNoSQL)
via: http://www.dbms2.com/2012/04/24/notes-on-the-hadoop-and-hbase-markets/
Tuesday, 3 April 2012
Big Data and Hadoop for C-Suites in 3 Minutes
Bring your own (small) popcorn as this is just like a TV ad:
Focus on the voice. Then slowly start repeating in your mind: “Big data. Hadoop. I love big data. I love Hadoop.”
Original title and link: Big Data and Hadoop for C-Suites in 3 Minutes (©myNoSQL)