Cloudera Distribution for Hadoop will include Pig and Hive, and why it matters

Cloudera distributes an easy-to-install, pre-packaged version of Hadoop that includes various bug fixes and optimizations. Yesterday they announced the availability of a new version called ☞ CDH2 (n.b. Cloudera Distribution for Hadoop), as well as the first beta of the upcoming version, which will include support for Pig and Hive, the tools that help you put your NoSQL data to work.

But why is this important? While NoSQL solutions are helping us tackle problems like

  • cost[1], complexity[2], and productivity
  • availability and scalability
  • storing huge amounts of data[3]

none of these are really the end goals. While I don’t feel comfortable disagreeing with Google’s chief scientist, Peter Norvig:

We don’t have better algorithms than anyone else. We just have more data.

I don’t really think it is only about the data, but rather about the intelligence that can be built around the data. And that’s exactly what tools like Hadoop, Pig, and Hive will help us achieve.

We have a system in place based on shared MySQL + memcache, but it’s quickly becoming prohibitively costly (in terms of manpower) to operate.

References