ALL COVERED TOPICS

NoSQL Benchmarks NoSQL use cases NoSQL Videos NoSQL Hybrid Solutions NoSQL Presentations Big Data Hadoop MapReduce Pig Hive Flume Oozie Sqoop HDFS ZooKeeper Cascading Cascalog BigTable Cassandra HBase Hypertable Couchbase CouchDB MongoDB OrientDB RavenDB Jackrabbit Terrastore Amazon DynamoDB Redis Riak Project Voldemort Tokyo Cabinet Kyoto Cabinet memcached Amazon SimpleDB Datomic MemcacheDB M/DB GT.M Amazon Dynamo Dynomite Mnesia Yahoo! PNUTS/Sherpa Neo4j InfoGrid Sones GraphDB InfiniteGraph AllegroGraph MarkLogic Clustrix CouchDB Case Studies MongoDB Case Studies NoSQL at Adobe NoSQL at Facebook NoSQL at Twitter

NAVIGATE MAIN CATEGORIES

Close

Impressions About Hive, Pig, Scalding, Scoobi, Scrunch, Spark

Sami Badawi enumerates the issues he encountered while trying all these tools (Pig1, Scalding2, Scoobi3, Hive4, Spark5, Scrunch6, Cascalog7) for a simple experiment with Hadoop:

The task was to read log files join with other data do some statistics on arrays of doubles. Writing Hadoop MapReduce classes in Java is the assembly code of Big Data.


  1. Pig : a platform for analyzing large data sets that consists of a high-level language for expressing data analysis programs, coupled with infrastructure for evaluating these programs. 

  2. Scalding: A Scala API for Cascading 

  3. Scoobi: a Scala productivity framework for Hadoop 

  4. Hive: a data warehouse system for Hadoop that facilitates easy data summarization, ad-hoc queries, and the analysis of large datasets stored in Hadoop compatible file systems. 

  5. Spark: open source cluster computing system that aims to make data analytics fast — both fast to run and fast to write 

  6. Scrunch: a Scala wrapper for Crunch 

  7. Cascalog: a fully-featured Clojure-based data processing and querying library for Hadoop  

Original title and link: Impressions About Hive, Pig, Scalding, Scoobi, Scrunch, Spark (NoSQL database©myNoSQL)

via: http://blog.samibadawi.com/2012/03/hive-pig-scalding-scoobi-scrunch-and.html