Monday, 26 March 2012
Impressions About Hive, Pig, Scalding, Scoobi, Scrunch, Spark
Sami Badawi enumerates the issues he encountered while trying all these tools (Pig, Scalding, Scoobi, Hive, Spark, Scrunch, Cascalog) for a simple experiment with Hadoop:
The task was to read log files, join them with other data, and do some statistics on arrays of doubles. Writing Hadoop MapReduce classes in Java is the assembly code of Big Data.
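Badawi's post does not show his job. What follows is only a minimal sketch, in plain Python rather than any of the tools discussed, of what such a task looks like when spelled out by hand as map, shuffle, and reduce phases, which is the plumbing the higher-level tools hide. The log format, the `USERS` lookup table, and every function name are hypothetical; the join is a map-side hash join, which assumes the side data is small enough to fit in memory.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical sample data: each log line is "user_id<TAB>comma-separated doubles".
LOG_LINES = [
    "u1\t1.0,2.0,3.0",
    "u2\t4.0,6.0",
    "u1\t5.0",
    "u3\t7.0",  # no matching user record, so the inner join drops it
]
# Hypothetical side data to join against: user_id -> country.
USERS = {"u1": "SE", "u2": "DK"}


def map_phase(line, users):
    """Parse one log line and join it with the side data (map-side hash join)."""
    user_id, raw = line.split("\t")
    country = users.get(user_id)
    if country is not None:  # inner join: skip records with no match
        yield country, [float(v) for v in raw.split(",")]


def shuffle(pairs):
    """Group emitted values by key, as the framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(arrays):
    """Compute simple statistics over all doubles that share a key."""
    values = [x for arr in arrays for x in arr]
    return {"count": len(values), "mean": mean(values),
            "min": min(values), "max": max(values)}


def run_job(log_lines, users):
    """Wire the phases together: map every line, shuffle by key, reduce each group."""
    emitted = (kv for line in log_lines for kv in map_phase(line, users))
    return {key: reduce_phase(arrs) for key, arrs in shuffle(emitted).items()}


if __name__ == "__main__":
    for country, stats in sorted(run_job(LOG_LINES, USERS).items()):
        print(country, stats)
```

In real Hadoop Java code each of these phases becomes its own Mapper or Reducer class plus job configuration, which is the boilerplate Badawi compares to assembly code.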
- Pig: a platform for analyzing large data sets that consists of a high-level language for expressing data analysis programs, coupled with infrastructure for evaluating these programs.
- Hive: a data warehouse system for Hadoop that facilitates easy data summarization, ad-hoc queries, and the analysis of large datasets stored in Hadoop-compatible file systems.
- Spark: an open source cluster computing system that aims to make data analytics fast, both fast to run and fast to write.
- Cascalog: a fully-featured Clojure-based data processing and querying library for Hadoop.
Original title and link: Impressions About Hive, Pig, Scalding, Scoobi, Scrunch, Spark (©myNoSQL)
via: http://blog.samibadawi.com/2012/03/hive-pig-scalding-scoobi-scrunch-and.html