Scoobi: All content tagged as Scoobi in NoSQL databases and polyglot persistence

Introducing Scoobi and Scalding: Scala DSLs for Hadoop MapReduce

After posting Impressions About Hive, Pig, Scalding, Scoobi, Scrunch, Spark, I found myself wondering why so many of these libraries are built in Scala and what their main purpose is. A day later I found Age Mooij’s presentation about Scoobi and Scalding, which answers my question and also gives a quick intro to both libraries. Check the slides after the break.


Impressions About Hive, Pig, Scalding, Scoobi, Scrunch, Spark

Sami Badawi enumerates the issues he encountered while trying all these tools (Pig [1], Scalding [2], Scoobi [3], Hive [4], Spark [5], Scrunch [6], Cascalog [7]) for a simple experiment with Hadoop:

The task was to read log files, join with other data, and do some statistics on arrays of doubles. Writing Hadoop MapReduce classes in Java is the assembly code of Big Data.


  1. Pig: a platform for analyzing large data sets that consists of a high-level language for expressing data analysis programs, coupled with infrastructure for evaluating these programs.

  2. Scalding: a Scala API for Cascading

  3. Scoobi: a Scala productivity framework for Hadoop 

  4. Hive: a data warehouse system for Hadoop that facilitates easy data summarization, ad-hoc queries, and the analysis of large datasets stored in Hadoop compatible file systems. 

  5. Spark: open source cluster computing system that aims to make data analytics fast — both fast to run and fast to write 

  6. Scrunch: a Scala wrapper for Crunch 

  7. Cascalog: a fully-featured Clojure-based data processing and querying library for Hadoop  

via: http://blog.samibadawi.com/2012/03/hive-pig-scalding-scoobi-scrunch-and.html