Scalding: All content tagged as Scalding in NoSQL databases and polyglot persistence
Tuesday, 21 February 2012
An Introduction to Scalding, the Scala and Cascading MapReduce Framework From Twitter
A fantastic guide to Scalding, Twitter’s Scala and Cascading MapReduce framework, from Edwin Chen[1]:
In 140 characters: instead of forcing you to write raw map and reduce functions, Scalding allows you to write natural code like:
// Create a histogram of tweet lengths.
tweets.map('tweet -> 'length) { tweet : String => tweet.size }.groupBy('length) { _.size }
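To make the map-then-group shape of that snippet concrete, here is a minimal sketch of the same histogram on an in-memory collection using only the Scala standard library. It is not Scalding's API and does not run on Hadoop; the object name `TweetLengthHistogram`, the `histogram` helper, and the sample tweets are all made up for illustration.

```scala
object TweetLengthHistogram {
  // Hypothetical helper: map each tweet to its length, group equal
  // lengths together, then count how many tweets fall in each group.
  def histogram(tweets: Seq[String]): Map[Int, Int] =
    tweets
      .map(_.length)
      .groupBy(identity)
      .map { case (length, group) => length -> group.size }

  def main(args: Array[String]): Unit = {
    // Made-up sample data standing in for a real tweet source.
    val tweets = Seq("hello", "world", "scalding", "hi")
    histogram(tweets).toSeq.sortBy(_._1).foreach { case (length, count) =>
      println(s"$length\t$count")
    }
  }
}
```

The Scalding version reads almost the same, which is the point of the quote: the framework takes care of turning these two steps into map and reduce phases.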
Looking at the code samples, this looks a lot like Apache Pig. But the Scalding documentation compares it to Scrunch/Scoobi and points to the answers in this Quora thread:
The main difference between Scalding (and Cascading) and Scrunch/Scoobi is that Cascading has a record model where each element in your distributed list/table is a table with some named fields. This is nice because most common cases are to have a few primitive columns (ints, strings, etc…).
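A rough way to picture the record model described in that answer is a row whose values are looked up by field name. The sketch below uses plain Scala collections only; `Record`, `Tweet`, `withLength`, and `lengths` are hypothetical names. The typed-object style shown next to it is an assumed contrast for illustration, not a description of the Scrunch or Scoobi APIs.

```scala
object RecordModels {
  // Record model (illustrative): each element is a row of named fields
  // whose values are only loosely typed.
  type Record = Map[String, Any]

  // Assumed contrast: each element is an instance of a user-defined type.
  final case class Tweet(user: String, text: String)

  // Add a "length" field computed from the "tweet" field, addressed by name.
  // The cast is the price of loosely typed fields.
  def withLength(records: Seq[Record]): Seq[Record] =
    records.map { r =>
      r + ("length" -> r("tweet").asInstanceOf[String].length)
    }

  // The same computation over typed objects, addressed by member access.
  def lengths(tweets: Seq[Tweet]): Seq[Int] =
    tweets.map(_.text.length)
}
```

With a few primitive columns, addressing fields by name keeps the pipeline short, which matches the common case the answer describes.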
[1] Edwin Chen is a data scientist at Twitter.
Original title and link: An Introduction to Scalding, the Scala and Cascading MapReduce Framework From Twitter (©myNoSQL)
via: http://blog.echen.me/2012/02/09/movie-recommendations-and-more-via-mapreduce-and-scalding/