Cascading: All content tagged as Cascading in NoSQL databases and polyglot persistence
Earlier today I posted Dean Wampler’s video Overview of Scalding. Scalding is a Scala API on top of Cascading. Below you can find the video and slides from Paco Nathan’s Cascading presentation at the Chicago Hadoop User Group:
In this video he introduces Cascading, then examines the concept of a “workflow” as an abstraction for integrating Hadoop with other systems. He also shows new features, including support for SQL-92 and PMML, plus an application manager.
✚ Leaving aside the Java vs. Scala part, I’m still not sure I see any major advantage of one of these libraries over the other, besides tighter integration with an existing environment.
Original title and link: An Overview of Cascading ( ©myNoSQL)
“There’s no better way to write general-purpose Hadoop MapReduce programs when specialized tools like Hive and Pig aren’t quite what you need.”
Watch the video and slides below.
✚ At Twitter, the creators of Scalding, different teams use different libraries for dealing with different scenarios.
✚ Dean Wampler is the co-author of the book Programming Scala, so his preference for Scala is understandable.
✚ Do you know any other teams or companies using Scalding instead of Cascading or Cascalog?
Original title and link: An Overview of Scalding ( ©myNoSQL)
Cascading, the Java framework offering data processing, data flow, data integration, and process scheduling APIs for Hadoop, has reached version 2.0. The most interesting points in this release, as summarized on the Cascading blog:
- Apache 2.0 Licensing
- Support for Hadoop 1.0.2
- Local and Hadoop planner modes, where local runs in memory without Hadoop dependencies
- HashJoin pipe for “map side joins”
- Merge pipe for “map side merges”
- Simple Checkpointing for capturing intermediate data as a file
- Improved Tap and Scheme APIs
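For readers unfamiliar with the term, here is a rough sketch of what a “map side join” means in general: the smaller input is loaded into an in-memory hash table, and the larger input is streamed past it, so matching rows are found without a shuffle or reduce step. This is plain Java and does not use Cascading’s HashJoin pipe or any other Cascading class; the class name `MapSideJoinSketch`, the `hashJoin` method, and the sample data are all made up for illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative only: NOT the Cascading API, just the general hash-join idea.
public class MapSideJoinSketch {

    // Inner-joins rows of the form {key, value}. The small side must fit in memory.
    public static List<String> hashJoin(List<String[]> largeSide, List<String[]> smallSide) {
        // Build phase: index the small side by its join key.
        Map<String, List<String>> index = new HashMap<>();
        for (String[] row : smallSide) {
            index.computeIfAbsent(row[0], k -> new ArrayList<>()).add(row[1]);
        }

        // Probe phase: stream the large side and emit one line per match.
        List<String> joined = new ArrayList<>();
        for (String[] row : largeSide) {
            for (String match : index.getOrDefault(row[0], List.of())) {
                joined.add(row[0] + "," + row[1] + "," + match);
            }
        }
        return joined;
    }

    public static void main(String[] args) {
        // Hypothetical sample data: page views (large side) and user names (small side).
        List<String[]> views = List.of(
            new String[] {"u1", "/home"},
            new String[] {"u2", "/cart"},
            new String[] {"u3", "/help"});
        List<String[]> users = List.of(
            new String[] {"u1", "alice"},
            new String[] {"u2", "bob"});

        hashJoin(views, users).forEach(System.out::println);
    }
}
```

The trade-off is the usual one for this kind of join: it avoids the reduce phase, but only works when one side is small enough to hold in memory on every mapper.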
Original title and link: Cascading 2.0 Released ( ©myNoSQL)