Cascalog and Cascading: Productivity Solutions for Data Scientists
A good explanation of why Cascading, Cascalog, and other frameworks that hide the details of MapReduce make things easier for non-programmers:
Data scientists at The Climate Corporation chose to create their algorithms in Cascalog, which is a high-level Clojure-based machine learning language built on Cascading. Cascading is an advanced Java application framework that abstracts the MapReduce APIs in Apache Hadoop and provides developers with a simplified way to create powerful data processing workflows. Programming in Cascalog, data scientists create compact expressions that represent complex batch-oriented AI and machine learning workflows. This results in improved productivity for the data scientists, many of whom are mathematicians rather than computer scientists. It also gives them the ability to quickly analyze complex data sets without having to create large complicated programs in MapReduce. Furthermore, programmers at The Climate Corporation also use Cascading directly for creating jobs inside Hadoop streaming to process additional batch-oriented data workflows.
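To make the "compact expressions" point concrete, here is a rough analogy in plain Python. It uses neither the Cascading nor the Cascalog API, and the sample records and function names are made up for illustration. The first half spells out the map, shuffle, and reduce phases by hand, the way a raw MapReduce job forces you to; the second half states the same word count as a single expression, which is roughly the kind of gap a higher-level layer closes.

```python
from collections import Counter, defaultdict

# Made-up sample records, purely for illustration.
lines = ["rain rain sun", "sun wind rain"]


# Hand-written MapReduce style: every phase is spelled out explicitly.
def map_phase(records):
    """Emit a (word, 1) pair for every word in every record."""
    for record in records:
        for word in record.split():
            yield word, 1


def shuffle_phase(pairs):
    """Group the emitted values by key, as the framework would between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Sum the grouped values for each key."""
    return {key: sum(values) for key, values in groups.items()}


manual_counts = reduce_phase(shuffle_phase(map_phase(lines)))

# Higher-level style: one expression says what is wanted and leaves
# the grouping and summing to the library.
compact_counts = Counter(word for line in lines for word in line.split())

print(manual_counts == dict(compact_counts))
```

Both versions compute the same counts. The difference is how much plumbing the author has to write and maintain, which is the productivity argument the case study makes for people who are mathematicians first and programmers second.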
(©myNoSQL)
via: http://www.concurrentinc.com/case-studies/climate-corp/