ALL COVERED TOPICS

NoSQL Benchmarks NoSQL use cases NoSQL Videos NoSQL Hybrid Solutions NoSQL Presentations Big Data Hadoop MapReduce Pig Hive Flume Oozie Sqoop HDFS ZooKeeper Cascading Cascalog BigTable Cassandra HBase Hypertable Couchbase CouchDB MongoDB OrientDB RavenDB Jackrabbit Terrastore Amazon DynamoDB Redis Riak Project Voldemort Tokyo Cabinet Kyoto Cabinet memcached Amazon SimpleDB Datomic MemcacheDB M/DB GT.M Amazon Dynamo Dynomite Mnesia Yahoo! PNUTS/Sherpa Neo4j InfoGrid Sones GraphDB InfiniteGraph AllegroGraph MarkLogic Clustrix CouchDB Case Studies MongoDB Case Studies NoSQL at Adobe NoSQL at Facebook NoSQL at Twitter

NAVIGATE MAIN CATEGORIES

Close

Cascalog and Cascading: Productivity Solutions for Data Scientists

A good explanation of why Cascading, Cascalog, and other frameworks hiding away the details of MapReduce are making things easier for non-programmers:

Data scientists at The Climate Corporation chose to create their algorithms in Cascalog, which is a high-level Clojure-based machine learning language built on Cascading. Cascading is an advanced Java application framework that abstracts the MapReduce APIs in Apache Hadoop and provides developers with a simplified way to create powerful data processing workflows. Programming in Cascalog, data scientists create compact expressions that represent complex batch-oriented AI and machine learning workflows. This results in improved productivity for the data scientists, many of whom are mathematicians rather than computer scientists. It also gives them the ability to quickly analyze complex data sets without having to create large complicated programs in MapReduce. Furthermore, programmers at The Climate Corporation also use Cascading directly for creating jobs inside Hadoop streaming to process additional batch-oriented data workflows.

Original title and link: Cascalog and Cascading: Productivity Solutions for Data Scientists (NoSQL database©myNoSQL)

via: http://www.concurrentinc.com/case-studies/climate-corp/