Apache Pig 0.8: What is New
Dmitriy Ryaboy[1] has a guest post on the Cloudera blog covering the new features in Apache Pig 0.8.
Summarized:
- Support for user defined functions (UDF) in scripting languages
- Generic UDFs: allow invocation of static Java methods
- PigUnit: as the name suggests, a testing tool for Pig scripts
- PigStats: as the name hints, better visibility into Pig jobs through a series of stats, XML-based metadata injected into Map-Reduce jobs, and listeners for the Pig process
- Scalar values: simplifying access to single-row relations
- Option to start a monitoring thread for long-running executions
- HBaseStorage: works with HBase 0.20 releases only
- Custom Map-Reduce jobs can be included in a Pig flow
- Automatic merge of small files
- Custom partitioners
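
To make the first item in the list above more concrete, here is a minimal sketch of what a UDF written in a scripting language might look like. It assumes the Jython route, where Pig supplies an `outputSchema` decorator to the script. The function name `normalize`, the schema string, and the file name `udfs.py` are made up for illustration, and the fallback decorator exists only so the file can also be run under plain Python.

```python
# udfs.py: hypothetical file name, used only for this illustration.

try:
    outputSchema  # provided by Pig when the script is loaded through Jython
except NameError:
    # Stand-in for running outside Pig: it only records the schema string.
    def outputSchema(schema):
        def wrap(func):
            func.outputSchema = schema
            return func
        return wrap


@outputSchema("normalized:chararray")
def normalize(text):
    """Lower-case a string and collapse runs of whitespace."""
    if text is None:
        # Pig passes nulls through as None, so keep them as nulls.
        return None
    return " ".join(text.strip().lower().split())
```

From a Pig script, such a file would be registered with something like `REGISTER 'udfs.py' USING jython AS myfuncs;` and then called as `myfuncs.normalize(field)`; the alias `myfuncs` is likewise an assumption.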
The Pig 0.8 release includes a large number of bug fixes and optimizations, but at the core it is a feature release. It’s been in the works for almost a full year and the amount of time spent on 0.8 really shows.
You can also check Dmitriy’s presentations about the NoSQL ecosystem at Twitter: “Twitter, Pig, and HBase” and “HBase and Pig: The Hadoop ecosystem at Twitter”.
[1] Dmitriy Ryaboy: Twitter engineer, @squarecog
Original title and link: Apache Pig 0.8: What is New (NoSQL databases © myNoSQL)
via: http://www.cloudera.com/blog/2010/12/new-features-in-apache-pig-0-8/