- The ability to orchestrate execution of Hadoop-related tasks (e.g., executing a Hive query, Pig script, or MapReduce job) as part of a broader IT workflow.
- The ability to set up dependencies: if a step fails, the job can branch down a recovery path or send a notification; on success, it proceeds to subsequent dependent tasks. It likewise supports launching several tasks in parallel.
- New integration for Pig, so that developers can execute a Pig job from a PDI Job flow, integrate Pig jobs into broader IT workflows through PDI Jobs, take advantage of the out-of-the-box scheduler, and so on.
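The dependency pattern described above (branch to a recovery path or notification on failure, continue to dependent tasks on success, run independent tasks in parallel) can be sketched as follows. This is a minimal illustration of the control flow only, not Pentaho's actual API; all function names (`run_step`, `notify_admin`, etc.) are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor

def run_step(name, action, on_success=None, on_failure=None):
    """Run one workflow step and branch on its outcome.

    Hypothetical sketch: PDI expresses this graphically as job hops,
    not in Python code.
    """
    try:
        action()
    except Exception as exc:
        print(f"{name} failed: {exc}")
        if on_failure:
            on_failure()          # e.g. send a notification or recover
        return False
    print(f"{name} succeeded")
    if on_success:
        on_success()              # continue to dependent tasks
    return True

def load_to_hdfs():
    pass                          # placeholder for a real extract task

def run_hive_query():
    raise RuntimeError("query timed out")   # simulated failure

def notify_admin():
    print("notification sent")

# Two independent extract tasks run in parallel...
with ThreadPoolExecutor() as pool:
    futures = [pool.submit(run_step, f"extract-{i}", load_to_hdfs)
               for i in (1, 2)]
    results = [f.result() for f in futures]

# ...then a dependent step branches down the recovery path on failure.
run_step("hive-query", run_hive_query, on_failure=notify_admin)
```

The point is only the shape of the graph: success and failure edges out of each step, plus fan-out for parallel steps.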
The list of tools Pentaho 4 integrates with is quite long:
- a long list of traditional RDBMS
- analytics databases (Greenplum, Vertica, Netezza, Teradata, etc.)
- NoSQL databases (MongoDB, HBase, etc.)
- Hadoop variants
- LexisNexis HPCC
This is the world of polyglot persistence and hybrid data storage.
Original title and link: BI Pentaho Integrates Hadoop, NoSQL Databases, and Analytic Databases (©myNoSQL)