


A Detailed Guide to Oozie

Boris Lublinsky and Michael Segel's series of articles about Oozie, the Hadoop workflow framework, published on InfoQ:

  • Introduction to Oozie

    An Oozie workflow is a collection of actions (e.g. Hadoop Map/Reduce jobs, Pig jobs) arranged in a control dependency DAG (Directed Acyclic Graph), which specifies the sequence in which the actions execute. This graph is specified in hPDL (an XML Process Definition Language).

  • Oozie by Example

    In this article we will describe a more complex Oozie example, which will allow us to discuss more Oozie features and demonstrate how to use them. The workflow described here implements vehicle GPS probe data ingestion. Probe data is delivered hourly to a specific HDFS directory as a file containing all probes for that hour. Probe ingestion runs daily over all 24 files for the day: once the number of files reaches 24, the ingestion process should start. […]

  • Extending Oozie

    In this article we will show how to leverage Oozie extensibility to implement custom orchestration language extensions.
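To make the "control dependency DAG" idea concrete, here is a minimal Python sketch that models a workflow as a mapping from each action to the actions it depends on, and derives one valid execution order with a topological sort. The action names are hypothetical and this is not Oozie's API: real workflows declare these transitions in hPDL XML, where independent branches can also run in parallel through fork/join nodes rather than in a single sequence.

```python
from graphlib import TopologicalSorter, CycleError

# Hypothetical workflow (names are illustrative only): each action maps to
# the set of actions that must complete before it can start.
workflow = {
    "copy-probes": set(),
    "clean-probes": {"copy-probes"},
    "aggregate-mr": {"clean-probes"},   # e.g. a Map/Reduce job
    "report-pig": {"clean-probes"},     # e.g. a Pig job
    "publish": {"aggregate-mr", "report-pig"},
}

def execution_order(dag):
    """Return one valid sequential order of the actions.

    Raises CycleError if the dependencies are not acyclic, i.e. not a DAG.
    """
    return list(TopologicalSorter(dag).static_order())

print(execution_order(workflow))

# A cyclic dependency cannot be ordered, so it is rejected.
try:
    execution_order({"a": {"b"}, "b": {"a"}})
except CycleError:
    print("not a DAG")
```

Any order printed for `workflow` respects every dependency; "aggregate-mr" and "report-pig" may come in either order because neither depends on the other, which is exactly the kind of branch a fork/join would run concurrently.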
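The GPS probe example hinges on one gating rule: ingest a day's data only once all 24 hourly files are present. The sketch below expresses that check in plain Python, assuming a hypothetical file naming convention (`probes-YYYYMMDD-HH.dat`) that the articles do not specify. In an Oozie workflow a check like this could sit in front of a decision node; this is an illustration of the rule, not the authors' implementation.

```python
import re

# Hypothetical naming convention (an assumption): one file per hour,
# e.g. "probes-20120315-07.dat" for hour 07 of 2012-03-15.
HOURLY_FILE = re.compile(r"probes-(\d{8})-(\d{2})\.dat$")

def hours_present(file_names, day):
    """Collect the distinct hours (0-23) that have a probe file for `day` (YYYYMMDD)."""
    hours = set()
    for name in file_names:
        m = HOURLY_FILE.search(name)
        if m and m.group(1) == day:
            hour = int(m.group(2))
            if 0 <= hour <= 23:
                hours.add(hour)
    return hours

def ready_for_ingestion(file_names, day):
    """Start the daily ingestion only once all 24 hourly files have arrived."""
    return len(hours_present(file_names, day)) == 24

full_day = [f"probes-20120315-{h:02d}.dat" for h in range(24)]
print(ready_for_ingestion(full_day, "20120315"))       # True
print(ready_for_ingestion(full_day[:-1], "20120315"))  # False: hour 23 is missing
```

Counting distinct hours rather than raw files keeps a duplicated delivery of one hour from masking a missing one.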

These should be enough not only to give you an overview of Oozie but also to get you started using it when your Hadoop MapReduce jobs need complex workflows.

Original title and link: A Detailed Guide to Oozie (NoSQL database©myNoSQL)