Boris Lublinsky and Michael Segel's series of articles about Oozie, the Hadoop workflow framework, published on InfoQ:
An Oozie workflow is a collection of actions (e.g. Hadoop Map/Reduce jobs, Pig jobs) arranged in a control dependency DAG (Directed Acyclic Graph) that specifies the order in which the actions execute. This graph is specified in hPDL (an XML Process Definition Language).
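To make the DAG idea concrete, here is a minimal hPDL sketch, not taken from the InfoQ articles. It chains one Map/Reduce action and one Pig action, with every failure routed to a kill node. The workflow name, node names, script name, and the `${jobTracker}` / `${nameNode}` parameters are illustrative assumptions, and the Map/Reduce job configuration is left out for brevity.

```xml
<!-- Hypothetical two-action workflow: start -> mr-step -> pig-step -> end -->
<workflow-app name="sample-wf" xmlns="uri:oozie:workflow:0.2">
    <start to="mr-step"/>

    <!-- First action: a Map/Reduce job (mapper/reducer configuration omitted) -->
    <action name="mr-step">
        <map-reduce>
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
        </map-reduce>
        <ok to="pig-step"/>
        <error to="fail"/>
    </action>

    <!-- Second action: a Pig script, run only if the first action succeeds -->
    <action name="pig-step">
        <pig>
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <script>process.pig</script>
        </pig>
        <ok to="end"/>
        <error to="fail"/>
    </action>

    <!-- Any action error ends the workflow here -->
    <kill name="fail">
        <message>Failed: ${wf:errorMessage(wf:lastErrorNode())}</message>
    </kill>
    <end name="end"/>
</workflow-app>
```

The `ok` and `error` transitions on each action are the edges of the graph; because no transition points back to an earlier node, the definition stays acyclic.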
In this article we will describe a more complex Oozie example, which will allow us to discuss more Oozie features and demonstrate how to use them. The workflow we describe here implements vehicle GPS probe data ingestion. Probe data is delivered to a specific HDFS directory hourly in the form of a file containing all probes for that hour. Probe ingestion is done daily for all 24 files for that day. If the number of files is 24, an ingestion process should start. […]
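One way the "start ingestion only when all 24 files are present" check could be expressed in hPDL is sketched below. This is an assumption-laden illustration rather than the workflow from the article: the `com.example.CountFiles` class, the `fileCount` output key, the `${probesDir}` and `${ingestAppPath}` parameters, and all node names are hypothetical. The Java action is assumed to write `fileCount=<n>` to the properties file Oozie provides for captured output.

```xml
<!-- Hypothetical sketch: count the hourly files, then branch on the result -->
<workflow-app name="probe-ingestion-wf" xmlns="uri:oozie:workflow:0.2">
    <start to="count-files"/>

    <!-- Assumed helper class that counts files in the probes directory
         and emits fileCount=<n> as captured output -->
    <action name="count-files">
        <java>
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <main-class>com.example.CountFiles</main-class>
            <arg>${probesDir}</arg>
            <capture-output/>
        </java>
        <ok to="check-count"/>
        <error to="fail"/>
    </action>

    <!-- Start ingestion only when all 24 hourly files have arrived -->
    <decision name="check-count">
        <switch>
            <case to="ingest">${wf:actionData('count-files')['fileCount'] eq 24}</case>
            <default to="end"/>
        </switch>
    </decision>

    <!-- Placeholder for the ingestion itself, modeled here as a sub-workflow -->
    <action name="ingest">
        <sub-workflow>
            <app-path>${ingestAppPath}</app-path>
        </sub-workflow>
        <ok to="end"/>
        <error to="fail"/>
    </action>

    <kill name="fail">
        <message>Failed: ${wf:errorMessage(wf:lastErrorNode())}</message>
    </kill>
    <end name="end"/>
</workflow-app>
```

The decision node keeps the branching logic in the workflow definition itself, so the counting helper only has to report a number and does not need to know what happens next.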
In this article we will show how to leverage Oozie extensibility to implement custom orchestration language extensions.
These should be enough not only to give you an overview of Oozie but also a good start on using it when your Hadoop MapReduce jobs need complex workflows.
Original title and link: A Detailed Guide to Oozie (©myNoSQL)