Sunday 18 November 2018

Oozie in Hadoop-2

Why use Oozie instead of just cascading jobs one after another?

  • Major flexibility – Start, stop, suspend, and re-run jobs

  • Oozie allows you to restart from a failure – You can tell Oozie to restart a job from a specific node in the graph or to skip specific failed nodes
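A small sketch of the re-run choice described above. As far as I know, an Oozie workflow re-run is driven by one of two job properties: oozie.wf.rerun.failnodes (re-run starting from the failed nodes) or oozie.wf.rerun.skip.nodes (a comma-separated list of nodes to skip), and only one of them should be set. The helper names below are made up for illustration and are not part of Oozie.

```python
def rerun_properties(skip_nodes=None, rerun_failed=False):
    """Build the re-run settings for a workflow as a dict.

    Hypothetical helper: it only enforces the "exactly one of the two
    properties" rule and does not talk to an Oozie server.
    """
    if bool(skip_nodes) == bool(rerun_failed):
        raise ValueError("set either skip_nodes or rerun_failed, not both or neither")
    if rerun_failed:
        # Re-run beginning at the nodes that failed last time.
        return {"oozie.wf.rerun.failnodes": "true"}
    # Skip the listed nodes and run the rest of the graph.
    return {"oozie.wf.rerun.skip.nodes": ",".join(skip_nodes)}


def to_properties_text(props):
    """Render the dict as key=value lines, ready to append to job.properties."""
    return "\n".join(f"{key}={value}" for key, value in sorted(props.items()))
```

The resulting lines would go into the properties file that is passed along with the re-run request; the node names are whatever the workflow definition calls its actions.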


Other Features

• Java Client API / Command Line Interface

– Launch, control, and monitor jobs from your Java apps

• Web Service API

 – You can control jobs from anywhere

• Run Periodic jobs

 – Have jobs that you need to run every hour, day, or week? Let Oozie run them for you

• Receive an email when a job is complete
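To make the Web Service API point concrete, here is a rough sketch that only builds the HTTP requests for inspecting and controlling a job; it does not send anything. It assumes the v1 REST layout (GET .../v1/job/<id>?show=info and PUT .../v1/job/<id>?action=...). The server address and the function names are placeholders, not something from a real cluster.

```python
from urllib.parse import quote, urlencode

OOZIE_URL = "http://oozie.example.com:11000/oozie"  # hypothetical server address

# Actions that need no request body; a re-run also needs a configuration payload.
SIMPLE_ACTIONS = {"start", "suspend", "resume", "kill"}


def job_url(job_id, **query):
    """Return the REST URL for one job, with optional query parameters."""
    base = f"{OOZIE_URL}/v1/job/{quote(job_id, safe='')}"
    return f"{base}?{urlencode(query)}" if query else base


def job_info_request(job_id):
    """(method, url) pair for reading a job's status."""
    return ("GET", job_url(job_id, show="info"))


def job_action_request(job_id, action):
    """(method, url) pair for starting, suspending, resuming, or killing a job."""
    if action not in SIMPLE_ACTIONS:
        raise ValueError(f"unsupported action: {action}")
    return ("PUT", job_url(job_id, action=action))
```

Any HTTP client can then send these requests, which is what makes it possible to control jobs from anywhere that can reach the Oozie server.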

Oozie in Hadoop-1

What is Oozie?

  • Oozie is a workflow scheduler for Hadoop
  • Originally designed at Yahoo! for their complex search engine workflows.
  • Now it is an open-source Apache project.
  • Oozie lets a user define workflows as Directed Acyclic Graphs (DAGs) of jobs, which can run in parallel or sequentially on Hadoop.
  • Oozie can also run plain Java classes and Pig workflows, and interact with HDFS.

  • Oozie can run jobs sequentially (one after the other) and in parallel (multiple at a time).
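The DAG idea can be illustrated without Oozie at all. The sketch below is not how Oozie is implemented; it just uses Python's standard graphlib module to group a made-up workflow into stages. Jobs in the same stage have no dependencies on each other and could run in parallel, while the stages themselves run one after another. The job names are invented for the example.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def execution_stages(dependencies):
    """Group jobs into stages: each stage only depends on earlier stages.

    `dependencies` maps a job name to the jobs that must finish before it.
    """
    sorter = TopologicalSorter(dependencies)
    sorter.prepare()  # raises graphlib.CycleError if the graph is not acyclic
    stages = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # everything runnable right now
        stages.append(ready)
        sorter.done(*ready)  # mark the stage finished to unlock the next one
    return stages


# Hypothetical workflow: two cleaning jobs fan out after ingest, then join.
workflow = {
    "ingest": [],
    "clean_logs": ["ingest"],
    "clean_users": ["ingest"],
    "join": ["clean_logs", "clean_users"],
}

print(execution_stages(workflow))
# [['ingest'], ['clean_logs', 'clean_users'], ['join']]
```

In an Oozie workflow definition the same shape is expressed with fork and join nodes: the fork starts the two cleaning jobs together, and the join waits for both before moving on.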

Monday 9 April 2018