The Workflow portion of the exam has the following expectations:
The ability to create and execute various jobs and actions that move data towards greater value and use in a system.
This includes the following skills:
- Create and execute a linear workflow with actions that include Hadoop jobs, Hive jobs, Pig jobs, custom actions, etc.
- Create and execute a branching workflow with actions that include Hadoop jobs, Hive jobs, Pig jobs, custom action, etc.
- Orchestrate a workflow to execute regularly at predefined times, including workflows that have data dependencies.
Would it be acceptable if we use a combination of bash scripts and cronjobs for this portion?