Support Questions

Schedule shell script to run in parallel in Oozie

Contributor

I have a shell script in HDFS. I have scheduled this script in Oozie with the following workflow.

Workflow:

<workflow-app name="Shell_test" xmlns="uri:oozie:workflow:0.5">
    <start to="shell-8f63"/>
    <kill name="Kill">
        <message>Action failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
    </kill>
    <action name="shell-8f63">
        <shell xmlns="uri:oozie:shell-action:0.1">
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <exec>shell.sh</exec>
            <argument>${input_file}</argument>
            <env-var>HADOOP_USER_NAME=${wf:user()}</env-var>
            <file>/user/xxxx/shell_script/lib/shell.sh#shell.sh</file>
            <file>/user/xxxx/args/${input_file}#${input_file}</file>
        </shell>
        <ok to="End"/>
        <error to="Kill"/>
    </action>
    <end name="End"/>
</workflow-app>

job.properties:
nameNode=xxxxxxxxxxxxxxxxxxxx
jobTracker=xxxxxxxxxxxxxxxxxxxxxxxx
queueName=default
oozie.use.system.libpath=true
oozie.wf.application.path=${nameNode}/user/${user.name}/xxxxxxx/xxxxxx


args file:
tableA
tableB
tableC
tableD

Currently the shell script runs for a single table from the args file. This workflow executes successfully without any errors.

How can I schedule this shell script to run in parallel? I want the script to run for 10 tables at the same time.

What are the steps needed to do so? What changes should I make to the workflow?

Should I create 10 workflows to run 10 parallel jobs? Or what is the best way to deal with this issue?

11 REPLIES

Contributor

@HillBilly I have checked the blog you posted. It says fork and join must be used together.

But in my script I will be creating new tables from existing tables. I don't have anything to join, so it looks like I should not use fork and join in my workflow.
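For reference, Oozie's join is a control-flow node, not a data join: a fork starts several workflow paths in parallel, and the matching join simply waits until all forked paths have finished before the workflow continues. It applies even when the script only creates tables. A minimal sketch under that reading (the node names and the two hard-coded table arguments are illustrative; a real workflow would repeat one forked shell action per table):

```xml
<workflow-app name="Shell_parallel" xmlns="uri:oozie:workflow:0.5">
    <start to="fork-tables"/>
    <!-- fork: start one path per table -->
    <fork name="fork-tables">
        <path start="shell-tableA"/>
        <path start="shell-tableB"/>
    </fork>
    <action name="shell-tableA">
        <shell xmlns="uri:oozie:shell-action:0.1">
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <exec>shell.sh</exec>
            <argument>tableA</argument>
            <file>/user/xxxx/shell_script/lib/shell.sh#shell.sh</file>
        </shell>
        <ok to="join-tables"/>
        <error to="Kill"/>
    </action>
    <action name="shell-tableB">
        <shell xmlns="uri:oozie:shell-action:0.1">
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <exec>shell.sh</exec>
            <argument>tableB</argument>
            <file>/user/xxxx/shell_script/lib/shell.sh#shell.sh</file>
        </shell>
        <ok to="join-tables"/>
        <error to="Kill"/>
    </action>
    <!-- join: a synchronization barrier, not a SQL join -->
    <join name="join-tables" to="End"/>
    <kill name="Kill">
        <message>Action failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
    </kill>
    <end name="End"/>
</workflow-app>
```

Every forked action transitions to the same join node; Oozie validates that each fork has a matching join.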

Contributor

@mbigelow Did you get a chance to look at my scripts and my problem?

It looks like I am stuck on this. I cannot figure out a solution; the only option seems to be cron jobs, which I don't want to use.