Created on 04-16-2017 10:03 AM - edited 09-16-2022 04:28 AM
I have a shell script in HDFS, and I have scheduled it in Oozie with the following workflow.
Workflow:
<workflow-app name="Shell_test" xmlns="uri:oozie:workflow:0.5">
<start to="shell-8f63"/>
<kill name="Kill">
<message>Action failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
</kill>
<action name="shell-8f63">
<shell xmlns="uri:oozie:shell-action:0.1">
<job-tracker>${jobTracker}</job-tracker>
<name-node>${nameNode}</name-node>
<exec>shell.sh</exec>
<argument>${input_file}</argument>
<env-var>HADOOP_USER_NAME=${wf:user()}</env-var>
<file>/user/xxxx/shell_script/lib/shell.sh#shell.sh</file>
<file>/user/xxxx/args/${input_file}#${input_file}</file>
</shell>
<ok to="End"/>
<error to="Kill"/>
</action>
<end name="End"/>
</workflow-app>
Job properties:
nameNode=xxxxxxxxxxxxxxxxxxxx
jobTracker=xxxxxxxxxxxxxxxxxxxxxxxx
queueName=default
oozie.use.system.libpath=true
oozie.wf.application.path=${nameNode}/user/${user.name}/xxxxxxx/xxxxxx
Args file:
tableA tableB tablec tableD
Right now the shell script runs for a single table from the args file, and the workflow executes successfully without any errors.
How can I schedule this shell script to run in parallel? I want the script to run for 10 tables at the same time.
What steps are needed, and what changes should I make to the workflow?
Should I create 10 workflows to run 10 parallel jobs, or is there a better way to deal with this?
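For reference, the usual Oozie answer to this kind of question is a single workflow with a fork/join pair, so that one workflow launches several copies of the same shell action concurrently. The sketch below is only an illustration, not a tested workflow: the node names (`fork-tables`, `shell-table1`, `join-tables`) are placeholders I made up, the hard-coded `tableA` argument stands in for whatever each branch should process, and a real version would need one action per table (up to the 10 required here).

```xml
<workflow-app name="Shell_parallel" xmlns="uri:oozie:workflow:0.5">
  <start to="fork-tables"/>
  <!-- fork starts all listed paths at the same time -->
  <fork name="fork-tables">
    <path start="shell-table1"/>
    <path start="shell-table2"/>
    <!-- add one <path> element per table, up to 10 -->
  </fork>
  <action name="shell-table1">
    <shell xmlns="uri:oozie:shell-action:0.1">
      <job-tracker>${jobTracker}</job-tracker>
      <name-node>${nameNode}</name-node>
      <exec>shell.sh</exec>
      <argument>tableA</argument>  <!-- placeholder table name -->
      <env-var>HADOOP_USER_NAME=${wf:user()}</env-var>
      <file>/user/xxxx/shell_script/lib/shell.sh#shell.sh</file>
    </shell>
    <ok to="join-tables"/>
    <error to="Kill"/>
  </action>
  <action name="shell-table2">
    <!-- same shell block as above, with a different <argument> -->
    <shell xmlns="uri:oozie:shell-action:0.1">
      <job-tracker>${jobTracker}</job-tracker>
      <name-node>${nameNode}</name-node>
      <exec>shell.sh</exec>
      <argument>tableB</argument>  <!-- placeholder table name -->
      <env-var>HADOOP_USER_NAME=${wf:user()}</env-var>
      <file>/user/xxxx/shell_script/lib/shell.sh#shell.sh</file>
    </shell>
    <ok to="join-tables"/>
    <error to="Kill"/>
  </action>
  <!-- join waits for every forked branch to reach it before continuing -->
  <join name="join-tables" to="End"/>
  <kill name="Kill">
    <message>Action failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
  </kill>
  <end name="End"/>
</workflow-app>
```

Each forked branch here is independent; the `join` node simply blocks until all branches have completed (or one transitions to `Kill`).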
Created 04-18-2017 09:18 AM
@HillBilly I have checked the blog you posted. It says fork and join must be used together.
But in my script I will be creating new tables from existing tables. I don't have anything to join, so it looks like I should not use fork and join in my workflow.
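One point worth clarifying (this is a general note about Oozie semantics, not something from the thread): the `join` node has nothing to do with joining tables or data. It is purely a control-flow barrier that waits for every branch started by the matching `fork` to finish, so it applies even when the forked actions are completely independent, such as separate CREATE TABLE steps. A minimal skeleton, with made-up node names:

```xml
<!-- join is a synchronization point, not a data join -->
<fork name="fork-node">
  <path start="action-A"/>
  <path start="action-B"/>
</fork>
<!-- ... independent actions action-A and action-B, each with <ok to="join-node"/> ... -->
<join name="join-node" to="next-step"/>
```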
Created 04-25-2017 11:43 AM
@mbigelow Did you get a chance to look at my scripts and my problem?
It looks like I am stuck on this. I cannot figure out any solution; the only option seems to be cron jobs, which I don't want to use.