Created on 04-16-2017 10:03 AM - edited 09-16-2022 04:28 AM
I have a shell script in HDFS. I have scheduled this script in Oozie with the following workflow.
Workflow:
<workflow-app name="Shell_test" xmlns="uri:oozie:workflow:0.5">
    <start to="shell-8f63"/>
    <kill name="Kill">
        <message>Action failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
    </kill>
    <action name="shell-8f63">
        <shell xmlns="uri:oozie:shell-action:0.1">
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <exec>shell.sh</exec>
            <argument>${input_file}</argument>
            <env-var>HADOOP_USER_NAME=${wf:user()}</env-var>
            <file>/user/xxxx/shell_script/lib/shell.sh#shell.sh</file>
            <file>/user/xxxx/args/${input_file}#${input_file}</file>
        </shell>
        <ok to="End"/>
        <error to="Kill"/>
    </action>
    <end name="End"/>
</workflow-app>
job properties
nameNode=xxxxxxxxxxxxxxxxxxxx
jobTracker=xxxxxxxxxxxxxxxxxxxxxxxx
queueName=default
oozie.use.system.libpath=true
oozie.wf.application.path=${nameNode}/user/${user.name}/xxxxxxx/xxxxxx
args file
tableA tableB tablec tableD
Now the shell script runs for a single table from the args file. This workflow executes successfully without any errors.
How can I schedule this shell script to run in parallel? I want the script to run for 10 tables at the same time.
What are the steps needed to do so, and what changes should I make to the workflow?
Should I create 10 workflows to run 10 parallel jobs, or is there a better way to deal with this?
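One approach that avoids creating 10 separate workflows is Oozie's fork/join control nodes: a fork launches several actions concurrently, and the join waits for all of them to finish before the workflow continues. This is a minimal sketch, not a tested workflow; the action names, the branch count, and passing the table name directly as the argument are all assumptions. Two branches are shown; you would add one fork path and one action per table, up to 10:

```xml
<workflow-app name="Shell_parallel" xmlns="uri:oozie:workflow:0.5">
    <start to="fork-tables"/>
    <!-- fork: all listed paths start at the same time -->
    <fork name="fork-tables">
        <path start="shell-tableA"/>
        <path start="shell-tableB"/>
        <!-- add one <path> per table, up to 10 -->
    </fork>
    <action name="shell-tableA">
        <shell xmlns="uri:oozie:shell-action:0.1">
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <exec>shell.sh</exec>
            <!-- assumption: shell.sh can take a table name directly -->
            <argument>tableA</argument>
            <env-var>HADOOP_USER_NAME=${wf:user()}</env-var>
            <file>/user/xxxx/shell_script/lib/shell.sh#shell.sh</file>
        </shell>
        <ok to="join-tables"/>
        <error to="Kill"/>
    </action>
    <action name="shell-tableB">
        <shell xmlns="uri:oozie:shell-action:0.1">
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <exec>shell.sh</exec>
            <argument>tableB</argument>
            <env-var>HADOOP_USER_NAME=${wf:user()}</env-var>
            <file>/user/xxxx/shell_script/lib/shell.sh#shell.sh</file>
        </shell>
        <ok to="join-tables"/>
        <error to="Kill"/>
    </action>
    <!-- join: a synchronization point, reached only after every forked branch succeeds -->
    <join name="join-tables" to="End"/>
    <kill name="Kill">
        <message>Action failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
    </kill>
    <end name="End"/>
</workflow-app>
```

Note that "join" here is a workflow synchronization node, not a data join: it simply waits for all fork branches to complete.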
Created 04-18-2017 09:18 AM
@HillBilly I have checked the blog you posted. It says fork and join must be used together.
But here in my script I will be creating new tables from existing tables. I don't have anything to join, so it looks like I should not use fork and join in my workflow.
Created 04-25-2017 11:43 AM
@mbigelow Did you get a chance to have a look at my scripts and my problem?
It looks like I am stuck on this. I cannot figure out any solution; the only option seems to be cron jobs, which I don't want to use.
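Before falling back to cron, one alternative worth noting is parallelizing inside the shell script itself: keep the single workflow action, read the args file, and launch one background job per table, then `wait` for all of them. This is a hedged sketch, not the poster's actual script; `process_table` is a hypothetical placeholder for whatever shell.sh does per table:

```shell
#!/bin/sh
# Sketch: run per-table work in parallel from one shell action.
# Assumes the args file lists table names separated by whitespace.

process_table() {
    # placeholder for the real per-table logic in shell.sh
    echo "processing $1"
}

run_all() {
    for table in $(cat "$1"); do
        process_table "$table" &   # launch each table in the background
    done
    wait                           # block until all background jobs finish
}

# invoke only when an args file path is supplied as $1
if [ -n "${1:-}" ]; then
    run_all "$1"
fi
```

One caveat with this approach: all tables share the one YARN container allocated to the shell action, so it trades cluster-level parallelism for simplicity.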