Explorer
Posts: 33
Registered: ‎01-30-2017

How to run shell script in oozie

I have a shell script in HDFS that I would like to schedule with Oozie. The script takes its input from another source file.

To this sqoop_import.sh script I pass a table name as an argument. There are 1,500 tables, so I want to run the imports in parallel.

`Tables`

123_abc
234_cde
enf_7yui
and so on


So, using the Linux `split` command, I split the file containing the 1,500 table names into 20 smaller files. That way I can execute 20 jobs in parallel.
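A minimal sketch of that split step, assuming GNU coreutils `split` and a hypothetical `tables.txt` holding one table name per line (here faked with generated names):

```shell
# Fake a list of 1500 table names, one per line (stand-in for the real file).
seq -f "table_%g" 1500 > tables.txt

# -n l/20 produces 20 chunks without breaking any line across files;
# outputs are named tables_chunk_aa, tables_chunk_ab, ...
split -n l/20 tables.txt tables_chunk_

ls tables_chunk_* | wc -l   # → 20 chunk files
```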

The scripts are as follows:

`Sqoop_import.sh`

#!/bin/bash
# Import one table from MySQL into Hive (as Parquet) via Sqoop.

source /home/$USER/mysql/source.sh

[ $# -ne 1 ] && { echo "Usage : $0 table"; exit 1; }

table=$1

TIMESTAMP=$(date "+%Y-%m-%d")
success_logs=/home/$USER/logs/${TIMESTAMP}.success_log
failed_logs=/home/$USER/logs/${TIMESTAMP}.fail_log
touch "${success_logs}" "${failed_logs}"

# Function to log the status of the import and abort on failure
function log_status
{
    status=$1
    message=$2
    if [ "$status" -ne 0 ]; then
        echo "$(date '+%Y-%m-%d %H:%M:%S') [ERROR] $message [Status] $status : failed" | tee -a "${failed_logs}"
        #echo "Please find the attached log file for more details"
        exit 1
    else
        echo "$(date '+%Y-%m-%d %H:%M:%S') [INFO] $message [Status] $status : success" | tee -a "${success_logs}"
    fi
}

sqoop import \
    -D mapreduce.map.memory.mb=3584 \
    -D mapreduce.map.java.opts=-Xmx2868m \
    --connect ${domain}:${port}/${database} \
    --username ${username} --password ${password} \
    --query "select * from ${table} where \$CONDITIONS" \
    -m 1 \
    --as-parquetfile \
    --hive-import \
    --hive-database ${hivedatabase} \
    --hive-table ${table} \
    --map-column-java Date=String \
    --target-dir /user/hive/warehouse/${hivedatabase}.db/${table} \
    --outdir /home/$USER/logs/outdir

g_STATUS=$?
log_status $g_STATUS "SQOOP import ${table}"

echo "*********************************************************************************************************************************************************************************"
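For reference, the way the chunks drive this script can be sketched as a small driver function. The names here (`run_chunk`, the `tables_chunk_*` files) are hypothetical; `sqoop_import.sh` is the script above:

```shell
# Hypothetical driver: call sqoop_import.sh once per table listed in the
# chunk file given as the first argument, sequentially within that chunk.
run_chunk() {
    while read -r table; do
        ./sqoop_import.sh "$table"
    done < "$1"
}

# Launching one driver per chunk file in the background gives 20-way
# parallelism, e.g.:
#   for f in tables_chunk_*; do run_chunk "$f" & done; wait
```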


Here is the `source.sh` file

domain=jdbc:mysql://XXXXXXXXX
port=3306
database=testing
username=xxxxxx
password=xxxxxxx
hivedatabase=testing


Now I want to schedule this job in Oozie. I gave the shell script path in workflow.xml along with all the job properties.

I am confused about how I can pass 20 sets of arguments to the same script so that the workflow executes them in parallel.

Moreover, I want the source.sh file contents to be available to the shell script.

This script works fine as a Linux cron job; I just want to use Oozie from now on.

I would appreciate an explanation with the answers.

Cloudera Employee
Posts: 18
Registered: ‎06-17-2016

Re: How to run shell script in oozie

You'd be better off creating Sqoop actions directly: generate a workflow.xml that has several Sqoop actions executing in parallel (using fork and join nodes) inside an Oozie workflow.
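A rough sketch of such a workflow is below. All names (`fork-imports`, `import-chunk-1`, …) and the `${jobTracker}`/`${nameNode}` properties are hypothetical placeholders to be supplied from the job.properties file, and only two of the parallel branches are shown:

```xml
<workflow-app name="sqoop-parallel-import" xmlns="uri:oozie:workflow:0.5">
    <start to="fork-imports"/>

    <!-- fork starts all branches in parallel -->
    <fork name="fork-imports">
        <path start="import-chunk-1"/>
        <path start="import-chunk-2"/>
        <!-- ...one path per group of tables... -->
    </fork>

    <action name="import-chunk-1">
        <sqoop xmlns="uri:oozie:sqoop-action:0.2">
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <command>import --connect ${domain}:${port}/${database} --username ${username} --password ${password} --table 123_abc --hive-import --hive-database ${hivedatabase} -m 1</command>
        </sqoop>
        <ok to="join-imports"/>
        <error to="fail"/>
    </action>

    <action name="import-chunk-2">
        <sqoop xmlns="uri:oozie:sqoop-action:0.2">
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <command>import --connect ${domain}:${port}/${database} --username ${username} --password ${password} --table 234_cde --hive-import --hive-database ${hivedatabase} -m 1</command>
        </sqoop>
        <ok to="join-imports"/>
        <error to="fail"/>
    </action>

    <!-- join waits for every forked branch before continuing -->
    <join name="join-imports" to="end"/>

    <kill name="fail">
        <message>Sqoop import failed</message>
    </kill>
    <end name="end"/>
</workflow-app>
```

One advantage of this approach over a shell action is that the connection details live in job.properties instead of a source.sh file that has to be shipped alongside the script.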

 
