
How to run shell script in oozie


Explorer

I have a shell script in HDFS that I would like to schedule in Oozie. The script takes its input from another source file.

To this `sqoop_import.sh` script I pass table names as arguments. There are 1500 tables, so I want to import them in parallel.

`Tables`

123_abc
234_cde
enf_7yui
and so on


Using the Linux `split` command, I split the file containing the 1500 table names into 20 smaller files, so I can run 20 jobs in parallel.
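The split step can be sketched as follows (the file name `tables.txt` and the `tbl_` names are illustrative placeholders, not from the original post):

```shell
# Generate a stand-in list of 1500 table names for illustration:
seq 1 1500 | sed 's/^/tbl_/' > tables.txt

# 1500 names / 20 files = 75 names per chunk; produces tables_chunk_aa .. tables_chunk_at
split -l 75 tables.txt tables_chunk_
```

Each chunk file can then be fed to a separate invocation of the import script.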

The scripts are as follows:

`Sqoop_import.sh`

#!/bin/bash
# Import a single table from MySQL into Hive/HDFS via Sqoop.

source /home/$USER/mysql/source.sh

[ $# -ne 1 ] && { echo "Usage: $0 table"; exit 1; }

table=$1

TIMESTAMP=$(date "+%Y-%m-%d")
success_logs=/home/$USER/logs/${TIMESTAMP}.success_log
failed_logs=/home/$USER/logs/${TIMESTAMP}.fail_log
touch "${success_logs}" "${failed_logs}"

# Log the exit status of the previous command; abort on failure.
function log_status
{
    status=$1
    message=$2
    if [ "$status" -ne 0 ]; then
        echo "$(date +"%Y-%m-%d %H:%M:%S") [ERROR] $message [Status] $status : failed" | tee -a "${failed_logs}"
        exit 1
    else
        echo "$(date +"%Y-%m-%d %H:%M:%S") [INFO] $message [Status] $status : success" | tee -a "${success_logs}"
    fi
}

sqoop import \
  -D mapreduce.map.memory.mb=3584 \
  -D mapreduce.map.java.opts=-Xmx2868m \
  --connect ${domain}:${port}/${database} \
  --username ${username} --password ${password} \
  --query "select * from ${table} where \$CONDITIONS" \
  -m 1 --as-parquetfile \
  --hive-import --hive-database ${hivedatabase} --hive-table ${table} \
  --map-column-java Date=String \
  --target-dir /user/hive/warehouse/${hivedatabase}.db/${table} \
  --outdir /home/$USER/logs/outdir

g_STATUS=$?
log_status $g_STATUS "SQOOP import ${table}"

echo "*********************************************************************************************************************************************************************************"


Here is the `source.sh` file

domain=jdbc:mysql://XXXXXXXXX
port=3306
database=testing
username=xxxxxx
password=xxxxxxx
hivedatabase=testing


Now I want to schedule this job in Oozie. I have given the shell script path in workflow.xml along with all the job properties.

I am confused about how to pass 20 arguments to the same script so that the workflow executes them in parallel.

Moreover, I want the `source.sh` contents to be available to the shell script.
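One common way to make `source.sh` available is to ship both files alongside the action with `<file>` elements, so Oozie copies them into the task's working directory. A minimal sketch, assuming an Oozie shell action (`${appPath}` and the argument value are placeholders):

```xml
<action name="sqoop-import-shell">
    <shell xmlns="uri:oozie:shell-action:0.2">
        <job-tracker>${jobTracker}</job-tracker>
        <name-node>${nameNode}</name-node>
        <exec>sqoop_import.sh</exec>
        <argument>123_abc</argument>
        <file>${appPath}/sqoop_import.sh#sqoop_import.sh</file>
        <file>${appPath}/source.sh#source.sh</file>
    </shell>
    <ok to="end"/>
    <error to="fail"/>
</action>
```

With the files shipped this way, the script would source the local copy (`source ./source.sh`) instead of a `/home/$USER` path, since the action runs on an arbitrary cluster node.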

The script works fine as a Linux cron job; I just want to use Oozie from now on.

I would appreciate an explanation with the answers.

1 REPLY

Re: How to run shell script in oozie

Cloudera Employee

You'd be better off creating Sqoop actions instead: generate a workflow.xml containing several Sqoop actions that execute in parallel (using fork and join nodes) inside an Oozie workflow.
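A minimal sketch of such a fork/join workflow with two of the tables from the question (action names, the app name, and the connection properties are illustrative; the real workflow would be generated with one action per table, batched to the desired parallelism):

```xml
<workflow-app name="sqoop-parallel-import" xmlns="uri:oozie:workflow:0.5">
    <start to="fork-imports"/>

    <!-- Fork starts all listed actions concurrently -->
    <fork name="fork-imports">
        <path start="import-123_abc"/>
        <path start="import-234_cde"/>
    </fork>

    <action name="import-123_abc">
        <sqoop xmlns="uri:oozie:sqoop-action:0.2">
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <command>import --connect ${domain}:${port}/${database} --username ${username} --password ${password} --table 123_abc --hive-import --hive-database ${hivedatabase} -m 1</command>
        </sqoop>
        <ok to="join-imports"/>
        <error to="fail"/>
    </action>

    <action name="import-234_cde">
        <sqoop xmlns="uri:oozie:sqoop-action:0.2">
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <command>import --connect ${domain}:${port}/${database} --username ${username} --password ${password} --table 234_cde --hive-import --hive-database ${hivedatabase} -m 1</command>
        </sqoop>
        <ok to="join-imports"/>
        <error to="fail"/>
    </action>

    <!-- Join waits until every forked path has completed -->
    <join name="join-imports" to="end"/>

    <kill name="fail">
        <message>Sqoop import failed: [${wf:errorMessage(wf:lastErrorNode())}]</message>
    </kill>
    <end name="end"/>
</workflow-app>
```

The connection properties would come from job.properties rather than the `source.sh` file, which replaces the need to source it at all.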