
Oozie job should run the shell script from local (third-party ETL has been installed on one server).

Explorer

Hello,

 

I need some quick help with Oozie.

Here is my requirement.
 
I have a shell script on a local server, and inside the shell script I have a few jobs (basically ETL jobs that run on that server).
 
The master shell script is in HDFS, and from there I am trying to run (invoke) shell scripts and some ETL jobs that are local (not present in HDFS).
 
Oozie workflow: user/user1/bhagaban
 
HDFS PATH:
 
user/user1/bhagaban
shell1.sh
[
echo "Welcome"
sh /home/bia/abc.sh  <-- invoking the shell script which will run from local.
exm /run /home/bia/adm1/ETL1.dtx <- third-party job, which should run from the local server. [exm is the command to start the ETL job from local]
]
 
Basically, I want to run the ETL1.dtx job (a third-party job on the local server) via Oozie so that I can schedule it.
 
How should I schedule the shell script through Oozie? Oozie should pick up the shell script from the APP-PATH (HDFS path), but all the logic and external scripts should run from the local path.
 
I have tried the SSH action as well, but I don't know how to provide a password when calling the SSH action node. The client does not agree to passwordless SSH.
 
Please advise me if any other ways are possible.
 
Your help would be greatly appreciated.

Re: Oozie job should run the shell script from local (third-party ETL has been installed on one server).

Expert Contributor

Hey,

 

SSH actions do not support providing a password; you have to set up SSH keys for passwordless SSH to be able to use the SSH action. As for the shell action, it runs as an MR job, so it can run on any NodeManager or TaskTracker node. So you have two options for these lines:

 

sh /home/bia/abc.sh  <-- invoking the shell script which will run from local.
exm /run /home/bia/adm1/ETL1.dtx <- third-party job, which should run from the local server. [exm is the command to start the ETL job from local]
 
1.  Make sure that both of those scripts and the "exm" command exist on every NodeManager or TaskTracker so that they can be found.
2.  Put everything in HDFS and then add:
 
<file>/path/to/abc.sh#abc.sh</file>
 
Add one such entry to the workflow.xml for every file that you call. The files then get transferred through the distributed cache to the node the action is going to run on, and you can reference them as "./abc.sh" or "./ETL1.dtx".
 
Hope this helps.
 
Thanks
Chris
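
To illustrate option 2, here is a minimal sketch that writes a workflow.xml with one shell action and a <file> entry per shipped script. It assumes the scripts sit in the same HDFS application directory as the workflow (relative <file> paths are resolved against that directory), that jobTracker and nameNode are defined in job.properties, and that the schema versions match your Oozie release. APP_DIR and the action and workflow names are hypothetical.

```shell
#!/bin/sh
# Sketch only: stage a workflow.xml locally before uploading it to HDFS.
# APP_DIR is a hypothetical local staging directory.
APP_DIR="${APP_DIR:-./oozie-app}"
mkdir -p "$APP_DIR"

# The quoted 'EOF' keeps ${...} literal so Oozie, not the shell, expands it.
cat > "$APP_DIR/workflow.xml" <<'EOF'
<workflow-app xmlns="uri:oozie:workflow:0.4" name="shell-wf">
    <start to="shell-node"/>
    <action name="shell-node">
        <shell xmlns="uri:oozie:shell-action:0.2">
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <exec>shell1.sh</exec>
            <!-- One <file> entry per script; the part after # is the local name. -->
            <file>shell1.sh#shell1.sh</file>
            <file>abc.sh#abc.sh</file>
            <file>ETL1.dtx#ETL1.dtx</file>
        </shell>
        <ok to="end"/>
        <error to="fail"/>
    </action>
    <kill name="fail">
        <message>Shell action failed: ${wf:errorMessage(wf:lastErrorNode())}</message>
    </kill>
    <end name="end"/>
</workflow-app>
EOF

echo "Wrote $APP_DIR/workflow.xml"
```

Note that this only ships the scripts and the job file. The "exm" command itself is not shipped, so it still has to be installed on every node where the action may run (option 1).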

Re: Oozie job should run the shell script from local (third-party ETL has been installed on one server).

Explorer

Hello,

 

Thanks for your response.

 

I tried your option 2, because the ETL has been installed on all the nodes.

1- Moved the main script, which contains all the ETL functions, to HDFS.

2- Moved the ETL scripts to HDFS.

3- I also gave the two file paths like this:

<file>/path/to/abc.sh#abc.sh</file> <- main shell script

<file>/path/to/abc.sh#abc.sh</file> <- ETL job

 

Main Shell Script:

echo "BHAGABAN"
echo `date`
exm /run Source3_2727.8917.dtl  [this is the way to run the ETL job]

 

stderr logs

./check.sh: line 6: exm: command not found
Failing Oozie Launcher, Main class [org.apache.oozie.action.hadoop.ShellMain], exit code [1]

Please advise.
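
The "exm: command not found" error above suggests that, on the node where the action ran, exm is not on the PATH of the user running the task. A minimal sketch of a helper that could go at the top of the main script to make this visible: it looks the command up on PATH, then in an optional fallback directory, and logs the host, user, and PATH when it fails. The function name resolve_cmd and the EXM_HOME variable are hypothetical.

```shell
#!/bin/sh
# Resolve a command by name: first via PATH, then in an optional fallback directory.
# Prints the resolved path on success; logs context and returns 1 on failure.
resolve_cmd() {
    name="$1"
    fallback_dir="${2:-}"
    if command -v "$name" >/dev/null 2>&1; then
        command -v "$name"
        return 0
    fi
    if [ -n "$fallback_dir" ] && [ -x "$fallback_dir/$name" ]; then
        printf '%s\n' "$fallback_dir/$name"
        return 0
    fi
    echo "ERROR: '$name' not found on $(uname -n) as user $(id -un); PATH=$PATH" >&2
    return 1
}

# Example use inside the main script (EXM_HOME is a hypothetical install directory):
# EXM=$(resolve_cmd exm "${EXM_HOME:-}") || exit 1
# "$EXM" /run Source3_2727.8917.dtl
```

If the log shows a node where exm is not installed, or a PATH that lacks its directory, that points back to option 1 in the earlier reply.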

Re: Oozie job should run the shell script from local (third-party ETL has been installed on one server).

Explorer

Hello,

 

I was able to run the shell script from local via Oozie with a shell action node. I have two questions that I am trying to answer.

 

1- I have NODE1 (the NameNode) and NODE2. I created a shell script, put it into HDFS, and in that script I call: sh /home/bia/dxm/action.sh  <-- this is on the local filesystem. I am able to run this script, and no SSH is required. But when I try to access a NODE2 local directory, it says PATH not found. Can somebody advise me why that is? At a later stage we need to move this project to a 20-node cluster, so I need to be confident about how it works.

 

2- When Oozie runs the local shell script, it writes all the logs as the mapred user, but we need it to run as our own user, such as dxmbig. I even tried the line below in the master shell script that is in HDFS, but it did not change anything.

 

sudo -u dxmbig sh /home/bia/dxm/action.sh > /home/bia/log/bd1.log 2>&

 

Please help and advise me further.
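
As the earlier reply notes, a shell action can land on any NodeManager or TaskTracker node, so a local path that exists on one node may be missing on the node where the script actually runs. A minimal sketch of two helpers that report the host, user, and working directory of the running script and check a local path before using it. The function names are hypothetical.

```shell
#!/bin/sh
# Print where and as whom this script is running; useful at the top of a
# script launched by an Oozie shell action.
report_runtime() {
    echo "host=$(uname -n) user=$(id -un) cwd=$(pwd)"
}

# Return 0 if the given local path exists on this node; otherwise log and return 1.
require_local_path() {
    if [ -e "$1" ]; then
        return 0
    fi
    echo "ERROR: $1 does not exist on $(uname -n)" >&2
    return 1
}

report_runtime
# Example with the path from the post:
# require_local_path /home/bia/dxm/action.sh || exit 1
```

Comparing the reported host across several runs should show whether the action moves between nodes, and the reported user shows which account the task really runs as.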

Re: Oozie job should run the shell script from local (third-party ETL has been installed on one server).

Explorer
Hey Chris,

Step 1 tried:
We have set up passwordless SSH for a user between the cluster nodes, but when I try the SSH action node it shows AUTH FAILED.

Step 2 tried:
Since passwordless SSH is now set up for our cluster, I used a shell action, ran the script from datanode1, and tried a passwordless ssh command to trigger a script on a different node, but it failed with permission denied.

Could you please help me here?
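
One possible cause, offered as an assumption rather than a diagnosis: key-based login has to work for the OS user that actually runs ssh. For a shell action that is the user the task runs as (the earlier post reports mapred); for the SSH action it is, as far as I know, the user running the Oozie server. A minimal sketch of a check that never prompts for a password, to be run as each of those users. The function name and the example target are hypothetical.

```shell
#!/bin/sh
# Check key-based login to a target without ever prompting for a password.
# BatchMode=yes makes ssh fail instead of asking for a password or passphrase.
check_passwordless_ssh() {
    target="$1"    # e.g. user@host
    if ssh -o BatchMode=yes -o ConnectTimeout=10 "$target" true; then
        echo "OK: $(id -un)@$(uname -n) can log in to $target with a key"
    else
        echo "FAILED: $(id -un)@$(uname -n) cannot log in to $target with a key" >&2
        return 1
    fi
}

# Example (dxmbig@node2 is a hypothetical target):
# check_passwordless_ssh dxmbig@node2
```

If the check passes for your own login but fails when run from inside the shell action, the keys are probably set up for a different user than the one running the task.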