
scheduling bash script using oozie

Here’s the current issue I’m trying to solve and the two ways I’ve tried to go about it... 

The cluster is hosted in AWS, running CDH 5.4, with no Kerberos yet.

I have a bash script "ssh_local_to_hdfs.sh" that runs fine from the command line, but I want to schedule it using Oozie.

 

I only need one of these approaches to work, but of course I’d like to understand how to do both if that’s possible.

 

First, I tried running the script as an SSH action on the local node; here's the XML:

<workflow-app name="Locus_Ingest_to_Staging_to_HDFS_-_SSH" xmlns="uri:oozie:workflow:0.5">
    <start to="ssh-e77d"/>
    <kill name="Kill">
        <message>Action failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
    </kill>
    <action name="ssh-e77d">
        <ssh xmlns="uri:oozie:ssh-action:0.1">
            <host>cloudera-scm@a0almcdhcan01</host>
            <command>/appl/ssh_local_to_hdfs.sh</command>
            <args>&quot;/opt/ingest/locus/&quot;</args>
            <args>&quot;/data/ingest/locus/&quot;</args>
            <args>&quot;vax;1&quot;</args>
            <args>&quot;vaxtoken&quot;</args>
            <capture-output/>
        </ssh>
        <ok to="End"/>
        <error to="Kill"/>
    </action>
    <end name="End"/>
</workflow-app>

Here’s the error I get for this one, which makes me think that I just need to do something to allow Oozie to be authorized to run the script:

 

AUTH_FAILED: Not able to perform operation [ssh -o PasswordAuthentication=no -o KbdInteractiveDevices=no -o StrictHostKeyChecking=no -o ConnectTimeout=20 cloudera-scm@a0almcdhcan01 mkdir -p oozie-oozi/0006683-150901124850542-oozie-oozi-W/ssh-e77d--ssh/ ] | ErrorStream: Permission denied (publickey,gssapi-keyex,gssapi-with-mic).

 

So I’ve tried putting the PEM key that I use to connect to that machine in the job’s workspace, adding a preliminary step that SSHes in referencing that file, and adding the -i option to the arguments so ssh looks at the PEM file.

 

This really seems like it’s just an authentication issue.

The only other thought I have is that the PEM file might only work for the cloudera-scm user and not for the oozie user.
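
If that's the case, my understanding is that the SSH action connects from the Oozie server host as whatever Unix user the Oozie server runs as (usually oozie), so that user needs its own key trusted by the target account, independent of my PEM file. This is roughly what I'd try, run as the oozie user on the Oozie server host (the PEM path below is just a placeholder for whatever key currently works):

# Generate a key pair for the oozie user if it doesn't already have one
ssh-keygen -t rsa -N "" -f ~/.ssh/id_rsa

# Append that public key to authorized_keys for the account the action logs in as
# (bootstrapping over the existing PEM key, since password auth is disabled in AWS)
cat ~/.ssh/id_rsa.pub | ssh -i /path/to/existing.pem cloudera-scm@a0almcdhcan01 \
  'mkdir -p ~/.ssh && cat >> ~/.ssh/authorized_keys && chmod 700 ~/.ssh && chmod 600 ~/.ssh/authorized_keys'

# Sanity check: this should now log in without prompting
ssh -o PasswordAuthentication=no cloudera-scm@a0almcdhcan01 'echo ok'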

 

 

So I also tried this as a Shell action from HDFS (I placed the script at /user/cloudera-scm/oozie/appl/); here's the XML:

<workflow-app name="Locus_Ingest_to_Staging_to_HDFS_-_Shell" xmlns="uri:oozie:workflow:0.5">
    <start to="shell-5cb7"/>
    <kill name="Kill">
        <message>Action failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
    </kill>
    <action name="shell-5cb7">
        <shell xmlns="uri:oozie:shell-action:0.1">
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <exec>/user/cloudera-scm/oozie/appl/ssh_local_to_hdfs.sh</exec>
            <argument>&quot;/opt/ingest/locus/&quot;</argument>
            <argument>&quot;/data/ingest/locus/&quot;</argument>
            <argument>&quot;vax;1&quot;</argument>
            <argument>&quot;vaxtoken&quot;</argument>
            <capture-output/>
        </shell>
        <ok to="End"/>
        <error to="Kill"/>
    </action>
    <end name="End"/>
</workflow-app>

Here’s the error for this case, which makes me wonder if I just need to change the location of the file or if I’m calling it the wrong way:

 

Cannot run program "ssh_local_to_hdfs.sh" (in directory "/mnt/ephemeral3/yarn/nm/usercache/cloudera-scm/appcache/application_1441126413269_230674/container_e51_1441126413269_230674_01_000002"): error=2, No such file or directory

I’ve tried putting the bash file in the job’s workspace and also in /user/oozie/ in HDFS but that doesn’t make a difference.
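
In case it matters, here's roughly how I've been staging the script into HDFS (the same directory as the <exec> path above; commands from memory):

# Copy the script from the local node into the workflow's directory in HDFS
hdfs dfs -mkdir -p /user/cloudera-scm/oozie/appl
hdfs dfs -put /appl/ssh_local_to_hdfs.sh /user/cloudera-scm/oozie/appl/
hdfs dfs -ls /user/cloudera-scm/oozie/appl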

 

 

This is where I’m stuck; if you can help me get either of these to work, I’d appreciate it.

I can run the script fine from the command line but from Oozie I’m stumped...

 

Thanks!

 

1 REPLY

Re: scheduling bash script using oozie


Try adding the shell script to the distributed cache with a <file> element:

 

<workflow-app name="Locus_Ingest_to_Staging_to_HDFS_-_Shell" xmlns="uri:oozie:workflow:0.5">
    <start to="shell-5cb7"/>
    <kill name="Kill">
        <message>Action failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
    </kill>
    <action name="shell-5cb7">
        <shell xmlns="uri:oozie:shell-action:0.1">
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <exec>ssh_local_to_hdfs.sh</exec>
            <argument>&quot;/opt/ingest/locus/&quot;</argument>
            <argument>&quot;/data/ingest/locus/&quot;</argument>
            <argument>&quot;vax;1&quot;</argument>
            <argument>&quot;vaxtoken&quot;</argument>
            <file>hdfspath/ssh_local_to_hdfs.sh</file>
            <capture-output/>
        </shell>
        <ok to="End"/>
        <error to="Kill"/>
    </action>
    <end name="End"/>
</workflow-app>
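
With the <file> element, Oozie ships the script from HDFS into the container's working directory, so <exec> only needs the bare filename. A rough end-to-end run might look like this (the HDFS path, Oozie URL, and property values are placeholders; adjust them to your cluster):

# Upload the script next to the workflow definition in HDFS
hdfs dfs -put /appl/ssh_local_to_hdfs.sh /user/cloudera-scm/oozie/appl/

# job.properties needs at least nameNode, jobTracker, and the workflow path, e.g.:
#   nameNode=hdfs://<namenode-host>:8020
#   jobTracker=<resourcemanager-host>:8032
#   oozie.wf.application.path=${nameNode}/user/cloudera-scm/oozie/appl

# Submit and start the workflow
oozie job -oozie http://<oozie-server-host>:11000/oozie -config job.properties -run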