How to execute sqoop import command in python script using oozie workflow?


I tried running a sqoop import command from a Python script through an Oozie workflow. The job succeeds, but the import is not done. The output of the Python print statements appears, but not a single sqoop command is executed.

Note that when the same Python script is run on its own, it executes successfully and the Sqoop import is done.

I get the following errors when I run the workflow:

vi /var/log/oozie/oozie-error.log

2017-05-08 17:08:04,090 WARN ParameterVerifier:523 - SERVER[hanhdpmstdev03.********.net.**] USER[-] GROUP[-] TOKEN[-] APP[-] JOB[-] ACTION[-] The application does not define formal parameters in its XML definition

2017-05-08 17:08:15,829 WARN ShellActionExecutor:523 - SERVER[hanhdpmstdev03.********.net.**] USER[-] GROUP[-] TOKEN[] APP[python-wf] JOB[0000071-170502161125406-oozie-oozi-W] ACTION[0000071-170502161125406-oozie-oozi-W@python-node] Launcher ERROR, reason: Main class [org.apache.oozie.action.hadoop.ShellMain], exit code [1]

The following error is shown in the Oozie UI:

2017-05-08 17:08:15,906  INFO WorkflowNotificationXCommand:520 - SERVER[hanhdpmstdev03.********.net.***] USER[-] GROUP[-] TOKEN[-] APP[-] JOB[0000071-170502161125406-oozie-oozi-W] ACTION[0000071-170502161125406-oozie-oozi-W@python-node] No Notification URL is defined. Therefore nothing to notify for job 0000071-170502161125406-oozie-oozi-W@python-node

The three files are pasted below: workflow.xml, job.properties and compress.py.

workflow.xml

<workflow-app xmlns="uri:oozie:workflow:0.4" name="python-wf">
    <start to="python-node"/>
    <action name="python-node">
        <shell xmlns="uri:oozie:shell-action:0.2">
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <configuration>
                <property>
                    <name>mapred.job.queue.name</name>
                    <value>${queueName}</value>
                </property>
                <property>
                    <name>oozie.sqoop.defaults</name>
                    <value>/user/oozie/apps/python/sqoop-site.xml</value>
                </property>
            </configuration>
            <exec>compress.py</exec>
            <file>scripts/compress.py</file>
            <capture-output/>
        </shell>
        <ok to="end"/>
        <error to="fail"/>
    </action>
    <kill name="fail">
        <message>Python action failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
    </kill>
    <end name="end"/>
</workflow-app>


job.properties

nameNode=hdfs://<hostname>:8020
jobTracker=<hostname>:8050
queueName=default
examplesRoot=oozie
oozie.wf.application.path=${nameNode}/user/${examplesRoot}/apps/python

compress.py

import os
print("HI")
command = "sqoop import --connect jdbc:oracle:thin:@<host_ip>:1526:CSVDBS --driver oracle.jdbc.driver.OracleDriver --connection-manager org.apache.sqoop.manager.GenericJdbcManager --username XXXXXX --password ****** --query \"select A.SERVICE_ID from SERVICE_HISTORY_V A where 1 = 1 AND \$CONDITIONS\" --target-dir '/user/sqooppython123405' -m 1"
command1 = "hadoop jar /usr/hdp/2.5.0.0-1245/oozie/share/lib/mapreduce-streaming/hadoop-streaming-2.7.3.2.5.0.0-1245.jar \
  -Dmapred.output.compress=true \
  -Dmapred.compress.map.output=true \
  -Dmapred.output.compression.codec=org.apache.hadoop.io.compress.BZip2Codec \
  -Dmapred.reduce.tasks=1 \
  -input /user/sqooppython \
  -output /user/sqooppython_BZIP2"
os.system(command)
print("Scoop Done")
os.system(command1)
print("bye")
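
One possible way to narrow this down: `os.system` returns each command's exit status, but compress.py discards it, so a failing `sqoop` call would go unnoticed by the script. The sketch below is not from the original post. It assumes `sqoop` and `hadoop` are on the PATH of the node that runs the shell action, and the helper names `run_step` and `run_all` are hypothetical. It runs each command, prints its combined stdout and stderr, and stops at the first non-zero exit code.

```python
import subprocess
import sys


def run_step(args):
    """Run one command given as an argument list and return its exit code."""
    try:
        proc = subprocess.Popen(
            args,
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,  # merge stderr into stdout so errors are visible
            universal_newlines=True,
        )
    except OSError as exc:
        # Raised when the program cannot be started, e.g. it is not on PATH.
        print("could not start %r: %s" % (args[0], exc))
        sys.stdout.flush()
        return 127
    output, _ = proc.communicate()
    print(output)
    print("exit code of %r: %d" % (args[0], proc.returncode))
    sys.stdout.flush()
    return proc.returncode


def run_all(steps):
    """Run the steps in order; return the first non-zero exit code, else 0."""
    for args in steps:
        code = run_step(args)
        if code != 0:
            return code
    return 0
```

In compress.py the two commands would be passed as argument lists, for example `sys.exit(run_all([sqoop_args, streaming_args]))` with hypothetical lists `sqoop_args` and `streaming_args`, so that a failed import also fails the Oozie action. Because no shell is involved when an argument list is used, `$CONDITIONS` would be written without the backslash.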