Created on 01-04-2018 11:22 AM - edited 09-16-2022 05:42 AM
Hi!
I have the following Oozie workflow with a Sqoop action that should import a MySQL table into Hive:
workflow.xml (fails)
<workflow-app name="OOZIE_SQOOP_WF" xmlns="uri:oozie:workflow:0.4">
    <start to="sqoop_action"/>
    <action name="sqoop_action">
        <sqoop xmlns="uri:oozie:sqoop-action:0.2">
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <command>import --connect jdbc:mysql://localhost/retail_db --username root --password cloudera --table categories --fields-terminated-by '\t' --hive-import --hive-table categories</command>
        </sqoop>
        <ok to="success"/>
        <error to="fail"/>
    </action>
    <kill name="fail">
        <message>JOB FAILED!</message>
    </kill>
    <end name="success"/>
</workflow-app>
job.properties
nameNode=hdfs://quickstart.cloudera:8020
jobTracker=localhost:8032
oozie.wf.application.path=${nameNode}/user/cloudera/oozie/spark-app
oozie.use.system.libpath=true
This workflow always fails, but the logs show no exception or error of any kind. The same Sqoop command works fine when run from the shell.
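For failures like this, the real stack trace usually ends up in the YARN container logs of the Oozie launcher job rather than in the Oozie console. A sketch of how to pull them (the application ID is a placeholder, not a value from this job):

```shell
# The Sqoop action runs inside an Oozie "launcher" job; its YARN
# container logs usually contain the underlying error.
# <application_id> is a placeholder -- take the real ID from the Oozie
# web console or from `yarn application -list`.
yarn logs -applicationId <application_id>
```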
The following workflow, by contrast, completes successfully (it imports the table only to HDFS, not into Hive):
workflow.xml
<workflow-app name="OOZIE_SQOOP_WF" xmlns="uri:oozie:workflow:0.4">
    <start to="sqoop_action"/>
    <action name="sqoop_action">
        <sqoop xmlns="uri:oozie:sqoop-action:0.2">
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <prepare>
                <delete path="${nameNode}/user/cloudera/categories"/>
            </prepare>
            <configuration>
                <property>
                    <name>mapred.compress.map.output</name>
                    <value>true</value>
                </property>
            </configuration>
            <command>import --connect jdbc:mysql://localhost/retail_db --username root --password cloudera --table categories --fields-terminated-by '\t' --warehouse-dir /user/cloudera</command>
        </sqoop>
        <ok to="success"/>
        <error to="fail"/>
    </action>
    <kill name="fail">
        <message>JOB FAILED!</message>
    </kill>
    <end name="success"/>
</workflow-app>
Does anyone know how to get the first workflow (the one with the Hive import) to complete?
PS: I am using the CDH 5.12 QuickStart VM.
Created on 01-05-2018 02:06 AM - edited 01-05-2018 02:08 AM
Solved with the following:
job.properties
nameNode=hdfs://quickstart.cloudera:8020
jobTracker=localhost:8032
oozie.wf.application.path=${nameNode}/user/cloudera/oozie/sqoop-app
oozie.use.system.libpath=true
oozie.action.sharelib.for.sqoop=hive,hcatalog,sqoop
oozie.action.sharelib.for.hive=hive,hcatalog,sqoop
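If the Sqoop action still cannot find the Hive classes, the share libs installed on the Oozie server can be checked with the Oozie CLI. A sketch, assuming the QuickStart VM's default Oozie URL:

```shell
# List all sharelibs the Oozie server knows about; 'hive', 'hcatalog'
# and 'sqoop' should all appear for the properties above to take effect.
# The server URL is the CDH QuickStart default -- adjust if yours differs.
oozie admin -oozie http://quickstart.cloudera:11000/oozie -shareliblist

# Show the jars inside the sqoop sharelib specifically
oozie admin -oozie http://quickstart.cloudera:11000/oozie -shareliblist sqoop
```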
Then I copied hive-site.xml into the application's root folder on HDFS and referenced it in the workflow via the <job-xml> tag:
workflow.xml
<workflow-app name="OOZIE_SQOOP_WF" xmlns="uri:oozie:workflow:0.4">
    <start to="sqoop_action"/>
    <action name="sqoop_action">
        <sqoop xmlns="uri:oozie:sqoop-action:0.2">
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <prepare>
                <delete path="${nameNode}/user/cloudera/categories"/>
            </prepare>
            <job-xml>hive-site.xml</job-xml>
            <command>import --connect jdbc:mysql://localhost/retail_db --username root --password cloudera --table categories --fields-terminated-by ',' --hive-import --hive-table cloudera.categories</command>
        </sqoop>
        <ok to="success"/>
        <error to="fail"/>
    </action>
    <kill name="fail">
        <message>JOB FAILED!</message>
    </kill>
    <end name="success"/>
</workflow-app>
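For completeness, the deployment steps can be sketched as follows. The paths match the job.properties above; the hive-site.xml source location is the CDH default and an assumption:

```shell
# Application directory on HDFS, matching oozie.wf.application.path above
APP_DIR=/user/cloudera/oozie/sqoop-app

# Put hive-site.xml next to workflow.xml so the relative
# <job-xml>hive-site.xml</job-xml> reference resolves.
# /etc/hive/conf is the usual CDH location -- adjust if yours differs.
hdfs dfs -put -f /etc/hive/conf/hive-site.xml ${APP_DIR}/
hdfs dfs -put -f workflow.xml ${APP_DIR}/

# Submit and start the workflow (QuickStart default Oozie URL assumed)
oozie job -oozie http://quickstart.cloudera:11000/oozie \
    -config job.properties -run
```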