joel_carver · 08-25-2017

I have a Sqoop import that works fine from the command line:

~$ sqoop import --connect "jdbc:sqlserver://10.100.197.46:1433;database=rtoISONE" --username hadoop --password XXXXXX --hive-import --hive-database perl3 --hive-overwrite -m 1 --table MaxIndex

but when I try to run it with an Oozie workflow it never leaves the RUNNING phase, and when I look at it in YARN it sits at 95%. I know that my Oozie is set up correctly, because when I run a shell script under it, it completes without a problem.

workflow.xml

<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<workflow-app xmlns="uri:oozie:workflow:0.5" name="sqoop-wf">  
  <global/>  
  <start to="sqoop"/> 
  <action name="sqoop">  
    <sqoop xmlns="uri:oozie:sqoop-action:0.3">  
      <job-tracker>${resourceManager}</job-tracker>  
      <name-node>${nameNode}</name-node>
      <command>${command}</command>  
    </sqoop>  
    <ok to="end"/> 
    <error to="kill"/>
  </action>  
  <kill name="kill">  
    <message>${wf:errorMessage(wf:lastErrorNode())}</message>  
  </kill>
  <end name="end"/>
</workflow-app>

job.properties

nameNode=hdfs://hadoopctrl:8020
resourceManager=hadoopctrl:8050
queueName=default
oozie.use.system.libpath=true
oozie.action.sharelib.for.sqoop=sqoop,hive,hcatalog
oozie.wf.application.path=${nameNode}/user/${user.name}
command=import --connect "jdbc:sqlserver://10.100.197.46:1433;database=rtoISONE" --username hadoop --password XXXXXX --hive-import --hive-database perl3 --hive-overwrite -m 1 --table MaxIndex

I have my vcores set to 10.

I have tried adding different properties to my workflow:

<property> 
  <name>mapred.reduce.tasks</name>  
  <value>-1</value>  
</property>  
<property>  
  <name>mapreduce.job.reduces</name>  
  <value>1</value>  
</property>  
<property>  
  <name>mapreduce.job.queuename</name>  
  <value>launcher2</value>  
</property>  
<property>  
  <name>mapred.compress.map.output</name>  
  <value>true</value>  
</property>
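
For reference, properties like these normally go inside a `<configuration>` element in the sqoop action, placed after `<name-node>` and before `<command>`. The fragment below is only a sketch of that placement, not the workflow as it was actually run; it assumes the `queueName` value from job.properties is the queue you want, and it uses the standard property name `mapreduce.job.queuename`.

```xml
<!-- Sketch only: shows where <property> entries sit inside the sqoop action. -->
<sqoop xmlns="uri:oozie:sqoop-action:0.3">
  <job-tracker>${resourceManager}</job-tracker>
  <name-node>${nameNode}</name-node>
  <configuration>
    <property>
      <!-- Assumption: route the job to the queue named in job.properties -->
      <name>mapreduce.job.queuename</name>
      <value>${queueName}</value>
    </property>
  </configuration>
  <command>${command}</command>
</sqoop>
```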

Any ideas anyone has would be much appreciated.

Thanks

joel_carver · 08-31-2017

OK, we have resolved our issues. It was a combination of three things. @antin leszczyszyn and @Artem Ervits put me on the right road; I will document how we solved the issues in the hope that it helps someone else.

1. As Antin pointed out, we had a user issue: our group had installed Apache Ranger, which changed the Hadoop users and permissions.

2. As Artem pointed out in the link to his tutorial, we needed to create a lib folder in the folder that we run our workflow from, and add the jdbc.jar file, hive-site.xml, and tez-site.xml to it.

3. While troubleshooting this problem we had changed the scheduler to the fair scheduler. We changed it back to the capacity scheduler and raised maximum-am-resource-percent from 0.2 to 0.6.
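
As a rough sketch of the scheduler change in step 3, the setting would look something like the fragment below in capacity-scheduler.xml. This assumes the value changed was the cluster-wide `yarn.scheduler.capacity.maximum-am-resource-percent` property; a per-queue variant also exists, and on a managed cluster the value may be set through the management UI rather than by editing the file directly.

```xml
<!-- capacity-scheduler.xml (sketch; assumes the cluster-wide property was the one changed) -->
<configuration>
  <property>
    <!-- Maximum fraction of cluster resources that application masters may use -->
    <name>yarn.scheduler.capacity.maximum-am-resource-percent</name>
    <value>0.6</value>
  </property>
</configuration>
```

The reason this plausibly matters here: the Oozie launcher runs as its own YARN application, and the Sqoop job it starts needs a second application master, so a low limit can leave the launcher waiting on a child job that never gets scheduled.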

Thanks for the help
