
oozie sqoop action hangs at 95%

Explorer

I have a sqoop import that works fine via the command line

~$ sqoop import --connect "jdbc:sqlserver://10.100.197.46:1433;database=rtoISONE" --username hadoop --password XXXXXX --hive-import --hive-database perl3 --hive-overwrite -m 1 --table MaxIndex

but when I try to run it with an Oozie workflow it never leaves the RUNNING phase, and when I look at it in YARN it sits at 95%. I know that my Oozie is set up correctly, for one thing, because when I run a shell script under it, it completes without problem.

workflow.xml

<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<workflow-app xmlns="uri:oozie:workflow:0.5" name="sqoop-wf">  
  <global/>  
  <start to="sqoop"/> 
  <action name="sqoop">  
    <sqoop xmlns="uri:oozie:sqoop-action:0.3">  
      <job-tracker>${resourceManager}</job-tracker>  
      <name-node>${nameNode}</name-node>
      <command>${command}</command>  
    </sqoop>  
    <ok to="end"/> 
    <error to="kill"/>
  </action>  
  <kill name="kill">  
    <message>${wf:errorMessage(wf:lastErrorNode())}</message>  
  </kill>
  <end name="end"/>
</workflow-app>

job.properties

nameNode=hdfs://hadoopctrl:8020
resourceManager=hadoopctrl:8050
queueName=default
oozie.use.system.libpath=true
oozie.action.sharelib.for.sqoop=sqoop,hive,hcatalog
oozie.wf.application.path=${nameNode}/user/${user.name}
command=import --connect "jdbc:sqlserver://10.100.197.46:1433;database=rtoISONE" --username hadoop --password XXXXXX --hive-import --hive-database perl3 --hive-overwrite -m 1 --table MaxIndex
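
For reference, a workflow set up like this is normally submitted with the standard Oozie CLI; the server URL below is an assumption (hadoopctrl on the default Oozie port 11000):

~$ oozie job -oozie http://hadoopctrl:11000/oozie -config job.properties -run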

I have my vcores set to 10

I have tried adding different properties to my workflow (a sketch of where these sit in the action follows the block below)

<property> 
  <name>mapred.reduce.tasks</name>  
  <value>-1</value>  
</property>  
<property>  
  <name>mapreduce.job.reduces</name>  
  <value>1</value>  
</property>  
<property>  
  <name>mapreduce.job.queuename</name>  
  <value>launcher2</value>  
</property>  
<property>  
  <name>mapred.compress.map.output</name>  
  <value>true</value>  
</property>
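
For context, properties like these sit inside a <configuration> element of the sqoop action; a minimal sketch against the workflow above, showing only the queue-name property (the others from the list would go in the same place):

<action name="sqoop">
  <sqoop xmlns="uri:oozie:sqoop-action:0.3">
    <job-tracker>${resourceManager}</job-tracker>
    <name-node>${nameNode}</name-node>
    <configuration>
      <property>
        <name>mapreduce.job.queuename</name>
        <value>${queueName}</value>
      </property>
      <!-- the other properties from the list above go here as well -->
    </configuration>
    <command>${command}</command>
  </sqoop>
  <ok to="end"/>
  <error to="kill"/>
</action>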

Any ideas anyone has would be much appreciated.

Thanks

	
1 ACCEPTED SOLUTION

Explorer

OK, we have resolved our issues; it was a combination of three things. @antin leszczyszyn and @Artem Ervits put me on the right road. I will document how we solved the issues in the hope that it helps someone else.

1. As Antin pointed out, we had a user issue: our group had installed Apache Ranger, which changed the Hadoop users and permissions.

2. As Artem pointed out in the link to his tutorial, we needed to create a lib folder in the folder that we run our workflow from and add the JDBC .jar file, the hive-site.xml, and the tez-site.xml.

3. While troubleshooting this problem we had changed the scheduler to the fair scheduler; we changed it back to the capacity scheduler and changed maximum-am-resource-percent from 0.2 to 0.6. (See the sketch below.)
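
For anyone following along, a rough sketch of steps 2 and 3; the paths assume a default HDP layout and the SQL Server driver jar name is only illustrative, so adjust for your cluster:

# step 2: lib folder next to the workflow, holding the JDBC driver and the site files
~$ hdfs dfs -mkdir -p /user/hdfs/lib
~$ hdfs dfs -put sqljdbc4.jar /etc/hive/conf/hive-site.xml /etc/tez/conf/tez-site.xml /user/hdfs/lib/

# step 3: raise the application-master share in capacity-scheduler.xml (or via Ambari), then refresh the queues
yarn.scheduler.capacity.maximum-am-resource-percent=0.6
~$ yarn rmadmin -refreshQueues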

Thanks for the help


11 REPLIES

Contributor

What do the yarn logs say? Could you post them?

Explorer

Thanks for taking the time to respond (I'm a little bit of a Hadoop noob).

I produced this log by running the command

~$ yarn application -list

then, taking the ID I found there, running this command

~$ yarn logs -applicationId application_1503692663931_0004 > wflog.log

The file that was produced was around 5,000 lines, which seems excessive to me; if there is a better way please let me know. I attached the log file. I trimmed the LogType:directory.info sections to get the file size down.
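
As a side note, one way to keep an aggregated log this size manageable is to filter it on the way out, for example:

~$ yarn logs -applicationId application_1503692663931_0004 | grep -iE "error|exception|heart beat"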

Thanks

Contributor

From the log it seems that your sqoop job gets stuck in a "heart beat, heart beat..." loop.

This is a common symptom when something has gone wrong; do a search for 'oozie sqoop import heart beat'.

But I believe it is potentially a permissions issue, as it has got through 95%.

I suspect that when you run the sqoop job manually you run as 'hdfs' user. Can you confirm this?

USER="hdfs"

and

realUser=oozie

are mentioned in the logs. I suspect the 'oozie' user does not have permission to overwrite the table.

Check the permissions of the table. Maybe change the permissions or ownership for diagnosis, and try again, for example:
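
A quick way to check, reusing the Hive warehouse path from this HDP setup and the database name in the post (the table directory name is a guess, since Hive lowercases it):

~$ hdfs dfs -ls /apps/hive/warehouse/perl3.db
~$ hdfs dfs -ls /apps/hive/warehouse/perl3.db/maxindex

# for diagnosis only; revert once the culprit is found
~$ hdfs dfs -chmod -R 777 /apps/hive/warehouse/perl3.db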

Contributor

How are you getting on? Any luck with this?

Explorer

No luck yet.
You are correct that I am running the sqoop import from the command line as the hdfs user.
Thanks for pointing out the realUser=oozie setting in the logs; when I searched the logs I always looked for 'user', and it always said hdfs.

The first thing I tried was:

hadoop fs -chmod -R 777 /apps/hive/warehouse/perl3.db

I also tried adding these properties to my workflow to force Oozie to execute as the hdfs user; hadoopUser was set in the job file as hdfs.

<property>
  <name>HADOOP_USER_NAME</name>
  <value>${hadoopUser}</value>
</property>
<property>
  <name>hadoop.proxyuser.oozie.hosts</name>
  <value>hadoopctrl,hadoopm1,hadoopm2,hadoopm3</value>
</property>
<property>
  <name>hadoop.proxyuser.oozie.groups</name>
  <value>hdfs</value>
</property>

Same result, with the job in YARN stuck at 95%.
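
For reference, the hadoop.proxyuser.oozie.* settings normally live in core-site.xml on the cluster (and the NameNode needs to pick them up) rather than in the workflow itself; a minimal sketch reusing the values above:

<!-- core-site.xml -->
<property>
  <name>hadoop.proxyuser.oozie.hosts</name>
  <value>hadoopctrl,hadoopm1,hadoopm2,hadoopm3</value>
</property>
<property>
  <name>hadoop.proxyuser.oozie.groups</name>
  <value>hdfs</value>
</property>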

Contributor

By doing hadoop fs -chmod -R 777 on your hive table, we can probably eliminate permission issues.

This is a great puzzle.

It should have been raised in the logs, but is there anything strange about your data? Nulls, NAs, empty values, strange date formats, decimals, special characters?

Anything in @Artem Ervits post that helped?

Master Mentor
@Joel Carver

please review my tutorial for the caveats with setting up a sqoop action. For example, starting with HDP 2.4 or 2.5 (I forget which), you need tez-site.xml in your lib directory. https://community.hortonworks.com/articles/84394/apache-ambari-workflow-manager-view-for-apache-ooz-...

Explorer

@Artem Ervits

Thanks for pointing to your article; it had some good info in it, but still no luck for me. After I uploaded the files and added them to the workflow.xml, the only difference I see is that I am using the SQL Server JDBC driver.

Contributor

Also, from your log and post, hadoopctrl is the NameNode, ResourceManager, and Oozie server. Is it a DataNode and NodeManager as well? It may be a memory bottleneck: Oozie is trying to use the memory, but YARN cannot allocate memory or write the data. Potentially try moving your Oozie server to another node, or reduce or redistribute the memory allocation; Oozie usually doesn't need too much. This would probably explain the heart beat issue.
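
One rough way to check whether the node is actually out of headroom while the job sits at 95%; the node id and port are assumptions based on the default NodeManager address, and the application id is the one from the log earlier in the thread:

~$ yarn node -list
~$ yarn node -status hadoopctrl:45454
~$ yarn application -status application_1503692663931_0004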