Created 08-25-2017 07:54 PM
I have a sqoop import that works fine via the command line
~$ sqoop import --connect "jdbc:sqlserver://10.100.197.46:1433;database=rtoISONE" --username hadoop --password XXXXXX --hive-import --hive-database perl3 --hive-overwrite -m 1 --table MaxIndex
but when I try to run it with an Oozie workflow it never leaves the RUNNING phase, and when I look at it in YARN it sits at 95%. I know that my Oozie is at least basically set up correctly, because when I run a shell script under it, it completes without a problem.
workflow.xml
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<workflow-app xmlns="uri:oozie:workflow:0.5" name="sqoop-wf">
    <global/>
    <start to="sqoop"/>
    <action name="sqoop">
        <sqoop xmlns="uri:oozie:sqoop-action:0.3">
            <job-tracker>${resourceManager}</job-tracker>
            <name-node>${nameNode}</name-node>
            <command>${command}</command>
        </sqoop>
        <ok to="end"/>
        <error to="kill"/>
    </action>
    <kill name="kill">
        <message>${wf:errorMessage(wf:lastErrorNode())}</message>
    </kill>
    <end name="end"/>
</workflow-app>
job.properties
nameNode=hdfs://hadoopctrl:8020
resourceManager=hadoopctrl:8050
queueName=default
oozie.use.system.libpath=true
oozie.action.sharelib.for.sqoop=sqoop,hive,hcatalog
oozie.wf.application.path=${nameNode}/user/${user.name}
command=import --connect "jdbc:sqlserver://10.100.197.46:1433;database=rtoISONE" --username hadoop --password XXXXXX --hive-import --hive-database perl3 --hive-overwrite -m 1 --table MaxIndex
I have my vcores set to 10
I have tried adding different properties to my workflow:
<property>
    <name>mapred.reduce.tasks</name>
    <value>-1</value>
</property>
<property>
    <name>mapreduce.job.reduces</name>
    <value>1</value>
</property>
<property>
    <name>mapreduce.job.queuname</name>
    <value>launcher2</value>
</property>
<property>
    <name>mapred.compress.map.output</name>
    <value>true</value>
</property>
Any ideas anyone has would be much appreciated.
Thanks
Created 08-31-2017 12:25 PM
OK, we have resolved our issues; it was a combination of three things. @antin leszczyszyn and @Artem Ervits put me on the right road, so I will document how we solved the issues in the hope that it helps someone else.
1. As Antin pointed out, we had a user issue: our group had installed Apache Ranger, which changed the Hadoop users and permissions.
2. As Artem pointed out in the link to his tutorial, we needed to create a lib folder in the folder that we run our workflow from and add the JDBC .jar file, the hive-site.xml, and the tez-site.xml (see the commands after this list).
3. When trying to troubleshoot this problem we had changed the scheduler to the Fair Scheduler; we changed it back to the Capacity Scheduler and changed maximum-am-resource-percent from 0.2 to 0.6 (exact property shown below).
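For anyone following along, here is roughly what steps 2 and 3 looked like for us. This is only a sketch: it assumes the workflow lives in /user/hdfs (matching oozie.wf.application.path above), that the SQL Server driver jar is called sqljdbc4.jar, and that the client configs are in the usual HDP locations, so adjust the paths and jar name to your cluster.

hdfs dfs -mkdir -p /user/hdfs/lib                            # lib folder next to workflow.xml
hdfs dfs -put sqljdbc4.jar /user/hdfs/lib/                   # JDBC driver for SQL Server (jar name is an example)
hdfs dfs -put /etc/hive/conf/hive-site.xml /user/hdfs/lib/   # Hive client config needed for --hive-import
hdfs dfs -put /etc/tez/conf/tez-site.xml /user/hdfs/lib/     # Tez config needed on newer HDP releases

The scheduler setting we bumped in step 3 is the Capacity Scheduler property (capacity-scheduler.xml, or the YARN scheduler section in Ambari):

yarn.scheduler.capacity.maximum-am-resource-percent=0.6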
Thanks for the help
Created 08-28-2017 05:42 PM
What do the yarn logs say? Could you post them?
Created 08-28-2017 07:23 PM
Thanks for taking the time to respond (I'm a little bit of a Hadoop noob).
I produced this log by running the command
~$ yarn job -list
taking the ID I found there and running this command
~$ yarn logs -applicationId application_1503692663931_0004 > wflog.log
The file that was produced was around 5000 lines, which seems excessive to me; if there is a better way, please let me know. I attached the log file. I trimmed the LogType:directory.info sections to get the file size down.
Thanks
Created 08-28-2017 08:15 PM
From the log it seems that your sqoop job gets stuck in a "heart beat, heart beat..." loop.
This is a common symptom when something has gone wrong; do a search for 'oozie sqoop import heart beat'.
But I believe it is potentially a permissions issue, as it has got through 95%.
I suspect that when you run the sqoop job manually you run as 'hdfs' user. Can you confirm this?
USER="hdfs"
and
realUser=oozie
are mentioned in the logs. I suspect the 'oozie' user does not have permission to overwrite the table.
Check the permissions on the table. Maybe change the permissions or ownership for diagnosis, and try again.
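If it helps, something like this would show who owns the table directories and, purely for diagnosis, hand them to the user running the import (this assumes the default HDP warehouse location and the perl3 database from your command; adjust to your paths):

hdfs dfs -ls /apps/hive/warehouse/perl3.db                    # check owner/group/permissions of the table dirs
hdfs dfs -chown -R hdfs:hdfs /apps/hive/warehouse/perl3.db    # diagnosis only: give ownership to the hdfs user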
Created 08-29-2017 08:46 AM
How are you getting on? Any luck with this?
Created 08-29-2017 03:42 PM
No luck yet.
You are correct that I am running the sqoop import from the command line as the hdfs user.
Thanks for pointing out the realUser=oozie setting in the logs; when I searched the logs I always looked for 'user' and it always said hdfs.
The first thing I tried was:
hadoop fs -chmod -R 777 /apps/hive/warehouse/perl3.db
I also tried adding these properties to my workflow to force Oozie to execute as the hdfs user; hadoopUser was set in the job properties file as hdfs:
<property>
    <name>HADOOP_USER_NAME</name>
    <value>${hadoopUser}</value>
</property>
<property>
    <name>hadoop.proxyuser.oozie.hosts</name>
    <value>hadoopctrl,hadoopm1,hadoopm2,hadoopm3</value>
</property>
<property>
    <name>hadoop.proxyuser.oozie.groups</name>
    <value>hdfs</value>
</property>
Same result, with the job in YARN stuck at 95%.
Created 08-29-2017 09:07 PM
By doing hadoop fs -chmod -R 777 on your hive table, we can probably eliminate permission issues.
This is a great puzzle.
It should have been raised in the logs, but is there anything strange about your data? Nulls, NAs, empty values? Strange date formats, decimals, special characters?
Anything in @Artem Ervits post that helped?
Created 08-29-2017 04:33 PM
Please review my tutorial for the caveats with setting up the Sqoop action. For example, starting with HDP 2.4 or 2.5 (I forget which), you need tez-site.xml in your lib directory. https://community.hortonworks.com/articles/84394/apache-ambari-workflow-manager-view-for-apache-ooz-...
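As a rough sketch (the driver jar name is only an example; use whatever JDBC jar your database needs), the workflow application directory on HDFS should end up looking something like this:

/user/<your-user>/
    workflow.xml
    lib/
        sqljdbc4.jar       (your JDBC driver jar)
        hive-site.xml
        tez-site.xml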
Created 08-29-2017 08:56 PM
Thanks for pointing to your article; it had some good info in it, but still no luck for me. After I uploaded the files and added them to the workflow.xml, the only difference I see is that I am using the SQL Server JDBC driver.
Created 08-30-2017 04:04 PM
Also, from your log and post, hadoopctrl is the NameNode, ResourceManager, and Oozie server. Is it a DataNode and NodeManager as well? It may be in a memory bottleneck: Oozie is trying to use the memory, but YARN cannot allocate memory or write the data. Potentially try moving your Oozie server to another node, or reduce or redistribute the memory allocation; Oozie usually doesn't need too much. This would probably also explain the heart beat issue.
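As a rough illustration of what I mean by redistributing the allocation (the values here are placeholders, not recommendations), these are the yarn-site.xml knobs to look at, shown as name=value for brevity:

yarn.nodemanager.resource.memory-mb=8192         # total memory YARN may hand out on each NodeManager
yarn.scheduler.maximum-allocation-mb=4096        # largest single container YARN will grant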