
VERY MUCH URGENT ---OOZIE - SQOOP import SAVED JOB workflow Hangs


Contributor

Hi,

 
I am using a virtual machine with 5 GB RAM, 4 CPU cores, CentOS 6.6, CM 5.4.0, and CDH 5.4.0, using the MR2 (YARN) framework.

We are doing a POC and the Oozie Sqoop import hangs. Please advise on a fix if you can, and on why it hangs.

The issue has persisted for a week and we are unable to move ahead; the delay is blocking our decision on whether to give the client clearance to use CDH Hadoop. I have also tried the CDH users Google group with no luck, so I decided to post here.

The job status is always RUNNING and the workflow hangs.
 
 
Node Manager :
yarn.nodemanager.resource.memory-mb - 3100 MB
yarn.nodemanager.resource.cpu-vcores - 4 
 
Resource Manager :
yarn.scheduler.maximum-allocation-mb - 2560 MB
yarn.scheduler.maximum-allocation-vcores - 4
 
I have a Linux user, cdhadmin; its HDFS home directory is /user/cdhadmin.
 
 
job.properties

nameNode=hdfs://hadoop-cdh.xxx.com:8020
jobTracker=hadoop-cdh.xxx.com:8032
#queueName=default
queueName=oozieQ
#queueName=oozieQ.oozieQ_sub

examplesRoot=oozie-wf
examplesRootDir=/user/${user.name}/${examplesRoot}

optionFile=options.par
oozieImportPath=${nameNode}/user/${user.name}/${examplesRoot}/oozie-import
oozie.use.system.libpath=true
oozie.wf.application.path=${nameNode}/user/${user.name}/${examplesRoot}
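
For reference, these properties imply the workflow definition lives at /user/cdhadmin/oozie-wf on HDFS. A typical deployment sequence for that layout (paths inferred from the properties above, not from the original post) would be:

hdfs dfs -mkdir -p /user/cdhadmin/oozie-wf
hdfs dfs -put workflow.xml options.par /user/cdhadmin/oozie-wf/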

 
 
sqoop job creation

sqoop job --create mergetest -- import \
--connect jdbc:mysql://10.109.51.84/myapp \
--table mergetest --incremental lastmodified \
--check-column cdate \
--username root \
--password root1pass \
--m 1
 
Running directly (without the workflow):
sqoop job --meta-connect jdbc:hsqldb:hsql://10.109.51.24:16000/sqoop --exec mergetest -- --target-dir IncLineData
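
As a sanity check (not in the original post), the same metastore can be queried to confirm the saved job exists and to inspect its stored parameters:

sqoop job --meta-connect jdbc:hsqldb:hsql://10.109.51.24:16000/sqoop --list
sqoop job --meta-connect jdbc:hsqldb:hsql://10.109.51.24:16000/sqoop --show mergetest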

workflow.xml

<?xml version="1.0" encoding="UTF-8"?>
<workflow-app xmlns="uri:oozie:workflow:0.2" name="cs-wf-fork-join">
  <start to="sqoop-node"/>
  <action name="sqoop-node">
    <sqoop xmlns="uri:oozie:sqoop-action:0.2">
      <job-tracker>${jobTracker}</job-tracker>
      <name-node>${nameNode}</name-node>
      <prepare>
        <delete path="${oozieImportPath}"/>
      </prepare>
      <configuration>
        <property>
          <name>mapreduce.job.queuename</name>
          <value>${queueName}</value>
        </property>
      </configuration>
      <arg>job</arg>
      <arg>--meta-connect</arg>
      <arg>jdbc:hsqldb:hsql://10.109.51.24:16000/sqoop</arg>
      <arg>--exec</arg>
      <arg>mergetest</arg>
      <arg>--</arg>
      <arg>--target-dir</arg>
      <arg>${oozieImportPath}</arg>
    </sqoop>
    <ok to="end"/>
    <error to="fail"/>
  </action>
  <kill name="fail">
    <message>Sqoop failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
  </kill>
  <end name="end"/>
</workflow-app>


core-site.xml - service-wide configuration

<property>
  <name>hadoop.proxyuser.oozie.hosts</name>
  <value>*</value>
</property>
<property>
  <name>hadoop.proxyuser.oozie.groups</name>
  <value>*</value>
</property>


Capacity Scheduler

<?xml version="1.0"?>
<configuration>

  <!-- Root level -->

  <property>
    <name>yarn.scheduler.capacity.root.queues</name>
    <value>oozieQ,default</value>
  </property>
  <property>
    <name>yarn.scheduler.capacity.root.capacity</name>
    <value>100</value>
  </property>
  <property>
    <name>yarn.scheduler.capacity.root.acl_administer_queue</name>
    <value>*</value>
  </property>
  <property>
    <name>yarn.scheduler.capacity.node-locality-delay</name>
    <value>40</value>
  </property>
  <property>
    <name>yarn.scheduler.capacity.maximum-applications</name>
    <value>15000</value>
  </property>

  <!-- Parent oozieQ -->

  <property>
    <name>yarn.scheduler.capacity.oozieQ.queues</name>
    <value>oozieQ_sub</value>
  </property>
  <property>
    <name>yarn.scheduler.capacity.root.oozieQ.capacity</name>
    <value>100</value>
  </property>
  <property>
    <name>yarn.scheduler.capacity.root.oozieQ.state</name>
    <value>RUNNING</value>
  </property>
  <property>
    <name>yarn.scheduler.capacity.root.oozieQ.minimum-user-limit-percent</name>
    <value>20</value>
  </property>
  <property>
    <name>yarn.scheduler.capacity.root.oozieQ.user-limit-factor</name>
    <value>2</value>
  </property>
  <property>
    <name>yarn.scheduler.capacity.root.oozieQ.maximum-capacity</name>
    <value>100</value>
  </property>
  <property>
    <name>yarn.scheduler.capacity.root.oozieQ.acl_administer_queue</name>
    <value>cdhadmin</value>
  </property>
  <property>
    <name>yarn.scheduler.capacity.root.oozieQ.maximum-applications</name>
    <value>50</value>
  </property>
  <property>
    <name>yarn.scheduler.capacity.root.oozieQ.acl_submit_applications</name>
    <value>cdhadmin cdhadmin</value>
  </property>
  <property>
    <name>yarn.scheduler.capacity.root.oozieQ.maximum-am-resource-percent</name>
    <value>0.5</value>
  </property>

  <!-- Parent default -->

  <property>
    <name>yarn.scheduler.capacity.root.default.capacity</name>
    <value>0</value>
  </property>
  <property>
    <name>yarn.scheduler.capacity.root.default.state</name>
    <value>RUNNING</value>
  </property>
  <property>
    <name>yarn.scheduler.capacity.root.default.minimum-user-limit-percent</name>
    <value>10</value>
  </property>
  <property>
    <name>yarn.scheduler.capacity.root.default.user-limit-factor</name>
    <value>2</value>
  </property>
  <property>
    <name>yarn.scheduler.capacity.root.default.maximum-capacity</name>
    <value>80</value>
  </property>
  <property>
    <name>yarn.scheduler.capacity.root.acl_submit_applications</name>
    <value>hadoop,yarn,mapred,hdfs</value>
  </property>
  <property>
    <name>yarn.scheduler.capacity.root.default.acl_submit_applications</name>
    <value>cdhadmin cdhadmin</value>
  </property>
  <property>
    <name>yarn.scheduler.capacity.root.default.acl_administer_queue</name>
    <value>cdhadmin</value>
  </property>
  <property>
    <name>yarn.scheduler.capacity.root.default.maximum-applications</name>
    <value>20</value>
  </property>
  <property>
    <name>yarn.scheduler.capacity.root.default.maximum-am-resource-percent</name>
    <value>0.1</value>
  </property>

</configuration>

[cdhadmin@hadoop-cdh oozie]$ oozie job -config /home/cdhadmin/oozie/job.properties -run 
Job ID : 0000000-150526103336035-oozie-oozi-W

[cdhadmin@hadoop-cdh oozie]$ oozie job -info 0000000-150526103336035-oozie-oozi-W
Job ID : 0000000-150526103336035-oozie-oozi-W
------------------------------------------------------------------------------------------------------------------------------------
Workflow Name : cs-wf-fork-join
App Path : hdfs://hadoop-cdh.xxx.com:8020/user/cdhadmin/oozie-wf
Status : RUNNING
Run : 0
User : cdhadmin
Group : -
Created : 2015-05-26 05:05 GMT
Started : 2015-05-26 05:05 GMT
Last Modified : 2015-05-26 05:38 GMT
Ended : -
CoordAction ID: -

Actions
------------------------------------------------------------------------------------------------------------------------------------
ID Status Ext ID Ext Status Err Code
------------------------------------------------------------------------------------------------------------------------------------
0000000-150526103336035-oozie-oozi-W@:start: OK - OK -
------------------------------------------------------------------------------------------------------------------------------------
0000000-150526103336035-oozie-oozi-W@sqoop-node RUNNING job_1432616428665_0001 RUNNING -
------------------------------------------------------------------------------------------------------------------------------------
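
When an action sits in RUNNING like this, the launcher's logs are the first thing to pull. Two standard ways (the application ID is the Ext ID in the table above; note that yarn logs requires log aggregation, which the stdout dump later in this thread shows is disabled on this cluster):

oozie job -log 0000000-150526103336035-oozie-oozi-W
yarn logs -applicationId application_1432616428665_0001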



 
 
 
3 Replies

Re: VERY MUCH URGENT ---OOZIE - SQOOP import SAVED JOB workflow Hangs

Contributor

I submitted a new job today; the log snippets are below. Please help as soon as possible.

Container syslog:

... [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskImpl: task_1432710421974_0001_m_000000 Task Transitioned from NEW to SCHEDULED
2015-05-27 12:38:43,748 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: attempt_1432710421974_0001_m_000000_0 TaskAttempt Transitioned from NEW to UNASSIGNED
2015-05-27 12:38:43,748 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.local.LocalContainerAllocator: Processing the event EventType: CONTAINER_REQ
2015-05-27 12:38:43,803 INFO [eventHandlingThread] org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler: Event Writer setup for JobId: job_1432710421974_0001, File: hdfs://hadoop-cdh.xxx.com:8020/user/cdhadmin/.staging/job_1432710421974_0001/job_1432710421974_0001_1.jhist
2015-05-27 12:38:43,904 INFO [AsyncDispatcher event handler] org.apache.hadoop.yarn.util.RackResolver: Resolved hadoop-cdh.xxx.com to /default
2015-05-27 12:38:43,905 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: Job jar is not present. Not adding any jar to the list of resources.
2015-05-27 12:38:43,927 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: The job-conf file on the remote FS is /user/cdhadmin/.staging/job_1432710421974_0001/job.xml
2015-05-27 12:38:44,184 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: Adding #1 tokens and #1 secret keys for NM use for launching container
2015-05-27 12:38:44,184 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: Size of containertokens_dob is 2
2015-05-27 12:38:44,184 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: Putting shuffle token in serviceData
2015-05-27 12:38:44,297 INFO [eventHandlingThread] org.apache.hadoop.mapreduce.v2.jobhistory.JobHistoryUtils: Default file system [hdfs://hadoop-cdh.xxx.com:8020]
2015-05-27 12:38:44,808 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: attempt_1432710421974_0001_m_000000_0 TaskAttempt Transitioned from UNASSIGNED to ASSIGNED
2015-05-27 12:38:44,812 INFO [uber-EventHandler] org.apache.hadoop.mapred.LocalContainerLauncher: Processing the event EventType: CONTAINER_REMOTE_LAUNCH for container container_1432710421974_0001_01_000001 taskAttempt attempt_1432710421974_0001_m_000000_0
2015-05-27 12:38:44,819 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: TaskAttempt: [attempt_1432710421974_0001_m_000000_0] using containerId: [container_1432710421974_0001_01_000001 on NM: [hadoop-cdh.xxx.com:8041]
2015-05-27 12:38:44,826 INFO [uber-SubtaskRunner] org.apache.hadoop.mapred.LocalContainerLauncher: mapreduce.cluster.local.dir for uber task: /yarn/nm/usercache/cdhadmin/appcache/application_1432710421974_0001
2015-05-27 12:38:44,835 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: attempt_1432710421974_0001_m_000000_0 TaskAttempt Transitioned from ASSIGNED to RUNNING
2015-05-27 12:38:44,836 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskImpl: task_1432710421974_0001_m_000000 Task Transitioned from SCHEDULED to RUNNING
2015-05-27 12:38:44,841 INFO [uber-SubtaskRunner] org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter: File Output Committer Algorithm version is 1
2015-05-27 12:38:44,882 INFO [uber-SubtaskRunner] org.apache.hadoop.mapred.Task: Using ResourceCalculatorProcessTree : [ ]
2015-05-27 12:38:45,012 INFO [uber-SubtaskRunner] org.apache.hadoop.mapred.MapTask: Processing split: org.apache.oozie.action.hadoop.OozieLauncherInputFormat$EmptySplit@19912a48
2015-05-27 12:38:45,027 INFO [uber-SubtaskRunner] org.apache.hadoop.mapred.MapTask: numReduceTasks: 0
2015-05-27 12:38:45,066 INFO [uber-SubtaskRunner] org.apache.hadoop.conf.Configuration.deprecation: mapred.job.id is deprecated. Instead, use mapreduce.job.id


Container stdout:

ge.compress=false
mapreduce.shuffle.ssl.enabled=false
yarn.log-aggregation-enable=false
mapreduce.tasktracker.report.address=127.0.0.1:0
mapreduce.tasktracker.http.threads=40
dfs.stream-buffer-size=4096
tfile.fs.output.buffer.size=262144
fs.permissions.umask-mode=022
dfs.client.datanode-restart.timeout=30
dfs.namenode.resource.du.reserved=104857600
yarn.resourcemanager.am.max-attempts=2
yarn.nodemanager.resource.percentage-physical-cpu-limit=100
ha.failover-controller.graceful-fence.connection.retries=1
mapreduce.job.speculative.speculative-cap-running-tasks=0.1
hadoop.proxyuser.hdfs.groups=*
dfs.datanode.drop.cache.behind.writes=false
hadoop.proxyuser.HTTP.hosts=*
hadoop.common.configuration.version=0.23.0
mapreduce.job.ubertask.enable=false
yarn.app.mapreduce.am.resource.cpu-vcores=1
dfs.namenode.replication.work.multiplier.per.iteration=2
mapreduce.job.acl-modify-job=
io.seqfile.local.dir=${hadoop.tmp.dir}/io/local
yarn.resourcemanager.system-metrics-publisher.enabled=false
fs.s3.sleepTimeSeconds=10
mapreduce.client.output.filter=FAILED
------------------------

Sqoop command arguments :
import
--connect
jdbc:mysql://10.109.51.84/myapp
--table
mergetest
--username
root
--password
root1pass
--as-textfile
--target-dir
mydirImport
--m
1
=================================================================

>>> Invoking Sqoop command line now >>>

8983 [uber-SubtaskRunner] WARN org.apache.sqoop.tool.SqoopTool - $SQOOP_CONF_DIR has not been set in the environment. Cannot check for additional configuration.
9083 [uber-SubtaskRunner] INFO org.apache.sqoop.Sqoop - Running Sqoop version: 1.4.5-cdh5.4.0
9115 [uber-SubtaskRunner] WARN org.apache.sqoop.tool.BaseSqoopTool - Setting your password on the command-line is insecure. Consider using -P instead.
9145 [uber-SubtaskRunner] WARN org.apache.sqoop.ConnFactory - $SQOOP_CONF_DIR has not been set in the environment. Cannot check for additional configuration.
9305 [uber-SubtaskRunner] INFO org.apache.sqoop.manager.MySQLManager - Preparing to use a MySQL streaming resultset.
9305 [uber-SubtaskRunner] INFO org.apache.sqoop.tool.CodeGenTool - Beginning code generation
10101 [uber-SubtaskRunner] INFO org.apache.sqoop.manager.SqlManager - Executing SQL statement: SELECT t.* FROM `mergetest` AS t LIMIT 1
10176 [uber-SubtaskRunner] INFO org.apache.sqoop.manager.SqlManager - Executing SQL statement: SELECT t.* FROM `mergetest` AS t LIMIT 1
10184 [uber-SubtaskRunner] INFO org.apache.sqoop.orm.CompilationManager - HADOOP_MAPRED_HOME is /opt/cloudera/parcels/CDH-5.4.0-1.cdh5.4.0.p0.27/lib/hadoop-mapreduce
13268 [uber-SubtaskRunner] INFO org.apache.sqoop.orm.CompilationManager - Writing jar file: /tmp/sqoop-yarn/compile/de9819956c5fecb758040da42554d571/mergetest.jar
13307 [uber-SubtaskRunner] WARN org.apache.sqoop.manager.MySQLManager - It looks like you are importing from mysql.
13307 [uber-SubtaskRunner] WARN org.apache.sqoop.manager.MySQLManager - This transfer can be faster! Use the --direct
13307 [uber-SubtaskRunner] WARN org.apache.sqoop.manager.MySQLManager - option to exercise a MySQL-specific fast path.
13307 [uber-SubtaskRunner] INFO org.apache.sqoop.manager.MySQLManager - Setting zero DATETIME behavior to convertToNull (mysql)
13316 [uber-SubtaskRunner] INFO org.apache.sqoop.mapreduce.ImportJobBase - Beginning import of mergetest
13467 [uber-SubtaskRunner] WARN org.apache.sqoop.mapreduce.JobBase - SQOOP_HOME is unset. May not be able to find all job dependencies.
14783 [uber-SubtaskRunner] INFO org.apache.sqoop.mapreduce.db.DBInputFormat - Using read commited transaction isolation
Heart beat
Heart beat
Heart beat
... (the "Heart beat" line repeats indefinitely)

 

Container stderr:


log4j:ERROR Could not find value for key log4j.appender.CLA
log4j:ERROR Could not instantiate appender named "CLA".
log4j:WARN No appenders could be found for logger (org.apache.hadoop.yarn.client.RMProxy).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
Note: /tmp/sqoop-yarn/compile/de9819956c5fecb758040da42554d571/mergetest.java uses or overrides a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
May 27, 2015 12:44:58 PM com.google.inject.servlet.InternalServletModule$BackwardsCompatibleServletContextProvider get
WARNING: You are attempting to use a deprecated API (specifically, attempting to @Inject ServletContext inside an eagerly created singleton. While we allow this for backwards compatibility, be warned that this MAY have unexpected behavior if you have more than one injector (with ServletModule) running in the same JVM. Please consult the Guice documentation at http://code.google.com/p/google-guice/wiki/Servlets for more information.
May 27, 2015 12:44:58 PM com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory register
INFO: Registering org.apache.hadoop.mapreduce.v2.app.webapp.JAXBContextResolver as a provider class
May 27, 2015 12:44:58 PM com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory register
INFO: Registering org.apache.hadoop.yarn.webapp.GenericExceptionHandler as a provider class
May 27, 2015 12:44:58 PM com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory register
INFO: Registering org.apache.hadoop.mapreduce.v2.app.webapp.AMWebServices as a root resource class
May 27, 2015 12:44:58 PM com.sun.jersey.server.impl.application.WebApplicationImpl _initiate
INFO: Initiating Jersey application, version 'Jersey: 1.9 09/02/2011 11:17 AM'
May 27, 2015 12:44:58 PM com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory getComponentProvider
INFO: Binding org.apache.hadoop.mapreduce.v2.app.webapp.JAXBContextResolver to GuiceManagedComponentProvider with the scope "Singleton"
May 27, 2015 12:44:59 PM com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory getComponentProvider
INFO: Binding org.apache.hadoop.yarn.webapp.GenericExceptionHandler to GuiceManagedComponentProvider with the scope "Singleton"
May 27, 2015 12:45:00 PM com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory getComponentProvider
INFO: Binding org.apache.hadoop.mapreduce.v2.app.webapp.AMWebServices to GuiceManagedComponentProvider with the scope "PerRequest

Re: VERY MUCH URGENT ---OOZIE - SQOOP import SAVED JOB workflow Hangs

Contributor

Hi,

 

The Launcher Job runs as a map task in an MR job and invokes the command (Sqoop, in this case) from there. Sqoop can then launch additional jobs. Because the Launcher Job sits around waiting for all of those other jobs to finish, it is actually possible to deadlock the cluster or a queue, depending on the size of your cluster and/or your scheduler settings. A good sign that this has happened is seeing the "Heart beat" message over and over in the Launcher Job, along with one or more ACCEPTED jobs in the RM that never start because there are not enough resources. It sounds like that might be happening here. I'm not that familiar with the Capacity Scheduler (we typically recommend the Fair Scheduler), so I can't really advise you on that specifically.
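
A common way around this self-deadlock is to route the launcher job and its child job to different queues, so the launcher cannot starve its own children. Rough arithmetic for this node: with 3100 MB of NodeManager memory, the launcher AM plus its map task typically occupy around 2 GB, leaving too little for the child Sqoop job's AM and map task. A minimal sketch of the action configuration, assuming a separate queue exists in the scheduler (launcherQ is a hypothetical name, not from this thread):

<!-- inside the <sqoop> action's <configuration> block -->
<property>
  <!-- oozie.launcher.* properties apply to the launcher job only;
       launcherQ is a hypothetical queue reserved for launchers -->
  <name>oozie.launcher.mapred.job.queue.name</name>
  <value>launcherQ</value>
</property>
<property>
  <!-- the actual Sqoop MR job keeps running in the original queue -->
  <name>mapreduce.job.queuename</name>
  <value>${queueName}</value>
</property>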

Software Engineer | Cloudera, Inc. | http://cloudera.com

Re: VERY MUCH URGENT ---OOZIE - SQOOP import SAVED JOB workflow Hangs

Contributor

Hi,

Please tell me what details you need from me; I will switch to the Fair Scheduler instead of the Capacity Scheduler.

 

This is very urgent; please treat it with priority.

 

Oracle VM: CentOS 6.6, 5000 MB RAM, 4 CPU cores; a single node running all the Hadoop services (HDFS, YARN, Oozie, etc.).

 

Problems

1) I tried the Fair Scheduler earlier, but it did not work, so I removed it. I have now updated the configuration with new details, but the old configuration still displays at http://10.109.51.2:8088/cluster/scheduler. How do I clear the old queue configuration, or is there a process for this? (See the refresh sketch below.)
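
For question 1, a hedged sketch: YARN can usually be told to reload queue/scheduler configuration without restarting the ResourceManager (and the Fair Scheduler also re-reads its allocation file periodically on its own). If the UI still shows stale queues afterwards, a ResourceManager restart is the fallback.

# ask the ResourceManager to reload its scheduler/queue configuration
# (run as a YARN administrator)
yarn rmadmin -refreshQueues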

 

2) After changing to the Fair Scheduler with the allocation file below, my new Oozie job still hangs. Please help me out.

 

<?xml version="1.0"?>
<allocations>
  <queue name="default">
    <minResources>1024 mb, 1 vcores</minResources>
    <maxResources>4096 mb, 4 vcores</maxResources>
    <maxRunningApps>10</maxRunningApps>
    <aclSubmitApps>hdfs,cdhadmin</aclSubmitApps>
    <weight>2.0</weight>
    <schedulingPolicy>fair</schedulingPolicy>
    <queue name="default-sub">
      <aclSubmitApps>cdhadmin</aclSubmitApps>
      <minResources>1024 mb, 1 vcores</minResources>
    </queue>
  </queue>
  <user name="cdhadmin">
    <maxRunningApps>10</maxRunningApps>
  </user>
  <user name="hdfs">
    <maxRunningApps>5</maxRunningApps>
  </user>
  <user name="oozie">
    <maxRunningApps>5</maxRunningApps>
  </user>
  <userMaxAppsDefault>5</userMaxAppsDefault>
  <fairSharePreemptionTimeout>30</fairSharePreemptionTimeout>
</allocations>
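
Tying this back to the launcher-deadlock explanation in the earlier reply: one possible shape for the allocation file on a single small node is a dedicated low-capacity queue for Oozie launchers next to the queue that runs the real jobs, combined with the oozie.launcher queue override shown above. A hypothetical sketch only (queue names and sizes are illustrative, not taken from this cluster):

<?xml version="1.0"?>
<allocations>
  <!-- hypothetical queue that holds only Oozie launcher jobs -->
  <queue name="launcherQ">
    <maxResources>1024 mb, 1 vcores</maxResources>
    <aclSubmitApps>cdhadmin</aclSubmitApps>
  </queue>
  <!-- queue where the actual Sqoop MapReduce jobs run -->
  <queue name="default">
    <maxResources>3072 mb, 3 vcores</maxResources>
    <aclSubmitApps>cdhadmin</aclSubmitApps>
  </queue>
</allocations>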