
YARN - Exception from container-launch while executing Hive query

Explorer

Hello Team,

 

My Cloudera cluster information:

 

NameNode with HA enabled

ResourceManager with HA enabled

The MapReduce 1 framework was installed earlier and has since been removed.

3 NodeManagers and 3 DataNodes

Hive service is installed

The cluster is managed through Cloudera Manager

CDH version 5.11

 

I am not able to run MapReduce jobs from Hive.

 

 

hive> INSERT INTO TABLE students VALUES ('fred flintstone', 35, 1.28), ('barney rubble', 32, 2.32);
Query ID = labuser_20170603150101_eaca6901-5d5f-4c40-8751-2576f7349396
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_1496499143480_0003, Tracking URL = http://ip-10-0-21-98.ec2.internal:8088/proxy/application_1496499143480_0003/
Kill Command = /opt/cloudera/parcels/CDH-5.11.0-1.cdh5.11.0.p0.34/lib/hadoop/bin/hadoop job  -kill job_1496499143480_0003
Hadoop job information for Stage-1: number of mappers: 0; number of reducers: 0
2017-06-03 15:01:52,051 Stage-1 map = 0%,  reduce = 0%
Ended Job = job_1496499143480_0003 with errors
Error during job, obtaining debugging information...
FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask
MapReduce Jobs Launched: 
Stage-Stage-1:  HDFS Read: 0 HDFS Write: 0 FAIL
Total MapReduce CPU Time Spent: 0 msec
hive>

 

 

Basic configuration information about the cluster:

The maximum application attempts and MapReduce AM attempts are set to 2.

NodeManager memory allotted: 6 GB per node

NodeManager cores allotted: 2 vcores per node

Cluster capacity: 6 vcores and 18 GB RAM

Container size min and max: 512 MB / 1 vcore and 2 GB / 2 vcores

Mapper memory (min and max): 512 MB

Reducer memory (min and max): 1 GB

Heap size for individual daemons: 1 GB

Incremental memory and cores for containers as well as MapReduce tasks: 512 MB and 1 vcore
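
For reference, the settings above roughly correspond to the following YARN/MapReduce properties. This is only a mapping sketch, assuming the Cloudera Manager values translate to the standard property names (the values shown are the ones listed above, not independently verified):

  <!-- yarn-site.xml: NodeManager and scheduler sizing -->
  <property><name>yarn.nodemanager.resource.memory-mb</name><value>6144</value></property>
  <property><name>yarn.nodemanager.resource.cpu-vcores</name><value>2</value></property>
  <property><name>yarn.scheduler.minimum-allocation-mb</name><value>512</value></property>
  <property><name>yarn.scheduler.maximum-allocation-mb</name><value>2048</value></property>
  <property><name>yarn.scheduler.minimum-allocation-vcores</name><value>1</value></property>
  <property><name>yarn.scheduler.maximum-allocation-vcores</name><value>2</value></property>
  <property><name>yarn.scheduler.increment-allocation-mb</name><value>512</value></property>
  <property><name>yarn.scheduler.increment-allocation-vcores</name><value>1</value></property>
  <property><name>yarn.resourcemanager.am.max-attempts</name><value>2</value></property>

  <!-- mapred-site.xml: per-task memory and AM attempts -->
  <property><name>mapreduce.map.memory.mb</name><value>512</value></property>
  <property><name>mapreduce.reduce.memory.mb</name><value>1024</value></property>
  <property><name>mapreduce.am.max-attempts</name><value>2</value></property>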

 

 

ResourceManager logs:

 

 

2017-06-03 15:01:39,597 INFO org.apache.hadoop.yarn.server.resourcemanager.ClientRMService: Allocated new applicationId: 3
2017-06-03 15:01:40,719 INFO org.apache.hadoop.yarn.server.resourcemanager.ClientRMService: Application with id 3 submitted by user labuser
2017-06-03 15:01:40,719 INFO org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl: Storing application with id application_1496499143480_0003
2017-06-03 15:01:40,720 INFO org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl: application_1496499143480_0003 State change from NEW to NEW_SAVING on event = START
2017-06-03 15:01:40,720 INFO org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore: Storing info for app: application_1496499143480_0003
2017-06-03 15:01:40,719 INFO org.apache.hadoop.yarn.server.resourcemanager.RMAuditLogger: USER=labuser	IP=10.0.20.51	OPERATION=Submit Application Request	TARGET=ClientRMService	RESULT=SUCCESS	APPID=application_1496499143480_0003
2017-06-03 15:01:40,729 INFO org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl: application_1496499143480_0003 State change from NEW_SAVING to SUBMITTED on event = APP_NEW_SAVED
2017-06-03 15:01:40,730 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: Accepted application application_1496499143480_0003 from user: labuser, in queue: root.users.labuser, currently num of applications: 1
2017-06-03 15:01:40,730 INFO org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl: application_1496499143480_0003 State change from SUBMITTED to ACCEPTED on event = APP_ACCEPTED
2017-06-03 15:01:40,730 INFO org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService: Registering app attempt : appattempt_1496499143480_0003_000001
2017-06-03 15:01:40,731 INFO org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: appattempt_1496499143480_0003_000001 State change from NEW to SUBMITTED on event = START
2017-06-03 15:01:40,731 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: Added Application Attempt appattempt_1496499143480_0003_000001 to scheduler from user: labuser
2017-06-03 15:01:40,731 INFO org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: appattempt_1496499143480_0003_000001 State change from SUBMITTED to SCHEDULED on event = ATTEMPT_ADDED
2017-06-03 15:01:41,518 INFO org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl: container_e02_1496499143480_0003_01_000001 Container Transitioned from NEW to ALLOCATED
2017-06-03 15:01:41,518 INFO org.apache.hadoop.yarn.server.resourcemanager.RMAuditLogger: USER=labuser	OPERATION=AM Allocated Container	TARGET=SchedulerApp	RESULT=SUCCESS	APPID=application_1496499143480_0003	CONTAINERID=container_e02_1496499143480_0003_01_000001
2017-06-03 15:01:41,518 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerNode: Assigned container container_e02_1496499143480_0003_01_000001 of capacity <memory:1024, vCores:1> on host ip-10-0-21-245.ec2.internal:8041, which has 1 containers, <memory:1024, vCores:1> used and <memory:5120, vCores:1> available after allocation
2017-06-03 15:01:41,519 INFO org.apache.hadoop.yarn.server.resourcemanager.security.NMTokenSecretManagerInRM: Sending NMToken for nodeId : ip-10-0-21-245.ec2.internal:8041 for container : container_e02_1496499143480_0003_01_000001
2017-06-03 15:01:41,519 INFO org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl: container_e02_1496499143480_0003_01_000001 Container Transitioned from ALLOCATED to ACQUIRED
2017-06-03 15:01:41,519 INFO org.apache.hadoop.yarn.server.resourcemanager.security.NMTokenSecretManagerInRM: Clear node set for appattempt_1496499143480_0003_000001
2017-06-03 15:01:41,519 INFO org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: Storing attempt: AppId: application_1496499143480_0003 AttemptId: appattempt_1496499143480_0003_000001 MasterContainer: Container: [ContainerId: container_e02_1496499143480_0003_01_000001, NodeId: ip-10-0-21-245.ec2.internal:8041, NodeHttpAddress: ip-10-0-21-245.ec2.internal:8042, Resource: <memory:1024, vCores:1>, Priority: 0, Token: Token { kind: ContainerToken, service: 10.0.21.245:8041 }, ]
2017-06-03 15:01:41,520 INFO org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: appattempt_1496499143480_0003_000001 State change from SCHEDULED to ALLOCATED_SAVING on event = CONTAINER_ALLOCATED
2017-06-03 15:01:41,523 INFO org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: appattempt_1496499143480_0003_000001 State change from ALLOCATED_SAVING to ALLOCATED on event = ATTEMPT_NEW_SAVED
2017-06-03 15:01:41,524 INFO org.apache.hadoop.yarn.server.resourcemanager.amlauncher.AMLauncher: Launching masterappattempt_1496499143480_0003_000001
2017-06-03 15:01:41,526 INFO org.apache.hadoop.yarn.server.resourcemanager.amlauncher.AMLauncher: Setting up container Container: [ContainerId: container_e02_1496499143480_0003_01_000001, NodeId: ip-10-0-21-245.ec2.internal:8041, NodeHttpAddress: ip-10-0-21-245.ec2.internal:8042, Resource: <memory:1024, vCores:1>, Priority: 0, Token: Token { kind: ContainerToken, service: 10.0.21.245:8041 }, ] for AM appattempt_1496499143480_0003_000001
2017-06-03 15:01:41,526 INFO org.apache.hadoop.yarn.server.resourcemanager.security.AMRMTokenSecretManager: Create AMRMToken for ApplicationAttempt: appattempt_1496499143480_0003_000001
2017-06-03 15:01:41,526 INFO org.apache.hadoop.yarn.server.resourcemanager.security.AMRMTokenSecretManager: Creating password for appattempt_1496499143480_0003_000001
2017-06-03 15:01:41,535 INFO org.apache.hadoop.yarn.server.resourcemanager.amlauncher.AMLauncher: Done launching container Container: [ContainerId: container_e02_1496499143480_0003_01_000001, NodeId: ip-10-0-21-245.ec2.internal:8041, NodeHttpAddress: ip-10-0-21-245.ec2.internal:8042, Resource: <memory:1024, vCores:1>, Priority: 0, Token: Token { kind: ContainerToken, service: 10.0.21.245:8041 }, ] for AM appattempt_1496499143480_0003_000001
2017-06-03 15:01:41,535 INFO org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: appattempt_1496499143480_0003_000001 State change from ALLOCATED to LAUNCHED on event = LAUNCHED
2017-06-03 15:01:42,520 INFO org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl: container_e02_1496499143480_0003_01_000001 Container Transitioned from ACQUIRED to RUNNING
2017-06-03 15:01:45,872 INFO org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl: container_e02_1496499143480_0003_01_000001 Container Transitioned from RUNNING to COMPLETED
2017-06-03 15:01:45,872 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSAppAttempt: Completed container: container_e02_1496499143480_0003_01_000001 in state: COMPLETED event:FINISHED
2017-06-03 15:01:45,872 INFO org.apache.hadoop.yarn.server.resourcemanager.RMAuditLogger: USER=labuser	OPERATION=AM Released Container	TARGET=SchedulerApp	RESULT=SUCCESS	APPID=application_1496499143480_0003	CONTAINERID=container_e02_1496499143480_0003_01_000001
2017-06-03 15:01:45,872 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerNode: Released container container_e02_1496499143480_0003_01_000001 of capacity <memory:1024, vCores:1> on host ip-10-0-21-245.ec2.internal:8041, which currently has 0 containers, <memory:0, vCores:0> used and <memory:6144, vCores:2> available, release resources=true
2017-06-03 15:01:45,872 INFO org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: Updating application attempt appattempt_1496499143480_0003_000001 with final state: FAILED, and exit status: 1
2017-06-03 15:01:45,872 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: Application attempt appattempt_1496499143480_0003_000001 released container container_e02_1496499143480_0003_01_000001 on node: host: ip-10-0-21-245.ec2.internal:8041 #containers=0 available=6144 used=0 with event: FINISHED
2017-06-03 15:01:45,873 INFO org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: appattempt_1496499143480_0003_000001 State change from LAUNCHED to FINAL_SAVING on event = CONTAINER_FINISHED
2017-06-03 15:01:45,878 INFO org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService: Unregistering app attempt : appattempt_1496499143480_0003_000001
2017-06-03 15:01:45,878 INFO org.apache.hadoop.yarn.server.resourcemanager.security.AMRMTokenSecretManager: Application finished, removing password for appattempt_1496499143480_0003_000001
2017-06-03 15:01:45,878 INFO org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: appattempt_1496499143480_0003_000001 State change from FINAL_SAVING to FAILED on event = ATTEMPT_UPDATE_SAVED
2017-06-03 15:01:45,878 INFO org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl: The number of failed attempts is 1. The max attempts is 2
2017-06-03 15:01:45,879 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: Application appattempt_1496499143480_0003_000001 is done. finalState=FAILED
2017-06-03 15:01:45,879 INFO org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService: Registering app attempt : appattempt_1496499143480_0003_000002
2017-06-03 15:01:45,879 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.AppSchedulingInfo: Application application_1496499143480_0003 requests cleared
2017-06-03 15:01:45,879 INFO org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: appattempt_1496499143480_0003_000002 State change from NEW to SUBMITTED on event = START
2017-06-03 15:01:45,879 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: Added Application Attempt appattempt_1496499143480_0003_000002 to scheduler from user: labuser
2017-06-03 15:01:45,880 INFO org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: appattempt_1496499143480_0003_000002 State change from SUBMITTED to SCHEDULED on event = ATTEMPT_ADDED
2017-06-03 15:01:46,764 INFO org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl: container_e02_1496499143480_0003_02_000001 Container Transitioned from NEW to ALLOCATED
2017-06-03 15:01:46,764 INFO org.apache.hadoop.yarn.server.resourcemanager.RMAuditLogger: USER=labuser	OPERATION=AM Allocated Container	TARGET=SchedulerApp	RESULT=SUCCESS	APPID=application_1496499143480_0003	CONTAINERID=container_e02_1496499143480_0003_02_000001
2017-06-03 15:01:46,764 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerNode: Assigned container container_e02_1496499143480_0003_02_000001 of capacity <memory:1024, vCores:1> on host ip-10-0-21-96.ec2.internal:8041, which has 1 containers, <memory:1024, vCores:1> used and <memory:5120, vCores:1> available after allocation
2017-06-03 15:01:46,765 INFO org.apache.hadoop.yarn.server.resourcemanager.security.NMTokenSecretManagerInRM: Sending NMToken for nodeId : ip-10-0-21-96.ec2.internal:8041 for container : container_e02_1496499143480_0003_02_000001
2017-06-03 15:01:46,765 INFO org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl: container_e02_1496499143480_0003_02_000001 Container Transitioned from ALLOCATED to ACQUIRED
2017-06-03 15:01:46,765 INFO org.apache.hadoop.yarn.server.resourcemanager.security.NMTokenSecretManagerInRM: Clear node set for appattempt_1496499143480_0003_000002
2017-06-03 15:01:46,765 INFO org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: Storing attempt: AppId: application_1496499143480_0003 AttemptId: appattempt_1496499143480_0003_000002 MasterContainer: Container: [ContainerId: container_e02_1496499143480_0003_02_000001, NodeId: ip-10-0-21-96.ec2.internal:8041, NodeHttpAddress: ip-10-0-21-96.ec2.internal:8042, Resource: <memory:1024, vCores:1>, Priority: 0, Token: Token { kind: ContainerToken, service: 10.0.21.96:8041 }, ]
2017-06-03 15:01:46,766 INFO org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: appattempt_1496499143480_0003_000002 State change from SCHEDULED to ALLOCATED_SAVING on event = CONTAINER_ALLOCATED
2017-06-03 15:01:46,770 INFO org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: appattempt_1496499143480_0003_000002 State change from ALLOCATED_SAVING to ALLOCATED on event = ATTEMPT_NEW_SAVED
2017-06-03 15:01:46,771 INFO org.apache.hadoop.yarn.server.resourcemanager.amlauncher.AMLauncher: Launching masterappattempt_1496499143480_0003_000002
2017-06-03 15:01:46,773 INFO org.apache.hadoop.yarn.server.resourcemanager.amlauncher.AMLauncher: Setting up container Container: [ContainerId: container_e02_1496499143480_0003_02_000001, NodeId: ip-10-0-21-96.ec2.internal:8041, NodeHttpAddress: ip-10-0-21-96.ec2.internal:8042, Resource: <memory:1024, vCores:1>, Priority: 0, Token: Token { kind: ContainerToken, service: 10.0.21.96:8041 }, ] for AM appattempt_1496499143480_0003_000002
2017-06-03 15:01:46,773 INFO org.apache.hadoop.yarn.server.resourcemanager.security.AMRMTokenSecretManager: Create AMRMToken for ApplicationAttempt: appattempt_1496499143480_0003_000002
2017-06-03 15:01:46,773 INFO org.apache.hadoop.yarn.server.resourcemanager.security.AMRMTokenSecretManager: Creating password for appattempt_1496499143480_0003_000002
2017-06-03 15:01:46,782 INFO org.apache.hadoop.yarn.server.resourcemanager.amlauncher.AMLauncher: Done launching container Container: [ContainerId: container_e02_1496499143480_0003_02_000001, NodeId: ip-10-0-21-96.ec2.internal:8041, NodeHttpAddress: ip-10-0-21-96.ec2.internal:8042, Resource: <memory:1024, vCores:1>, Priority: 0, Token: Token { kind: ContainerToken, service: 10.0.21.96:8041 }, ] for AM appattempt_1496499143480_0003_000002
2017-06-03 15:01:46,782 INFO org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: appattempt_1496499143480_0003_000002 State change from ALLOCATED to LAUNCHED on event = LAUNCHED
2017-06-03 15:01:47,768 INFO org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl: container_e02_1496499143480_0003_02_000001 Container Transitioned from ACQUIRED to RUNNING
2017-06-03 15:01:51,243 INFO org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl: container_e02_1496499143480_0003_02_000001 Container Transitioned from RUNNING to COMPLETED
2017-06-03 15:01:51,243 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSAppAttempt: Completed container: container_e02_1496499143480_0003_02_000001 in state: COMPLETED event:FINISHED
2017-06-03 15:01:51,243 INFO org.apache.hadoop.yarn.server.resourcemanager.RMAuditLogger: USER=labuser	OPERATION=AM Released Container	TARGET=SchedulerApp	RESULT=SUCCESS	APPID=application_1496499143480_0003	CONTAINERID=container_e02_1496499143480_0003_02_000001
2017-06-03 15:01:51,243 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerNode: Released container container_e02_1496499143480_0003_02_000001 of capacity <memory:1024, vCores:1> on host ip-10-0-21-96.ec2.internal:8041, which currently has 0 containers, <memory:0, vCores:0> used and <memory:6144, vCores:2> available, release resources=true
2017-06-03 15:01:51,243 INFO org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: Updating application attempt appattempt_1496499143480_0003_000002 with final state: FAILED, and exit status: 1
2017-06-03 15:01:51,243 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: Application attempt appattempt_1496499143480_0003_000002 released container container_e02_1496499143480_0003_02_000001 on node: host: ip-10-0-21-96.ec2.internal:8041 #containers=0 available=6144 used=0 with event: FINISHED
2017-06-03 15:01:51,243 INFO org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: appattempt_1496499143480_0003_000002 State change from LAUNCHED to FINAL_SAVING on event = CONTAINER_FINISHED
2017-06-03 15:01:51,249 INFO org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService: Unregistering app attempt : appattempt_1496499143480_0003_000002
2017-06-03 15:01:51,249 INFO org.apache.hadoop.yarn.server.resourcemanager.security.AMRMTokenSecretManager: Application finished, removing password for appattempt_1496499143480_0003_000002
2017-06-03 15:01:51,249 INFO org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: appattempt_1496499143480_0003_000002 State change from FINAL_SAVING to FAILED on event = ATTEMPT_UPDATE_SAVED
2017-06-03 15:01:51,249 INFO org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl: The number of failed attempts is 2. The max attempts is 2
2017-06-03 15:01:51,249 INFO org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl: Updating application application_1496499143480_0003 with final state: FAILED
2017-06-03 15:01:51,249 INFO org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore: Updating info for app: application_1496499143480_0003
2017-06-03 15:01:51,250 INFO org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl: application_1496499143480_0003 State change from ACCEPTED to FINAL_SAVING on event = ATTEMPT_FAILED
2017-06-03 15:01:51,250 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: Application appattempt_1496499143480_0003_000002 is done. finalState=FAILED
2017-06-03 15:01:51,250 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.AppSchedulingInfo: Application application_1496499143480_0003 requests cleared
2017-06-03 15:01:51,293 INFO org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl: Application application_1496499143480_0003 failed 2 times due to AM Container for appattempt_1496499143480_0003_000002 exited with  exitCode: 1
For more detailed output, check application tracking page:http://ip-10-0-21-98.ec2.internal:8088/proxy/application_1496499143480_0003/Then, click on links to logs of each attempt.
Diagnostics: Exception from container-launch.
Container id: container_e02_1496499143480_0003_02_000001
Exit code: 1
Stack trace: ExitCodeException exitCode=1: 
	at org.apache.hadoop.util.Shell.runCommand(Shell.java:601)
	at org.apache.hadoop.util.Shell.run(Shell.java:504)
	at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:786)
	at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:213)
	at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:302)
	at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:82)
	at java.util.concurrent.FutureTask.run(FutureTask.java:262)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
	at java.lang.Thread.run(Thread.java:745)


Container exited with a non-zero exit code 1
Failing this attempt. Failing the application.
2017-06-03 15:01:51,293 INFO org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl: application_1496499143480_0003 State change from FINAL_SAVING to FAILED on event = APP_UPDATE_SAVED
2017-06-03 15:01:51,293 WARN org.apache.hadoop.yarn.server.resourcemanager.RMAuditLogger: USER=labuser	OPERATION=Application Finished - Failed	TARGET=RMAppManager	RESULT=FAILURE	DESCRIPTION=App failed with state: FAILED	PERMISSIONS=Application application_1496499143480_0003 failed 2 times due to AM Container for appattempt_1496499143480_0003_000002 exited with  exitCode: 1
For more detailed output, check application tracking page:http://ip-10-0-21-98.ec2.internal:8088/proxy/application_1496499143480_0003/Then, click on links to logs of each attempt.
Diagnostics: Exception from container-launch.
Container id: container_e02_1496499143480_0003_02_000001
Exit code: 1
Stack trace: ExitCodeException exitCode=1: 
	at org.apache.hadoop.util.Shell.runCommand(Shell.java:601)
	at org.apache.hadoop.util.Shell.run(Shell.java:504)
	at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:786)
	at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:213)
	at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:302)
	at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:82)
	at java.util.concurrent.FutureTask.run(FutureTask.java:262)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
	at java.lang.Thread.run(Thread.java:745)


Container exited with a non-zero exit code 1
Failing this attempt. Failing the application.	APPID=application_1496499143480_0003
2017-06-03 15:01:51,294 INFO org.apache.hadoop.yarn.server.resourcemanager.RMAppManager$ApplicationSummary: appId=application_1496499143480_0003,name=INSERT INTO TABLE students VALUES ('...2.32)(Stage-1),user=labuser,queue=root.users.labuser,state=FAILED,trackingUrl=http://ip-10-0-21-98.ec2.internal:8088/cluster/app/application_1496499143480_0003,appMasterHost=N/A,startTime=1496502100719,finishTime=1496502111249,finalStatus=FAILED
2017-06-03 15:01:52,077 INFO org.apache.hadoop.yarn.server.resourcemanager.RMAuditLogger: USER=labuser	IP=10.0.20.51	OPERATION=Kill Application Request	TARGET=ClientRMService	RESULT=SUCCESS	APPID=application_1496499143480_0003

 

NodeManager logs:

 

Node 1:

 

2017-06-03 15:01:41,531 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl: Start request for container_e02_1496499143480_0003_01_000001 by user labuser
2017-06-03 15:01:41,531 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl: Creating a new application reference for app application_1496499143480_0003
2017-06-03 15:01:41,531 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.application.Application: Application application_1496499143480_0003 transitioned from NEW to INITING
2017-06-03 15:01:41,532 INFO org.apache.hadoop.yarn.server.nodemanager.NMAuditLogger: USER=labuser	IP=10.0.21.98	OPERATION=Start Container Request	TARGET=ContainerManageImpl	RESULT=SUCCESS	APPID=application_1496499143480_0003	CONTAINERID=container_e02_1496499143480_0003_01_000001
2017-06-03 15:01:41,541 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.AppLogAggregatorImpl: rollingMonitorInterval is set as -1. The log rolling monitoring interval is disabled. The logs will be aggregated after this application is finished.
2017-06-03 15:01:41,567 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.application.Application: Adding container_e02_1496499143480_0003_01_000001 to application application_1496499143480_0003
2017-06-03 15:01:41,568 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.application.Application: Application application_1496499143480_0003 transitioned from INITING to RUNNING
2017-06-03 15:01:41,568 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container: Container container_e02_1496499143480_0003_01_000001 transitioned from NEW to LOCALIZING
2017-06-03 15:01:41,569 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.AuxServices: Got event CONTAINER_INIT for appId application_1496499143480_0003
2017-06-03 15:01:41,569 INFO org.apache.spark.network.yarn.YarnShuffleService: Initializing container container_e02_1496499143480_0003_01_000001
2017-06-03 15:01:41,570 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService: Created localizer for container_e02_1496499143480_0003_01_000001
2017-06-03 15:01:41,571 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService: Writing credentials to the nmPrivate file /yarn/nm/nmPrivate/container_e02_1496499143480_0003_01_000001.tokens. Credentials list: 
2017-06-03 15:01:41,571 INFO org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor: Initializing user labuser
2017-06-03 15:01:41,573 INFO org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor: Copying from /yarn/nm/nmPrivate/container_e02_1496499143480_0003_01_000001.tokens to /yarn/nm/usercache/labuser/appcache/application_1496499143480_0003/container_e02_1496499143480_0003_01_000001.tokens
2017-06-03 15:01:41,573 INFO org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor: Localizer CWD set to /yarn/nm/usercache/labuser/appcache/application_1496499143480_0003 = file:/yarn/nm/usercache/labuser/appcache/application_1496499143480_0003
2017-06-03 15:01:42,111 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container: Container container_e02_1496499143480_0003_01_000001 transitioned from LOCALIZING to LOCALIZED
2017-06-03 15:01:42,132 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container: Container container_e02_1496499143480_0003_01_000001 transitioned from LOCALIZED to RUNNING
2017-06-03 15:01:42,136 INFO org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor: launchContainer: [bash, /yarn/nm/usercache/labuser/appcache/application_1496499143480_0003/container_e02_1496499143480_0003_01_000001/default_container_executor.sh]
2017-06-03 15:01:44,297 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl: Starting resource-monitoring for container_e02_1496499143480_0003_01_000001
2017-06-03 15:01:44,354 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl: Memory usage of ProcessTree 13223 for container-id container_e02_1496499143480_0003_01_000001: 114.7 MB of 1 GB physical memory used; 1.4 GB of 2.1 GB virtual memory used
2017-06-03 15:01:45,825 WARN org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor: Exit code from container container_e02_1496499143480_0003_01_000001 is : 1
2017-06-03 15:01:45,825 WARN org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor: Exception from container-launch with container ID: container_e02_1496499143480_0003_01_000001 and exit code: 1
ExitCodeException exitCode=1: 
	at org.apache.hadoop.util.Shell.runCommand(Shell.java:601)
	at org.apache.hadoop.util.Shell.run(Shell.java:504)
	at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:786)
	at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:213)
	at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:302)
	at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:82)
	at java.util.concurrent.FutureTask.run(FutureTask.java:262)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
	at java.lang.Thread.run(Thread.java:745)
2017-06-03 15:01:45,827 INFO org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: Exception from container-launch.
2017-06-03 15:01:45,827 INFO org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: Container id: container_e02_1496499143480_0003_01_000001
2017-06-03 15:01:45,827 INFO org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: Exit code: 1
2017-06-03 15:01:45,827 INFO org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: Stack trace: ExitCodeException exitCode=1: 
2017-06-03 15:01:45,827 INFO org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: 	at org.apache.hadoop.util.Shell.runCommand(Shell.java:601)
2017-06-03 15:01:45,827 INFO org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: 	at org.apache.hadoop.util.Shell.run(Shell.java:504)
2017-06-03 15:01:45,827 INFO org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: 	at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:786)
2017-06-03 15:01:45,827 INFO org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: 	at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:213)
2017-06-03 15:01:45,827 INFO org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: 	at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:302)
2017-06-03 15:01:45,827 INFO org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: 	at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:82)
2017-06-03 15:01:45,827 INFO org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: 	at java.util.concurrent.FutureTask.run(FutureTask.java:262)
2017-06-03 15:01:45,827 INFO org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: 	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
2017-06-03 15:01:45,827 INFO org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
2017-06-03 15:01:45,827 INFO org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: 	at java.lang.Thread.run(Thread.java:745)
2017-06-03 15:01:45,828 WARN org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch: Container exited with a non-zero exit code 1
2017-06-03 15:01:45,828 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container: Container container_e02_1496499143480_0003_01_000001 transitioned from RUNNING to EXITED_WITH_FAILURE
2017-06-03 15:01:45,828 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch: Cleaning up container container_e02_1496499143480_0003_01_000001
2017-06-03 15:01:45,860 WARN org.apache.hadoop.yarn.server.nodemanager.NMAuditLogger: USER=labuser	OPERATION=Container Finished - Failed	TARGET=ContainerImpl	RESULT=FAILURE	DESCRIPTION=Container failed with state: EXITED_WITH_FAILURE	APPID=application_1496499143480_0003	CONTAINERID=container_e02_1496499143480_0003_01_000001
2017-06-03 15:01:45,860 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container: Container container_e02_1496499143480_0003_01_000001 transitioned from EXITED_WITH_FAILURE to DONE
2017-06-03 15:01:45,860 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.application.Application: Removing container_e02_1496499143480_0003_01_000001 from application application_1496499143480_0003
2017-06-03 15:01:45,860 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.AppLogAggregatorImpl: Considering container container_e02_1496499143480_0003_01_000001 for log-aggregation
2017-06-03 15:01:45,860 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.AuxServices: Got event CONTAINER_STOP for appId application_1496499143480_0003
2017-06-03 15:01:45,860 INFO org.apache.spark.network.yarn.YarnShuffleService: Stopping container container_e02_1496499143480_0003_01_000001
2017-06-03 15:01:45,860 INFO org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor: Deleting absolute path : /yarn/nm/usercache/labuser/appcache/application_1496499143480_0003/container_e02_1496499143480_0003_01_000001
2017-06-03 15:01:46,871 INFO org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl: Removed completed containers from NM context: [container_e02_1496499143480_0003_01_000001]
2017-06-03 15:01:47,355 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl: Stopping resource-monitoring for container_e02_1496499143480_0003_01_000001
2017-06-03 15:01:51,879 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.application.Application: Application application_1496499143480_0003 transitioned from RUNNING to APPLICATION_RESOURCES_CLEANINGUP
2017-06-03 15:01:51,880 INFO org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor: Deleting absolute path : /yarn/nm/usercache/labuser/appcache/application_1496499143480_0003
2017-06-03 15:01:51,880 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.AuxServices: Got event APPLICATION_STOP for appId application_1496499143480_0003
2017-06-03 15:01:51,880 INFO org.apache.spark.network.yarn.YarnShuffleService: Stopping application application_1496499143480_0003
2017-06-03 15:01:51,880 INFO org.apache.spark.network.shuffle.ExternalShuffleBlockResolver: Application application_1496499143480_0003 removed, cleanupLocalDirs = false
2017-06-03 15:01:51,881 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.application.Application: Application application_1496499143480_0003 transitioned from APPLICATION_RESOURCES_CLEANINGUP to FINISHED
2017-06-03 15:01:51,881 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.AppLogAggregatorImpl: Application just finished : application_1496499143480_0003
2017-06-03 15:01:51,897 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.AppLogAggregatorImpl: Uploading logs for container container_e02_1496499143480_0003_01_000001. Current good log dirs are /yarn/container-logs
2017-06-03 15:01:51,897 INFO org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor: Deleting path : /yarn/container-logs/application_1496499143480_0003/container_e02_1496499143480_0003_01_000001/stdout
2017-06-03 15:01:51,898 INFO org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor: Deleting path : /yarn/container-logs/application_1496499143480_0003/container_e02_1496499143480_0003_01_000001/stderr
2017-06-03 15:01:51,898 INFO org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor: Deleting path : /yarn/container-logs/application_1496499143480_0003/container_e02_1496499143480_0003_01_000001/syslog
2017-06-03 15:01:51,932 INFO org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor: Deleting path : /yarn/container-logs/application_1496499143480_0003

 

Node 2:

 

 

2017-06-03 15:01:46,776 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl: Start request for container_e02_1496499143480_0003_02_000001 by user labuser
2017-06-03 15:01:46,776 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl: Creating a new application reference for app application_1496499143480_0003
2017-06-03 15:01:46,777 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.application.Application: Application application_1496499143480_0003 transitioned from NEW to INITING
2017-06-03 15:01:46,777 INFO org.apache.hadoop.yarn.server.nodemanager.NMAuditLogger: USER=labuser	IP=10.0.21.98	OPERATION=Start Container Request	TARGET=ContainerManageImpl	RESULT=SUCCESS	APPID=application_1496499143480_0003	CONTAINERID=container_e02_1496499143480_0003_02_000001
2017-06-03 15:01:46,780 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.AppLogAggregatorImpl: rollingMonitorInterval is set as -1. The log rolling monitoring interval is disabled. The logs will be aggregated after this application is finished.
2017-06-03 15:01:46,792 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.application.Application: Adding container_e02_1496499143480_0003_02_000001 to application application_1496499143480_0003
2017-06-03 15:01:46,793 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.application.Application: Application application_1496499143480_0003 transitioned from INITING to RUNNING
2017-06-03 15:01:46,793 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container: Container container_e02_1496499143480_0003_02_000001 transitioned from NEW to LOCALIZING
2017-06-03 15:01:46,794 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.AuxServices: Got event CONTAINER_INIT for appId application_1496499143480_0003
2017-06-03 15:01:46,794 INFO org.apache.spark.network.yarn.YarnShuffleService: Initializing container container_e02_1496499143480_0003_02_000001
2017-06-03 15:01:46,795 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService: Created localizer for container_e02_1496499143480_0003_02_000001
2017-06-03 15:01:46,797 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService: Writing credentials to the nmPrivate file /yarn/nm/nmPrivate/container_e02_1496499143480_0003_02_000001.tokens. Credentials list: 
2017-06-03 15:01:46,798 INFO org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor: Initializing user labuser
2017-06-03 15:01:46,799 INFO org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor: Copying from /yarn/nm/nmPrivate/container_e02_1496499143480_0003_02_000001.tokens to /yarn/nm/usercache/labuser/appcache/application_1496499143480_0003/container_e02_1496499143480_0003_02_000001.tokens
2017-06-03 15:01:46,799 INFO org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor: Localizer CWD set to /yarn/nm/usercache/labuser/appcache/application_1496499143480_0003 = file:/yarn/nm/usercache/labuser/appcache/application_1496499143480_0003
2017-06-03 15:01:47,353 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container: Container container_e02_1496499143480_0003_02_000001 transitioned from LOCALIZING to LOCALIZED
2017-06-03 15:01:47,374 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container: Container container_e02_1496499143480_0003_02_000001 transitioned from LOCALIZED to RUNNING
2017-06-03 15:01:47,378 INFO org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor: launchContainer: [bash, /yarn/nm/usercache/labuser/appcache/application_1496499143480_0003/container_e02_1496499143480_0003_02_000001/default_container_executor.sh]
2017-06-03 15:01:47,425 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl: Starting resource-monitoring for container_e02_1496499143480_0003_02_000001
2017-06-03 15:01:47,455 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl: Memory usage of ProcessTree 21272 for container-id container_e02_1496499143480_0003_02_000001: 18.9 MB of 1 GB physical memory used; 1.4 GB of 2.1 GB virtual memory used
2017-06-03 15:01:50,491 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl: Memory usage of ProcessTree 21272 for container-id container_e02_1496499143480_0003_02_000001: 143.6 MB of 1 GB physical memory used; 1.4 GB of 2.1 GB virtual memory used
2017-06-03 15:01:51,213 WARN org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor: Exit code from container container_e02_1496499143480_0003_02_000001 is : 1
2017-06-03 15:01:51,213 WARN org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor: Exception from container-launch with container ID: container_e02_1496499143480_0003_02_000001 and exit code: 1
ExitCodeException exitCode=1: 
	at org.apache.hadoop.util.Shell.runCommand(Shell.java:601)
	at org.apache.hadoop.util.Shell.run(Shell.java:504)
	at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:786)
	at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:213)
	at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:302)
	at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:82)
	at java.util.concurrent.FutureTask.run(FutureTask.java:262)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
	at java.lang.Thread.run(Thread.java:745)
2017-06-03 15:01:51,214 INFO org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: Exception from container-launch.
2017-06-03 15:01:51,214 INFO org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: Container id: container_e02_1496499143480_0003_02_000001
2017-06-03 15:01:51,214 INFO org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: Exit code: 1
2017-06-03 15:01:51,214 INFO org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: Stack trace: ExitCodeException exitCode=1: 
2017-06-03 15:01:51,214 INFO org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: 	at org.apache.hadoop.util.Shell.runCommand(Shell.java:601)
2017-06-03 15:01:51,214 INFO org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: 	at org.apache.hadoop.util.Shell.run(Shell.java:504)
2017-06-03 15:01:51,214 INFO org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: 	at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:786)
2017-06-03 15:01:51,214 INFO org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: 	at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:213)
2017-06-03 15:01:51,214 INFO org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: 	at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:302)
2017-06-03 15:01:51,214 INFO org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: 	at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:82)
2017-06-03 15:01:51,214 INFO org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: 	at java.util.concurrent.FutureTask.run(FutureTask.java:262)
2017-06-03 15:01:51,214 INFO org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: 	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
2017-06-03 15:01:51,214 INFO org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
2017-06-03 15:01:51,214 INFO org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: 	at java.lang.Thread.run(Thread.java:745)
2017-06-03 15:01:51,215 WARN org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch: Container exited with a non-zero exit code 1
2017-06-03 15:01:51,215 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container: Container container_e02_1496499143480_0003_02_000001 transitioned from RUNNING to EXITED_WITH_FAILURE
2017-06-03 15:01:51,215 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch: Cleaning up container container_e02_1496499143480_0003_02_000001
2017-06-03 15:01:51,235 WARN org.apache.hadoop.yarn.server.nodemanager.NMAuditLogger: USER=labuser	OPERATION=Container Finished - Failed	TARGET=ContainerImpl	RESULT=FAILURE	DESCRIPTION=Container failed with state: EXITED_WITH_FAILURE	APPID=application_1496499143480_0003	CONTAINERID=container_e02_1496499143480_0003_02_000001
2017-06-03 15:01:51,236 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container: Container container_e02_1496499143480_0003_02_000001 transitioned from EXITED_WITH_FAILURE to DONE
2017-06-03 15:01:51,236 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.application.Application: Removing container_e02_1496499143480_0003_02_000001 from application application_1496499143480_0003
2017-06-03 15:01:51,236 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.AppLogAggregatorImpl: Considering container container_e02_1496499143480_0003_02_000001 for log-aggregation
2017-06-03 15:01:51,236 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.AuxServices: Got event CONTAINER_STOP for appId application_1496499143480_0003
2017-06-03 15:01:51,236 INFO org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor: Deleting absolute path : /yarn/nm/usercache/labuser/appcache/application_1496499143480_0003/container_e02_1496499143480_0003_02_000001
2017-06-03 15:01:51,236 INFO org.apache.spark.network.yarn.YarnShuffleService: Stopping container container_e02_1496499143480_0003_02_000001
2017-06-03 15:01:52,241 INFO org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl: Removed completed containers from NM context: [container_e02_1496499143480_0003_02_000001]
2017-06-03 15:01:52,241 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.application.Application: Application application_1496499143480_0003 transitioned from RUNNING to APPLICATION_RESOURCES_CLEANINGUP
2017-06-03 15:01:52,242 INFO org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor: Deleting absolute path : /yarn/nm/usercache/labuser/appcache/application_1496499143480_0003
2017-06-03 15:01:52,242 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.AuxServices: Got event APPLICATION_STOP for appId application_1496499143480_0003
2017-06-03 15:01:52,242 INFO org.apache.spark.network.yarn.YarnShuffleService: Stopping application application_1496499143480_0003
2017-06-03 15:01:52,242 INFO org.apache.spark.network.shuffle.ExternalShuffleBlockResolver: Application application_1496499143480_0003 removed, cleanupLocalDirs = false
2017-06-03 15:01:52,242 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.application.Application: Application application_1496499143480_0003 transitioned from APPLICATION_RESOURCES_CLEANINGUP to FINISHED
2017-06-03 15:01:52,242 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.AppLogAggregatorImpl: Application just finished : application_1496499143480_0003
2017-06-03 15:01:52,340 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.AppLogAggregatorImpl: Uploading logs for container container_e02_1496499143480_0003_02_000001. Current good log dirs are /yarn/container-logs
2017-06-03 15:01:52,341 INFO org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor: Deleting path : /yarn/container-logs/application_1496499143480_0003/container_e02_1496499143480_0003_02_000001/stderr
2017-06-03 15:01:52,342 INFO org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor: Deleting path : /yarn/container-logs/application_1496499143480_0003/container_e02_1496499143480_0003_02_000001/syslog
2017-06-03 15:01:52,343 INFO org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor: Deleting path : /yarn/container-logs/application_1496499143480_0003/container_e02_1496499143480_0003_02_000001/stdout
2017-06-03 15:01:52,372 INFO org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor: Deleting path : /yarn/container-logs/application_1496499143480_0003
2017-06-03 15:01:53,491 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl: Stopping resource-monitoring for container_e02_1496499143480_0003_02_000001

 

 

1 ACCEPTED SOLUTION

Explorer

I have fixed the issue. The problem was a failure to load the Hadoop native library. After setting an environment variable that points to the Hadoop native library, I can run MR jobs in my cluster.
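
For anyone hitting the same error: the post above does not name the exact variable, so the following is only an illustrative sketch of how such a fix is usually applied on a parcel-based CDH install. The variable choice and path here are assumptions, not values confirmed by the original poster:

  # hadoop-env.sh, or the equivalent environment safety valve in Cloudera Manager
  # (the path below is the usual parcel location; adjust it to your install)
  export JAVA_LIBRARY_PATH=/opt/cloudera/parcels/CDH/lib/hadoop/lib/native
  export LD_LIBRARY_PATH=/opt/cloudera/parcels/CDH/lib/hadoop/lib/native:$LD_LIBRARY_PATH

  # verify that the native libraries are now being picked up
  hadoop checknative -a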


6 REPLIES

Explorer

The logs of the YARN services (RM, NM) are irrelevant. You must inspect the logs of the YARN job in the HistoryServer.

Note that job_1496499143480_0003 uses the legacy (pre-YARN) naming convention; the actual YARN application ID is application_1496499143480_0003.

 

Option 1: open the YARN UI, and inspect the dashboard of recent jobs for that ID

Option 2: use the CLI, i.e.

  yarn application -status application_1496499143480_0003

  yarn logs -applicationId application_1496499143480_0003 | more
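
If the aggregated output is long, piping it through grep is a quick way to find the actual failure reported by the AM (the pattern below is just an example, not a definitive recipe):

  yarn logs -applicationId application_1496499143480_0003 | grep -iE -B 2 -A 20 'error|exception'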

 

 

Explorer

Hello,

 

When I inspected the application master attempts and their container logs:

My log message:

ExitCodeException exitCode=1: 
	at org.apache.hadoop.util.Shell.runCommand(Shell.java:601)
	at org.apache.hadoop.util.Shell.run(Shell.java:504)
	at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:786)
	at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:213)
	at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:302)
	at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:82)
	at java.util.concurrent.FutureTask.run(FutureTask.java:262)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
	at java.lang.Thread.run(Thread.java:745)

The reason I posted the RM and NM logs was just to show that a container was allocated for the job and that the application was retried on it until the retry limit was reached.
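
To see the full stderr of one specific AM attempt rather than the whole application, the aggregated logs can usually be narrowed to a single container. This is a sketch; the container ID and node address are taken from the logs posted above, and the -containerId/-nodeAddress options must be supported by your CDH release:

  yarn logs -applicationId application_1496499143480_0003 \
    -containerId container_e02_1496499143480_0003_02_000001 \
    -nodeAddress ip-10-0-21-96.ec2.internal:8041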

Explorer

I have fixed the issue. The problem was a failure to load the Hadoop native library. After setting an environment variable that points to the Hadoop native library, I can run MR jobs in my cluster.

New Contributor

I am facing the same problem. Can you please tell me which config to edit to point to the native library?


@rkkrishnaa wrote:

I have fixed the issue. The problem was a failure to load the Hadoop native library. After setting an environment variable that points to the Hadoop native library, I can run MR jobs in my cluster.
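
The original poster did not say which variable or file they changed. For MapReduce containers specifically, the native library path is often supplied through mapreduce.admin.user.env; the snippet below is only an illustrative sketch with a parcel-style path, not a confirmed value from this thread:

  <property>
    <name>mapreduce.admin.user.env</name>
    <value>LD_LIBRARY_PATH=/opt/cloudera/parcels/CDH/lib/hadoop/lib/native</value>
  </property>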


 

Explorer
Have you fixed this problem?

Explorer
It seems to be a permission issue on the JobHistory directory.