Spark job running on YARN failed

Hi community

I am running a Spark job, but after 187 hours of execution it fails with the following error:

 

Application application_1584698544596_3421 failed 2 times due to AM Container for appattempt_1584698544596_3421_000002 exited with exitCode: 1
For more detailed output, check application tracking page:http://server1.corp:8088/proxy/application_1584698544596_3421/Then, click on links to logs of each attempt.
Diagnostics: Exception from container-launch.
Container id: container_e103_1584698544596_3421_02_000001
Exit code: 1
Stack trace: ExitCodeException exitCode=1:
at org.apache.hadoop.util.Shell.runCommand(Shell.java:604)
at org.apache.hadoop.util.Shell.run(Shell.java:507)
at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:789)
at org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.__launchContainer__(LinuxContainerExecutor.java:399)
at org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.launchContainer(LinuxContainerExecutor.java)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:302)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:82)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
 
Shell output: main : command provided 1
main : run as user is development
main : requested yarn user is development
Writing to tmp file /hdfs5/yarn/nm/nmPrivate/application_1584698544596_3421/container_e103_1584698544596_3421_02_000001/container_e103_1584698544596_3421_02_000001.pid.tmp
Writing to cgroup task files...
Container exited with a non-zero exit code 1
Failing this attempt. Failing the application.
 
I cannot view the logs either, because the following error appears:
 
Logs not available for container_e103_1584698544596_3421_01_000001. Aggregation may not be complete, Check back later or try the nodemanager at server40.corp:8041
Or see application log at http://server40.corp:8041/node/application/application_1584698544596_3421
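The IDs in these messages follow YARN's naming scheme, which helps explain why two different containers show up. The failure diagnostic refers to attempt 02 (container_e103_1584698544596_3421_02_000001), while the missing-logs message refers to attempt 01 (container_e103_1584698544596_3421_01_000001). By YARN convention, the first container of each attempt (000001) is the one that ran the ApplicationMaster, and the diagnostic above confirms this for attempt 2. The sketch below is a minimal, hypothetical parser: `parse_container_id` and `ContainerId` are illustrative names, not part of any Hadoop API. It assumes the `container_[e<epoch>_]<clusterTimestamp>_<appSeq>_<attempt>_<container>` layout seen in these logs.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Assumed layout, matching the IDs in the logs above:
#   container_[e<epoch>_]<clusterTimestamp>_<appSeq>_<attempt>_<container>
# The "e<epoch>_" part is treated as optional; some clusters' IDs omit it.
_CONTAINER_RE = re.compile(
    r"^container_(?:e(?P<epoch>\d+)_)?"
    r"(?P<cluster_ts>\d+)_(?P<app_seq>\d+)_(?P<attempt>\d+)_(?P<container>\d+)$"
)


@dataclass(frozen=True)
class ContainerId:
    """Hypothetical value type holding the numeric parts of a container ID."""
    epoch: Optional[int]
    cluster_ts: int
    app_seq: int
    attempt: int
    container: int

    @property
    def application_id(self) -> str:
        # Application sequence number is zero-padded to at least 4 digits
        return f"application_{self.cluster_ts}_{self.app_seq:04d}"

    @property
    def app_attempt_id(self) -> str:
        # Attempt number is zero-padded to 6 digits in appattempt IDs
        return f"appattempt_{self.cluster_ts}_{self.app_seq:04d}_{self.attempt:06d}"

    @property
    def is_am_container(self) -> bool:
        # By convention the first container of an attempt hosts the ApplicationMaster
        return self.container == 1


def parse_container_id(text: str) -> ContainerId:
    """Split a YARN container ID string into its parts; raise ValueError if it does not match."""
    match = _CONTAINER_RE.match(text.strip())
    if match is None:
        raise ValueError(f"not a YARN container ID: {text!r}")
    epoch = match.group("epoch")
    return ContainerId(
        epoch=int(epoch) if epoch is not None else None,
        cluster_ts=int(match.group("cluster_ts")),
        app_seq=int(match.group("app_seq")),
        attempt=int(match.group("attempt")),
        container=int(match.group("container")),
    )


if __name__ == "__main__":
    # The two container IDs quoted in the question
    for cid in ("container_e103_1584698544596_3421_02_000001",
                "container_e103_1584698544596_3421_01_000001"):
        parsed = parse_container_id(cid)
        print(parsed.app_attempt_id, "AM" if parsed.is_am_container else "task")
```

Read this way, the unavailable logs belong to the first attempt's AM container on server40.corp, which is a different container from the one named in the failure diagnostic.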
 
I would appreciate your help, because I don't know what could be causing this.

Cloudera Employee

Hi Wilson,

Exit code 1 means that it is failing to initialize the container localizer. Could you please try uploading the application logs, if they are available now? Otherwise, please share the ResourceManager and NodeManager logs so we can understand what happened during container creation.
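To collect the logs requested above: once aggregation has finished, the application logs can usually be fetched with the `yarn logs` CLI. Run it as the application owner (`development`, per the shell output in the question) or pass `-appOwner`. The helper below is a hypothetical sketch, and `yarn_logs_command` is not a Hadoop API. It only assembles the argument list. `-applicationId`, `-containerId`, `-nodeAddress` and `-appOwner` are standard `yarn logs` options, but check `yarn logs -help` on your Hadoop/CDH version, since some older releases expect `-containerId` and `-nodeAddress` to be given together.

```python
import shlex
from typing import List, Optional


def yarn_logs_command(app_id: str,
                      container_id: Optional[str] = None,
                      node_address: Optional[str] = None,
                      app_owner: Optional[str] = None) -> List[str]:
    """Assemble an argv list for the `yarn logs` CLI (hypothetical helper)."""
    cmd = ["yarn", "logs", "-applicationId", app_id]
    if container_id is not None:
        # Restrict output to a single container, e.g. an attempt's AM container
        cmd += ["-containerId", container_id]
    if node_address is not None:
        # NodeManager ID in host:port form, as printed in the
        # "Logs not available ... try the nodemanager at" message
        cmd += ["-nodeAddress", node_address]
    if app_owner is not None:
        # Needed when the caller is not the user that submitted the job
        cmd += ["-appOwner", app_owner]
    return cmd


if __name__ == "__main__":
    # Values taken from the error messages in the question
    cmd = yarn_logs_command(
        "application_1584698544596_3421",
        container_id="container_e103_1584698544596_3421_01_000001",
        node_address="server40.corp:8041",
        app_owner="development",
    )
    # Print a copy-pasteable command; on a cluster gateway host it could
    # instead be run with subprocess.run(cmd, capture_output=True, text=True)
    print(shlex.join(cmd))
```

If the command still reports that aggregation is incomplete, the ResourceManager and NodeManager logs on the hosts themselves remain the fallback suggested above.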

 

 

Thanks

AKR