For example, the Python framework launches a number of spark-submit
jobs, and some of them fail with this exception. If we re-run the
framework to relaunch the failed jobs, some of them fail again but
others succeed. If we keep re-running the failed jobs, eventually all
of them succeed, so the issue is intermittent.
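The re-run behaviour described above can be sketched roughly as follows. This is only an illustration, not the actual framework: the names `run_spark_submit`, `run_with_retries`, `runner`, `max_attempts` and `delay_seconds` are hypothetical, and it assumes `spark-submit` is on the PATH and that a non-zero exit code means the job failed.

```python
import subprocess
import time


def run_spark_submit(args):
    """Run one spark-submit command and return its exit code.

    Assumption: spark-submit is on PATH and args is a list of CLI arguments.
    """
    return subprocess.call(["spark-submit"] + list(args))


def run_with_retries(jobs, runner=run_spark_submit, max_attempts=5, delay_seconds=0):
    """Relaunch failed jobs until all succeed or max_attempts rounds are used.

    jobs: dict mapping a (hypothetical) job name to its argument list.
    Returns (attempts, still_failed): attempts maps job name to the number of
    launches, still_failed is a sorted list of job names that never succeeded.
    """
    pending = dict(jobs)
    attempts = {name: 0 for name in jobs}
    for _ in range(max_attempts):
        if not pending:
            break
        failed = {}
        for name, args in pending.items():
            attempts[name] += 1
            # A non-zero exit code is treated as a failed job.
            if runner(args) != 0:
                failed[name] = args
        pending = failed
        if pending and delay_seconds:
            time.sleep(delay_seconds)
    return attempts, sorted(pending)
```

Recording the attempt count per job, as this sketch does, makes it easier to see whether the same jobs fail repeatedly or whether failures are spread randomly across jobs.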
Error message in the YARN application log:
17/02/13 22:38:55 ERROR yarn.ApplicationMaster: User class threw exception: com.fasterxml.jackson.databind.JsonMappingException: Can not deserialize instance of com.mgl.dh.silfspark.configs.FileCnfg out of VALUE_STRING token
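Jackson reports "out of VALUE_STRING token" when it finds a JSON string at the point where it expected an object for the target class. One possible cause (an assumption, not confirmed by the logs here) is a config value that arrives as a JSON-encoded string instead of a nested object. A small check like the one below could be run on the JSON payload before it is passed to a job; the function name and the `fileCnfg` key used in the usage note are hypothetical.

```python
import json


def find_string_encoded_objects(value, path="$"):
    """Return paths whose string value itself parses as a JSON object."""
    hits = []
    if isinstance(value, dict):
        for key, child in value.items():
            hits.extend(find_string_encoded_objects(child, f"{path}.{key}"))
    elif isinstance(value, list):
        for i, child in enumerate(value):
            hits.extend(find_string_encoded_objects(child, f"{path}[{i}]"))
    elif isinstance(value, str):
        try:
            decoded = json.loads(value)
        except ValueError:
            # Ordinary strings (paths, names, ...) are not JSON; ignore them.
            return hits
        if isinstance(decoded, dict):
            hits.append(path)
    return hits
```

For a payload such as `{"fileCnfg": "{\"path\": \"/tmp/x\"}"}`, the function would flag `$.fileCnfg`, because the value is a string that contains an object rather than the object itself.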
Error in the YARN NodeManager log:
2017-02-13 22:38:58,131 WARN nodemanager.LinuxContainerExecutor (LinuxContainerExecutor.java:launchContainer(313)) - Exception from container-launch with container ID: container_1486736461720_22516_01_000084 and exit code: 15
ExitCodeException exitCode=15:
at org.apache.hadoop.util.Shell.runCommand(Shell.java:545)
at org.apache.hadoop.util.Shell.run(Shell.java:456)
at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:722)
at org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.launchContainer(LinuxContainerExecutor.java:297)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:302)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:82)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Environment:
HDFS 2.7.1
YARN 2.7.1
Spark 1.5.1