
Application Failed for YARN exit code 12




I have a YARN application, launched via Oozie in yarn-cluster mode, that sometimes fails with an unknown error.


The stdout and stderr logs from the driver don't show any error (they are cut off in the middle of some INFO messages), but I found a strange error in the log of the NodeManager running the AM container:


2017-XX-XX XX:XX:XX,XXX WARN org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor: Exception from container-launch with container ID: container_e14_14XXXXXXXXXXX_XXXXX_01_000001 and exit code: 12
ExitCodeException exitCode=12: 
        at org.apache.hadoop.util.Shell.runCommand(
        at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(
        at org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.launchContainer(
        at java.util.concurrent.ThreadPoolExecutor.runWorker(
        at java.util.concurrent.ThreadPoolExecutor$

I've searched the documentation for this exit code, but it's not included in the standard YARN exit codes.


Does anyone know what exit code 12 means?



I would track down the logs for container container_e14_14XXXXXXXXXXX_XXXXX_01_000001. That should contain more details on the actual error.
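One way to get at those logs is the `yarn logs` CLI, which needs the application ID as well as the container ID. The sketch below derives the application ID from a container ID and prints the command to run. It assumes the usual `container_[e<epoch>_]<clusterTimestamp>_<appSeq>_<attempt>_<container>` layout and that log aggregation is enabled on the cluster; the helper names and the sample ID are made up for illustration.

```python
import re
import shlex

# Assumed layout: container_[e<epoch>_]<clusterTimestamp>_<appSeq>_<attempt>_<container>
_CONTAINER_RE = re.compile(r"^container_(?:e\d+_)?(\d+)_(\d+)_(\d+)_(\d+)$")


def application_id_from_container(container_id: str) -> str:
    """Derive application_<clusterTimestamp>_<appSeq> from a container ID."""
    match = _CONTAINER_RE.match(container_id)
    if match is None:
        raise ValueError(f"unrecognized container ID: {container_id!r}")
    cluster_ts, app_seq = match.group(1), match.group(2)
    return f"application_{cluster_ts}_{app_seq}"


def yarn_logs_command(container_id: str) -> str:
    """Build the `yarn logs` command line for a single container."""
    app_id = application_id_from_container(container_id)
    args = ["yarn", "logs", "-applicationId", app_id, "-containerId", container_id]
    return " ".join(shlex.quote(arg) for arg in args)


if __name__ == "__main__":
    # Hypothetical ID; substitute the one from the NodeManager log.
    print(yarn_logs_command("container_e14_1500000000000_0042_01_000001"))
```

Some older Hadoop releases also expect `-nodeAddress` whenever `-containerId` is given, so check `yarn logs -help` on your cluster before relying on the exact flags.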


In the logs for the ApplicationMaster/Spark driver (which were around 4 GB), I found a StackOverflowError from the Spark reporter thread. I also found a Spark issue that matches my error.
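With a log that large, opening it in an editor is not practical. A rough sketch of how the error can be located by streaming the file line by line, keeping only a few preceding lines as context (the function name, the file name, and the default search string are my own choices, not anything from Spark or YARN):

```python
from collections import deque
from typing import Iterator, List


def find_with_context(
    path: str, needle: str = "StackOverflowError", before: int = 5
) -> Iterator[List[str]]:
    """Yield each line containing `needle`, preceded by up to `before` earlier lines.

    The file is read one line at a time, so memory use stays small
    even for multi-gigabyte logs.
    """
    recent = deque(maxlen=before)  # rolling window of the preceding lines
    with open(path, "r", errors="replace") as log:
        for raw in log:
            line = raw.rstrip("\n")
            if needle in line:
                yield list(recent) + [line]
            recent.append(line)


if __name__ == "__main__":
    # "driver.log" is a placeholder for the downloaded AM/driver log.
    for hit in find_with_context("driver.log"):
        print("\n".join(hit))
        print("-" * 40)
```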


The job was launched using dynamicAllocation and requested an insane number of containers (16000, each with 20 GB and 8 cores), and apparently this can cause a StackOverflowError in the Spark thread managing the executors.


An easy workaround is to disable dynamicAllocation and use a fixed number of executors. With 10 executors the job runs fine.
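For anyone wanting to reproduce the workaround, here is a minimal sketch of the settings involved, rendered as `--conf` arguments that could go into `spark-submit` or the `<spark-opts>` of an Oozie Spark action. The property names are standard Spark configuration keys; the helper functions are hypothetical, and the 20g/8-core values are an assumption taken from the container size mentioned above, so adjust them to your job.

```python
from typing import Dict, List


def fixed_executor_conf(
    executors: int = 10, memory: str = "20g", cores: int = 8
) -> Dict[str, str]:
    """Spark properties that turn off dynamic allocation and pin the executor count."""
    return {
        "spark.dynamicAllocation.enabled": "false",
        "spark.executor.instances": str(executors),
        # Assumed sizing; the container request also includes memory overhead.
        "spark.executor.memory": memory,
        "spark.executor.cores": str(cores),
    }


def as_submit_args(conf: Dict[str, str]) -> List[str]:
    """Render the properties as a flat list of `--conf key=value` arguments."""
    args: List[str] = []
    for key, value in conf.items():
        args += ["--conf", f"{key}={value}"]
    return args


if __name__ == "__main__":
    print(" ".join(as_submit_args(fixed_executor_conf())))
```

If dynamic allocation has to stay on, capping it with `spark.dynamicAllocation.maxExecutors` may be another option, though I have not verified that it avoids the error.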


