Support Questions
Find answers, ask questions, and share your expertise

Spark job submit on yarn cluster is failing abruptly


Hi All,

We are submitting a Spark job on a YARN cluster with the following configuration:

--master yarn --deploy-mode cluster \
--executor-cores 5 --num-executors 3 \
--executor-memory 8G --driver-memory 3g \
--conf spark.yarn.executor.memoryOverhead=6144 \
--conf spark.cores.max=30 \
--conf spark.memory.fraction=0.9 \
--conf spark.memory.storageFraction=0.1

The environment is a 3-node cluster with 60 GB of memory available on each node.
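As a sanity check on the sizing above, the memory YARN must grant per executor is the executor heap plus the configured overhead. A quick sketch of that arithmetic (it ignores YARN's rounding of requests up to yarn.scheduler.minimum-allocation-mb):

```python
# Per-executor container request implied by the flags above
executor_memory_mb = 8 * 1024          # --executor-memory 8G
memory_overhead_mb = 6144              # spark.yarn.executor.memoryOverhead
per_executor_mb = executor_memory_mb + memory_overhead_mb

num_executors = 3                      # --num-executors 3
driver_mb = 3 * 1024                   # --driver-memory 3g
total_mb = num_executors * per_executor_mb + driver_mb

print(per_executor_mb)  # 14336 MB per executor container
print(total_mb)         # 46080 MB in total
```

So each executor container asks for roughly 14 GB and the whole job for about 45 GB, which should fit comfortably on three 60 GB nodes, consistent with the memory checks described below.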

The abrupt behavior is that the job sometimes fails with an XML-not-found exception even though all the XMLs are present at the jar paths stated in the spark-submit command, while at other times it runs successfully with the same command arguments.

In our first analysis we checked RAM and cluster memory availability and found no issues there. We also compared the logs of failed and successful runs and found no discrepancy.

We have no clues about what needs to be checked so that the job does not fail intermittently with the same command.

Please help shed light on which other components we should check to resolve this issue.

Regards,

Garima.

4 REPLIES

Re: Spark job submit on yarn cluster is failing abruptly

@Garima Verma

The file-not-found exception is likely not the actual root cause.

You should see the actual exception just before this one in the log.
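In cluster deploy mode the driver runs on a YARN node, so the full log has to be pulled from YARN rather than the submit console. A sketch of how to fetch and scan it (replace the application ID placeholder with the real ID printed by spark-submit or shown in the ResourceManager UI):

```shell
# Fetch the aggregated logs for a finished YARN application
yarn logs -applicationId <application_id> > app.log

# Scan for the first error/exception preceding the FileNotFoundException
grep -n -i -e "exception" -e "error" app.log | head
```

The first matching line is usually closer to the real failure than the file-not-found message that follows it.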


Re: Spark job submit on yarn cluster is failing abruptly

Only one exception appears across the logs, i.e. a file-not-found for one XML.


Re: Spark job submit on yarn cluster is failing abruptly


@Garima Verma

Can you please upload the log file of this job?

Re: Spark job submit on yarn cluster is failing abruptly


@Garima Verma

You have not posted the stack trace here, so people will not really know how to address this clearly unless it is provided. But given the explanation, I would suggest passing the XML files to the spark-submit command with "--files" and then trying again.
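For illustration, a spark-submit invocation using the --files option might look like the following (the XML file names, paths, and application jar are hypothetical placeholders; substitute your own):

```shell
# Ship the XML config files to every executor's working directory
spark-submit \
  --master yarn \
  --deploy-mode cluster \
  --executor-cores 5 \
  --num-executors 3 \
  --executor-memory 8G \
  --driver-memory 3g \
  --files /local/path/config1.xml,/local/path/config2.xml \
  your-application.jar
```

Files listed in --files are distributed to the working directory of the driver and each executor, so the application can then open them by bare file name (e.g. config1.xml) instead of relying on a path that may only exist on some nodes, which would explain the intermittent failures.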
