Explorer
Posts: 13
Registered: 10-25-2016

Spark job fails in cluster mode.

Hi All

 

I have been trying to submit the Spark job below in cluster mode through a bash shell.

Submitting in client mode works perfectly fine, but when I switch to cluster mode it fails with an error saying no app file is present.

The app file it refers to is the missing application.conf.

 

spark-submit \
--master yarn \
--deploy-mode cluster \
--class myCLASS \
--properties-file /home/abhig/spark.conf \
--files /home/abhig/application.conf \
--conf "spark.executor.extraJavaOptions=-Dconfig.resource=application.conf -Dlog4j.configuration=/home/abhig/log4.properties" \
--driver-java-options "-Dconfig.file=/home/abhig/application.conf -Dlog4j.configuration=/home/abhig/log4.properties" \
/local/project/gateway/mypgm.jar

 

I followed the link below to a similar post:

https://community.cloudera.com/t5/Advanced-Analytics-Apache-Spark/Spark-File-not-found-error-works-f...

The solution mentioned there is still not clear to me.

 

I even tried

--files $CONFIG_FILE#application.conf

but it still doesn't work.

Any help will be appreciated.

 

Thanks
AB

Abhishek
Cloudera Employee
Posts: 8
Registered: 03-01-2016

Re: Spark job fails in cluster mode.


Two points:

1) In cluster mode, you should use "--conf spark.driver.extraJavaOptions=" instead of "--driver-java-options".

2) You only provide application.conf in the --files list; there is no log4.properties. So either you need log4.properties already distributed on each YARN node, or you should add it to the --files list and reference it with "-Dlog4j.configuration=./log4.properties".

 

For cluster mode, the full command should look like the following:

 

spark-submit \
--master yarn \
--deploy-mode cluster \
--class myCLASS \
--properties-file /home/abhig/spark.conf \
--files /home/abhig/application.conf,/home/abhig/log4.properties \
--conf "spark.executor.extraJavaOptions=-Dconfig.resource=application.conf -Dlog4j.configuration=./log4.properties" \
--conf spark.driver.extraJavaOptions="-Dconfig.file=./application.conf -Dlog4j.configuration=./log4.properties" \
/local/project/gateway/mypgm.jar

 

Explorer
Posts: 13
Registered: 10-25-2016

Re: Spark job fails in cluster mode.

Hi All

 

Thanks @Yuexin Zhang for the response.

I figured out the solution for this.

Below is the actual submit command that worked for me.

The catch here is that when we submit in cluster mode, Spark uploads the files to a staging directory on HDFS.

The path and name of each file on HDFS are then different from what the program expects.

To make a file available to the program, you have to create an alias for it with '#', as shown below (that's the only trick).

Then, everywhere you need to refer to that file, just use the alias you gave it on the spark-submit command.

The links below, which I referred to, give the complete walkthrough and how to reach the solution.

 

Issue also discussed here - https://community.cloudera.com/t5/Advanced-Analytics-Apache-Spark/Spark-File-not-found-error-works-f... - (it didn't actually help me resolve this, so I posted separately)

Section "Important notes" in http://spark.apache.org/docs/latest/running-on-yarn.html ( Kinda have to read between the lines)

Blog explaining the reason - http://progexc.blogspot.com/2014/12/spark-configuration-mess-solved.html (Nice blog :) )

 

spark-submit \
--master yarn \
--deploy-mode cluster \
--class myCLASS \
--properties-file /home/abhig/spark.conf \
--files /home/abhig/application.conf#application.conf,/home/abhig/log4.properties#log4j \
--conf "spark.executor.extraJavaOptions=-Dconfig.resource=application.conf -Dlog4j.configuration=log4j" \
--conf spark.driver.extraJavaOptions="-Dconfig.file=application.conf -Dlog4j.configuration=log4j" \
/local/project/gateway/mypgm.jar
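
For completeness, here is a minimal sketch of how the application side can pick up the aliased file, assuming the job is written in Scala and reads application.conf with Typesafe Config (which the -Dconfig.file / -Dconfig.resource flags imply). The object name and the "input.path" key are just placeholders, not my actual program:

import com.typesafe.config.{Config, ConfigFactory}
import org.apache.spark.sql.SparkSession

object MyPgm {
  def main(args: Array[String]): Unit = {
    // ConfigFactory.load() honours -Dconfig.file / -Dconfig.resource, so in
    // cluster mode it resolves "application.conf" in the YARN container's
    // working directory, which is where --files placed it under the '#' alias.
    val appConf: Config = ConfigFactory.load()

    val spark = SparkSession.builder().appName("mypgm").getOrCreate()

    // "input.path" is a hypothetical key, only to show how values are read.
    val inputPath = appConf.getString("input.path")
    println(s"Reading input from " + inputPath)

    spark.stop()
  }
}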

 

 

Hope this helps the next person facing a similar issue!

Abhishek