In this article, we will learn how to pass the Atlas configuration file from a different location in the spark-submit command.

When the Atlas service is enabled in CDP and we run a Spark application, the Atlas configuration file is picked up by default from the /etc/spark/conf.cloudera.spark_on_yarn/ directory.

Let's test this with the SparkPi example:


spark-submit --class org.apache.spark.examples.SparkPi --master yarn --deploy-mode client  /opt/cloudera/parcels/CDH/jars/spark-examples*.jar 10


We can see the following output in the application log.


21/08/23 06:12:03 INFO atlas.ApplicationProperties: Looking for in classpath
21/08/23 06:12:03 INFO atlas.ApplicationProperties: Loading from file:/etc/spark/conf.cloudera.spark_on_yarn/


If we want to pass the configuration file from a different location, for example the /tmp directory, copy the file from /etc/spark/conf.cloudera.spark_on_yarn to /tmp and point to it with the -Datlas.conf=/tmp/ Java system property in spark-submit. Note that atlas.conf takes the directory that contains the file, not the file path itself.
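
The copy step can be scripted. The following is a minimal Bash sketch, not an official tool: the function name stage_atlas_conf and its arguments are hypothetical, and the configuration file name is passed in by the caller rather than assumed, so substitute the file your cluster actually uses. It copies the file and prints the -Datlas.conf value to hand to spark-submit.

```shell
#!/usr/bin/env bash
# Minimal sketch (hypothetical helper): copy the Atlas configuration file
# from the default Spark config directory to another directory and print
# the JVM system property that points Atlas at the new directory.
stage_atlas_conf() {
  local src_dir="$1"    # e.g. /etc/spark/conf.cloudera.spark_on_yarn
  local dest_dir="$2"   # e.g. /tmp
  local file_name="$3"  # name of the Atlas configuration file on your cluster

  # Stop early if the file is not where we expect it.
  if [ ! -f "${src_dir}/${file_name}" ]; then
    echo "not found: ${src_dir}/${file_name}" >&2
    return 1
  fi

  mkdir -p "${dest_dir}"
  cp "${src_dir}/${file_name}" "${dest_dir}/"

  # atlas.conf takes the directory holding the file, with a trailing slash.
  printf '%s\n' "-Datlas.conf=${dest_dir%/}/"
}
```

For example, stage_atlas_conf /etc/spark/conf.cloudera.spark_on_yarn /tmp "$CONF_FILE" (where CONF_FILE is a placeholder for your file name) prints -Datlas.conf=/tmp/, the same value used with --driver-java-options in this article.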


Let's test with the same SparkPi example by adding the --driver-java-options="-Datlas.conf=/tmp/" option to the spark-submit command.


spark-submit --class org.apache.spark.examples.SparkPi --master yarn --deploy-mode client --driver-java-options="-Datlas.conf=/tmp/" /opt/cloudera/parcels/CDH/jars/spark-examples*.jar 10


We can see the following output in the application log.


21/08/05 14:36:24 INFO atlas.ApplicationProperties: Looking for in classpath
21/08/05 14:36:24 INFO atlas.ApplicationProperties: Loading from file:/tmp/
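
To confirm which directory Atlas actually read from without scanning the whole log, we can filter the atlas.ApplicationProperties lines. This is a minimal sketch assuming the log lines have the shape shown above (atlas.ApplicationProperties: Loading ... from file:...); the function name atlas_conf_source is hypothetical. In client mode the driver runs on the submitting host, so its log appears on the spark-submit console.

```shell
#!/usr/bin/env bash
# Minimal sketch (hypothetical helper): read Spark driver log text on stdin
# and print the file: URL that atlas.ApplicationProperties reports loading.
atlas_conf_source() {
  # Keep only "Loading ... from file:..." lines from ApplicationProperties
  # and print the file: URL that follows the word "from".
  sed -n 's/.*atlas\.ApplicationProperties: Loading.* from \(file:[^ ]*\).*/\1/p'
}

# Example (client mode, driver log goes to the console):
# spark-submit --class org.apache.spark.examples.SparkPi --master yarn \
#   --deploy-mode client --driver-java-options="-Datlas.conf=/tmp/" \
#   /opt/cloudera/parcels/CDH/jars/spark-examples*.jar 10 2>&1 | atlas_conf_source
```

Fed the two log lines above, it prints file:/tmp/ and ignores the "Looking for ... in classpath" line.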


To run the same SparkPi example in cluster mode, place the file in the /tmp directory on all nodes and run the Spark application as follows:


spark-submit --class org.apache.spark.examples.SparkPi --master yarn --deploy-mode cluster \
--files /tmp/ --driver-java-options="-Datlas.conf=/tmp/" \
/opt/cloudera/parcels/CDH/jars/spark-examples*.jar 10
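
Placing the file on every node can be done with a simple loop over the cluster hosts. This is a minimal sketch under stated assumptions: the function name distribute_file, the one-host-per-line hosts file, the use of scp with passwordless SSH, and the DRY_RUN switch are all hypothetical choices for illustration; your cluster management tooling may be a better fit. With DRY_RUN=1 it only prints the commands it would run.

```shell
#!/usr/bin/env bash
# Minimal sketch (hypothetical helper): copy one file into the same
# directory on every host listed (one per line) in a hosts file.
# Assumes passwordless SSH/scp access to each host.
distribute_file() {
  local hosts_file="$1" src_file="$2" dest_dir="$3"
  local host
  # Read hosts on fd 3 so scp cannot consume the host list from stdin;
  # the -n test also handles a last line without a trailing newline.
  while IFS= read -r host <&3 || [ -n "$host" ]; do
    # Skip blank lines and comments in the host list.
    case "$host" in ''|'#'*) continue ;; esac
    if [ "${DRY_RUN:-0}" = "1" ]; then
      # Print the command instead of running it.
      printf 'scp %s %s:%s\n' "$src_file" "$host" "$dest_dir"
    else
      scp "$src_file" "${host}:${dest_dir}" || return 1
    fi
  done 3< "$hosts_file"
}
```

For example, DRY_RUN=1 distribute_file hosts.txt "/tmp/$CONF_FILE" /tmp/ lists one scp command per host, where hosts.txt and CONF_FILE are placeholders.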

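Alternatively, because --files ships the file into the working directory of the YARN container that runs the driver, we can point atlas.conf at the current directory (./) instead of placing the file on every node: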

sudo -u spark spark-submit --class org.apache.spark.examples.SparkPi --master yarn --deploy-mode cluster \
--files /tmp/ --conf spark.driver.extraJavaOptions="-Datlas.conf=./" \
/opt/cloudera/parcels/CDH/jars/spark-examples*.jar 10


We can see the following output:


21/08/23 06:12:07 INFO atlas.ApplicationProperties: Loading from file:/data1/tmp/usercache/spark/appcache/application_1629693759177_0016/container_e74_1629693759177_0016_01_000001/./
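
The client-mode and cluster-mode variants can be combined in a small wrapper that picks the options by deploy mode: in client mode the driver reads the file from a local directory, while in cluster mode the file is shipped with --files and atlas.conf points at the container's working directory, which matches the container_..././ path in the log above. This is a minimal sketch; the function name atlas_conf_args is hypothetical and the configuration file path is supplied by the caller. It prints one argument per line so the result can be inspected or loaded into a Bash array.

```shell
#!/usr/bin/env bash
# Minimal sketch (hypothetical helper): print the extra spark-submit
# arguments, one per line, that point Atlas at a non-default config file.
#   $1 = deploy mode (client or cluster)
#   $2 = full path of the Atlas configuration file on the submitting host
atlas_conf_args() {
  local mode="$1" conf_file="$2"
  local dir
  case "$mode" in
    client)
      # The driver runs locally, so point atlas.conf at the file's directory.
      dir=$(dirname "$conf_file")
      printf '%s\n' "--driver-java-options=-Datlas.conf=${dir%/}/"
      ;;
    cluster)
      # --files ships the file into the driver container's working directory,
      # so atlas.conf can point at the current directory.
      printf '%s\n' --files "$conf_file" \
        --conf "spark.driver.extraJavaOptions=-Datlas.conf=./"
      ;;
    *)
      echo "unknown deploy mode: $mode" >&2
      return 1
      ;;
  esac
}

# Usage (CONF_FILE is a placeholder for your file name):
# mapfile -t extra < <(atlas_conf_args cluster "/tmp/$CONF_FILE")
# spark-submit --class org.apache.spark.examples.SparkPi --master yarn \
#   --deploy-mode cluster "${extra[@]}" \
#   /opt/cloudera/parcels/CDH/jars/spark-examples*.jar 10
```

Printing one argument per line keeps values containing = or spaces intact when they are read back with mapfile.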