Created on 08-08-2021 11:43 PM - edited on 08-23-2021 09:13 PM by subratadas
In this article, we will learn how to pass the atlas-application.properties configuration file from a different location in the spark-submit command.
When the Atlas service is enabled in CDP and we run a Spark application, the atlas-application.properties file is picked up by default from the /etc/spark/conf.cloudera.spark_on_yarn/ directory.
We can see the following output in the application log.
21/08/23 06:12:03 INFO atlas.ApplicationProperties: Looking for atlas-application.properties in classpath
21/08/23 06:12:03 INFO atlas.ApplicationProperties: Loading atlas-application.properties from file:/etc/spark/conf.cloudera.spark_on_yarn/atlas-application.properties
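To confirm which copy of the file a given run picked up, the "Loading" line can be filtered out of a saved log. The following is a minimal sketch: the function name atlas_props_source and the file name app.log are hypothetical, and it assumes the log format shown above.

```shell
# Print the location that atlas.ApplicationProperties reported loading from.
# Reads the log files given as arguments, or standard input if none are given.
# atlas_props_source is a hypothetical helper name, not part of Spark or Atlas.
atlas_props_source() {
  sed -n 's/.*Loading atlas-application\.properties from //p' "$@"
}

# Example usage (app.log is a hypothetical file holding the driver output):
#   atlas_props_source app.log
```

For the log lines above, this would print file:/etc/spark/conf.cloudera.spark_on_yarn/atlas-application.properties.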
To pass the atlas-application.properties configuration file from a different location, for example the /tmp directory, copy atlas-application.properties from /etc/spark/conf.cloudera.spark_on_yarn to /tmp and point Spark at that directory with the -Datlas.conf=/tmp/ Java system property in spark-submit.
Let's test this with the same SparkPi example by adding the --driver-java-options="-Datlas.conf=/tmp/" option to spark-submit.
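The copy and submit steps could look roughly like the sketch below. The jar path is a placeholder, --master yarn and --deploy-mode client are assumptions (the article only contrasts this run with cluster mode later), and run is a hypothetical dry-run helper that prints each command unless RUN=1 is set.

```shell
# Directory holding the copied atlas-application.properties (from the article).
ATLAS_CONF_DIR="/tmp/"
# Placeholder jar path (assumption); point this at your Spark examples jar.
EXAMPLES_JAR="${EXAMPLES_JAR:-/path/to/spark-examples.jar}"

# Hypothetical dry-run helper: prints the command unless RUN=1 is set.
run() {
  if [ "${RUN:-0}" = "1" ]; then "$@"; else echo "$@"; fi
}

# Copy the default properties file to the new location.
run cp /etc/spark/conf.cloudera.spark_on_yarn/atlas-application.properties "$ATLAS_CONF_DIR"

# Submit SparkPi and point the driver JVM at the new directory.
# --master yarn and --deploy-mode client are assumptions.
run spark-submit \
  --class org.apache.spark.examples.SparkPi \
  --master yarn \
  --deploy-mode client \
  --driver-java-options="-Datlas.conf=${ATLAS_CONF_DIR}" \
  "$EXAMPLES_JAR" 10
```

Running the script as-is only prints the two commands; setting RUN=1 executes them.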
We can see the following output in the application log.
21/08/05 14:36:24 INFO atlas.ApplicationProperties: Looking for atlas-application.properties in classpath
21/08/05 14:36:24 INFO atlas.ApplicationProperties: Loading atlas-application.properties from file:/tmp/atlas-application.properties
To run the same SparkPi example in cluster mode, place the atlas-application.properties file in the /tmp directory on every node and run the Spark application. The application log then shows the file being loaded from the YARN container's working directory:
21/08/23 06:12:07 INFO atlas.ApplicationProperties: Loading atlas-application.properties from file:/data1/tmp/usercache/spark/appcache/application_1629693759177_0016/container_e74_1629693759177_0016_01_000001/./atlas-application.properties
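The path in this log ends in ./atlas-application.properties inside the container directory, which is consistent with the file being shipped to the container and atlas.conf pointing at the working directory. One way to get that behaviour, offered as an assumption rather than as the confirmed command, is to distribute the file with --files and set -Datlas.conf=./. The jar path is a placeholder, and run is a hypothetical dry-run helper that prints the command unless RUN=1 is set.

```shell
# Local copy of the properties file to ship with the job (path from the article).
ATLAS_PROPS="/tmp/atlas-application.properties"
# Placeholder jar path (assumption); point this at your Spark examples jar.
EXAMPLES_JAR="${EXAMPLES_JAR:-/path/to/spark-examples.jar}"

# Hypothetical dry-run helper: prints the command unless RUN=1 is set.
run() {
  if [ "${RUN:-0}" = "1" ]; then "$@"; else echo "$@"; fi
}

# --files places the file in each YARN container's working directory,
# so the driver can resolve it relative to "./" (assumed setup).
run spark-submit \
  --class org.apache.spark.examples.SparkPi \
  --master yarn \
  --deploy-mode cluster \
  --files "$ATLAS_PROPS" \
  --driver-java-options="-Datlas.conf=./" \
  "$EXAMPLES_JAR" 10
```

In cluster mode the driver runs inside a YARN container, so its log is retrieved through YARN rather than printed on the submitting terminal.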