
Spark Jar caching?

Contributor

I'm running an application with spark-submit. The application uses both Scala and Java. The spark-submit command specifies the location of the jar file with --jars.
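My command is roughly of this shape (the class name and paths here are placeholders, not my exact values):

    spark-submit --class com.example.MyApp \
      --master yarn-client \
      --jars /local/path/to/dependency.jar \
      /local/path/to/myapp.jar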


A strange phenomenon I'm seeing: even though I modify my Java files and build new jar files, the cluster sometimes uses my older jar files. It is as if the cluster has a cached copy of my old jar file.

Can someone please educate me on where to look for older or cached jar files, and how to clean them up?

P.S. I'm using Cloudera 5.5.1 with Spark 1.5.0.

Thanks.

3 Replies

Re: Spark Jar caching?

Expert Contributor

Which YARN mode are you using, yarn-client or yarn-cluster? Where is the jar you are trying to load: is it local to the driver, or in HDFS? Are you shutting down the Spark context, or trying to add the jar programmatically?
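By adding the jar programmatically I mean something along these lines; a minimal Scala sketch, where the app name and jar path are placeholders:

    import org.apache.spark.{SparkConf, SparkContext}

    // Minimal sketch; the app name and jar path are placeholders.
    val conf = new SparkConf().setAppName("jar-caching-test")
    val sc = new SparkContext(conf)
    sc.addJar("/local/path/to/dependency.jar")  // distributes the jar to executors
    // ... run jobs ...
    sc.stop()  // stop the context when the application finishes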


Look for log messages. In the driver you will see "Added JAR" followed by your jar file name, and you will see an error if there was an issue loading your jar, for example if it already existed in the Spark file server. In the containers you will find messages like "Fetching", "Copying", and "Adding file". If the jar file is cached, the "Fetching" message will be missing. There is also an overwrite option that will delete and replace the files if they already exist.
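For example, with log aggregation enabled you can pull the container logs for a finished application and search for those messages (the application id below is made up):

    yarn logs -applicationId application_1453000000000_0001 | grep -E 'Added JAR|Fetching|Copying|Adding file'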


It is sometimes useful to version your jars, which makes it easier to determine whether an older version is being used.
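For instance, if you build with sbt, setting a version in build.sbt stamps it into the artifact name (the values here are just examples):

    name := "myapp"

    version := "1.0.3"

    scalaVersion := "2.10.5"

    // "sbt package" then produces target/scala-2.10/myapp_2.10-1.0.3.jar,
    // so the jar file name itself tells you which build the cluster picked up.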

Re: Spark Jar caching?

New Contributor

>>There is also an overwrite option that will delete and replace the files if they already exist.

What is that option?

Re: Spark Jar caching?

Expert Contributor

You can use the configuration "spark.files.overwrite" to control whether files distributed through Spark will be overwritten. Please see the execution behavior section of the configuration documentation[1] for details; the default is currently not to overwrite files.

1.  https://spark.apache.org/docs/1.5.0/configuration.html#execution-behavior
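For example, it can be passed on the spark-submit command line (the class name and paths below are placeholders):

    spark-submit --conf spark.files.overwrite=true \
      --class com.example.MyApp \
      --jars /local/path/to/dependency.jar \
      /local/path/to/myapp.jar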