Support Questions
Find answers, ask questions, and share your expertise

How to change Spark _temporary directory when writing data?


Rising Star

I have two Spark applications writing data to the same directory on HDFS. The application that finishes first deletes the _temporary working directory, which still contains temp files belonging to the other application.

So can I specify a separate _temporary directory for each Spark application?

5 REPLIES

Re: How to change Spark _temporary directory when writing data?

Expert Contributor

@Junfeng Chen

You can change the path to the temp folder for each Spark application with the spark.local.dir property, like below:

SparkConf conf = new SparkConf().setMaster("local").setAppName("test").set("spark.local.dir", "/tmp/spark-temp");

Reference
Please accept the answer you found most useful


Re: How to change Spark _temporary directory when writing data?

Rising Star

Thanks @Jagadeesan A S

_temporary is a temp directory created under the path passed to df.write.parquet(path) on HDFS. However, spark.local.dir defaults to /tmp, and the documentation says:

Directory to use for "scratch" space in Spark, including map output files and RDDs that get stored on disk. This should be on a fast, local disk in your system.

So it should be a directory on the local file system. I am not sure spark.local.dir refers to the temp directory Spark uses when writing ...


Re: How to change Spark _temporary directory when writing data?

Expert Contributor

@Junfeng Chen

That's true, the property above is for the local filesystem. For HDFS, could you try using Append instead of Overwrite? But the problem with this is that we need to delete files manually from the temp directory.
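One common workaround for the shared _temporary conflict (my own suggestion, not confirmed anywhere in this thread) is to have each application write into its own private staging directory and then move the finished files into the shared output directory, so the two jobs' scratch space never collides. A minimal sketch of the pattern on the local filesystem; the same idea applies on HDFS with the Hadoop FileSystem API, and the helper name and file naming scheme here are hypothetical:

```python
import os
import shutil
import tempfile

def write_via_staging(records, shared_dir, app_id):
    """Write files into a per-application staging directory,
    then move them into the shared output directory."""
    # Each app gets its own staging dir, so its temp files
    # cannot be deleted by the other app's cleanup.
    staging = tempfile.mkdtemp(prefix=f"staging-{app_id}-")
    try:
        for name, data in records.items():
            with open(os.path.join(staging, name), "w") as f:
                f.write(data)
        os.makedirs(shared_dir, exist_ok=True)
        # Prefix moved files with the app id to avoid name clashes.
        for name in os.listdir(staging):
            shutil.move(os.path.join(staging, name),
                        os.path.join(shared_dir, f"{app_id}-{name}"))
    finally:
        shutil.rmtree(staging, ignore_errors=True)

# Two "applications" writing into the same shared directory.
out = tempfile.mkdtemp(prefix="shared-out-")
write_via_staging({"part-0000": "rows from app1"}, out, "app1")
write_via_staging({"part-0000": "rows from app2"}, out, "app2")
print(sorted(os.listdir(out)))  # both apps' files survive
```

The move step is cheap on HDFS because a rename within the same filesystem does not copy data.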


Re: How to change Spark _temporary directory when writing data?

Rising Star

Hi @Jagadeesan A S

My current save mode is already Append. My Spark Streaming apps run every 5 minutes, so deleting the files manually is not convenient.... So I think the better solution is to customize the temp location.

Or can I set an offset for the scheduled run times? For example, my two apps currently both run every 5 minutes, at 0, 5, 10, 15, 20.

Can I set a schedule so that one still runs at 0, 5, 10, 15, and the other runs at 2.5, 7.5, 12.5?
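For what it's worth, shifting one job by half the period interleaves the runs evenly, so neither job is writing when the other starts. A quick sketch of computing the two schedules (times in minutes; the helper name is hypothetical):

```python
def schedule(start, period, count):
    """Return the first `count` run times for a job that starts
    at `start` and repeats every `period` minutes."""
    return [start + i * period for i in range(count)]

# App 1 keeps its original schedule; app 2 is shifted by half a
# period, maximizing the gap between the two jobs' start times.
app1 = schedule(0, 5, 4)
app2 = schedule(2.5, 5, 4)
print(app1)  # [0, 5, 10, 15]
print(app2)  # [2.5, 7.5, 12.5, 17.5]
```

Note this only reduces the chance of overlap; it does not eliminate the _temporary conflict if one batch runs longer than the offset.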


Re: How to change Spark _temporary directory when writing data?

New Contributor

Did you ever figure out a solution? I am facing the same issue.
