Support Questions

johannes_kjellg · ‎08-03-2017

Hi,

Background

We are running a Spark application in local mode that runs a Spark job every 5 minutes using a singleton SparkContext.

- We are using Spark 1.6.2.2.4.3.0-227

- The application is long running and "sleeps" in-between the jobs

- We are using SparkContext.getOrCreate

- We are running Spark in "local[*]" mode

- "spark.worker.cleanup.enabled" is set to "true"

- The application is written in Scala

- On failure we invoke the SparkContext "stop" method in order to get a healthy SparkContext for the next job.
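The setup listed above might look roughly like the following minimal sketch. It is not the actual application: the object name, the app name, the "spark.local.dir" path and the placeholder job are assumptions made for illustration, and it uses only the Spark 1.6 calls named in this post (SparkContext.getOrCreate and stop).

```scala
import org.apache.spark.{SparkConf, SparkContext}
import scala.util.control.NonFatal

object PeriodicJob {
  // Hypothetical values; the post does not state the real path or app name.
  private val localDir   = "/tmp/spark-local"
  private val intervalMs = 5 * 60 * 1000L

  private val conf = new SparkConf()
    .setAppName("periodic-job")
    .setMaster("local[*]")
    .set("spark.local.dir", localDir)

  def runOnce(): Unit = {
    // Returns the active context, or creates a new one if the previous one was stopped.
    val sc = SparkContext.getOrCreate(conf)
    try {
      // Placeholder for the real business logic.
      val n = sc.parallelize(1 to 100).count()
      println(s"job finished, counted $n elements")
    } catch {
      case NonFatal(e) =>
        // On failure, stop the context so the next run gets a fresh one.
        println(s"job failed: ${e.getMessage}")
        sc.stop()
    }
  }

  def main(args: Array[String]): Unit = {
    // Long-running loop that "sleeps" between jobs.
    while (true) {
      runOnce()
      Thread.sleep(intervalMs)
    }
  }
}
```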

Problem

The "spark.local.dir" directory is filling up over time, and we eventually get "java.io.IOException: No space left on device".

-----------------

We found an old Jira ticket mentioning the issue (https://issues.apache.org/jira/browse/SPARK-7439), but it seems to have been closed with the rationale "the dirs should already be cleaned up on JVM exit".

skurup · ‎08-10-2017

Are we closing the Spark context here? Usually, once a ".close()" call is made, the JVM should be able to clean up those directories.

johannes_kjellg · ‎08-14-2017

Hi Sumesh,
We are using a singleton Spark context (SparkContext.getOrCreate). When the business logic fails, we call ".stop()" on it (".close()" is not available) to make sure a new one is created for the next run.
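
One possible workaround, offered as a sketch rather than a confirmed fix: since the JVM never exits in a long-running application, the scratch directories could be removed by the application itself between runs. Note also that, per the Spark configuration documentation, "spark.worker.cleanup.enabled" only affects standalone mode workers, so it would not be expected to help in "local[*]" mode. The helper below is hypothetical (LocalDirCleanup and cleanStale are made-up names) and assumes that the leftover folders under "spark.local.dir" are named with the prefixes "spark-" and "blockmgr-"; check the actual folder names before using anything like it.

```scala
import java.io.File
import java.nio.file.Files

object LocalDirCleanup {
  // Assumption: Spark's scratch directories use these name prefixes.
  private val prefixes = Seq("spark-", "blockmgr-")

  // listFiles returns null for non-directories or on I/O errors.
  private def children(f: File): Seq[File] =
    Option(f.listFiles).map(_.toSeq).getOrElse(Seq.empty)

  private def deleteRecursively(f: File): Unit = {
    // Do not descend into symbolic links; only remove the link itself.
    if (!Files.isSymbolicLink(f.toPath)) children(f).foreach(deleteRecursively)
    f.delete()
  }

  /** Deletes matching directories directly under `root` whose last-modified
    * time is more than `maxAgeMs` before `now`. Returns the deleted names, sorted. */
  def cleanStale(root: File, maxAgeMs: Long,
                 now: Long = System.currentTimeMillis()): Seq[String] = {
    val stale = children(root)
      .filter(d => d.isDirectory && prefixes.exists(p => d.getName.startsWith(p)))
      .filter(d => now - d.lastModified() > maxAgeMs)
    stale.foreach(deleteRecursively)
    stale.map(_.getName).sorted
  }
}
```

The age check is only a heuristic, because a directory's modification time does not change when files deeper inside it are written. It is probably safest to call something like LocalDirCleanup.cleanStale(new File(localDir), maxAgeMs) only while no job is running, for example right after ".stop()" and before the next SparkContext.getOrCreate, with maxAgeMs comfortably larger than the longest job.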

Apache Spark is not deleting the folders in the temporary directory (spark.local.dir)