I have a problem with free space in my cluster. There are 2 slave nodes and 1 master node, each with 30 GB of disk space. I assume this is quite enough for the processes I am executing.
After I run my Spark job about 10-15 times, I notice that the free space on one of the slave nodes decreases dramatically, and I start receiving red alerts in the Ambari UI. The Spark job does not save any data to HDFS; it only does some intensive data processing. I also call `df.cache` a couple of times in the code, but later call `unpersist(false)` on those DataFrames. This is roughly how I run my Spark job:
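(The class name, jar path, and resource sizes below are placeholders, not the exact values I use.)

```
spark-submit \
  --master yarn \
  --deploy-mode cluster \
  --class com.example.MySparkJob \
  --num-executors 2 \
  --executor-memory 4g \
  --executor-cores 2 \
  /path/to/my-spark-job.jar
```

The caching pattern inside the job is essentially this (simplified, with a placeholder input path):

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("MySparkJob").getOrCreate()

val df = spark.read.parquet("/data/input")  // placeholder input path
df.cache()                                  // cached because it is reused by several transformations
// ... intensive transformations and actions on df ...
val rowCount = df.count()
df.unpersist(false)                         // non-blocking unpersist once the DataFrame is no longer needed
```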
When I manually inspect the node from the terminal, I see a lot of garbage files accumulating in `spark2-history`, `.sparkStaging` and `/hadoop/yarn/local/usercache/hdfs` (as described here). I have to manually delete all the content of these folders to make the cluster operational again. What is wrong with my Ambari cluster settings? Shouldn't there be continuous, automated garbage cleaning after each execution of a Spark job?
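For reference, these are the settings that, as far as I understand, are supposed to control automated cleanup of those directories (defaults as documented upstream; my Ambari stack may override them, and I may be looking at the wrong properties):

```
# Spark History Server event logs (spark2-history)
spark.history.fs.cleaner.enabled      # default: false
spark.history.fs.cleaner.interval     # default: 1d
spark.history.fs.cleaner.maxAge       # default: 7d

# YARN staging and local cache (.sparkStaging, usercache)
spark.yarn.preserve.staging.files                       # default: false
yarn.nodemanager.delete.debug-delay-sec                 # default: 0
yarn.nodemanager.localizer.cache.cleanup.interval-ms    # default: 600000 (10 min)
yarn.nodemanager.localizer.cache.target-size-mb         # default: 10240
```

Are these the right properties to be checking?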