
Spark on YARN running out of disk space due to filecache and usercache filling up


We are running a Spark Streaming application that consumes a lot of data from Kafka and uses HDFS as its checkpoint directory (a rough sketch of the job is included below). We are noticing that these two directories fill up on the data nodes, and we run out of space after only a couple of minutes of running:

/tmp/hadoop/data/nm-local-dir/filecache

/tmp/hadoop/data/nm-local-dir/usercache
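For context, the job looks roughly like this. This is only a minimal sketch; the checkpoint path, broker address, and topic name are placeholders rather than our real values:

    import kafka.serializer.StringDecoder
    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}
    import org.apache.spark.streaming.kafka.KafkaUtils

    object StreamingJob {
      def main(args: Array[String]): Unit = {
        val conf = new SparkConf().setAppName("kafka-streaming")
        val ssc = new StreamingContext(conf, Seconds(10))

        // checkpoint directory on HDFS (placeholder path)
        ssc.checkpoint("hdfs://hdfs-name-node:8020/user/spark/checkpoints")

        // direct Kafka stream (placeholder broker and topic)
        val kafkaParams = Map("metadata.broker.list" -> "kafka-broker:9092")
        val messages = KafkaUtils.createDirectStream[String, String, StringDecoder, StringDecoder](
          ssc, kafkaParams, Set("events"))

        // simple per-batch processing stage; the real job does more work here
        messages.map(_._2).count().print()

        ssc.start()
        ssc.awaitTermination()
      }
    }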

We are new to Hadoop configuration, so if someone could point us in the right direction it would be a great help. We are using Spark 1.6.3 and Hadoop 2.7.3. This is the yarn-site.xml:

yarn-site.xml
<configuration>

    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>

    <property>
        <name>yarn.resourcemanager.hostname</name>
        <value>hdfs-name-node</value>
    </property>

    <property>
        <name>yarn.nodemanager.resource.memory-mb</name>
        <value>16384</value>
    </property>

    <property>
        <name>yarn.nodemanager.resource.cpu-vcores</name>
        <value>6</value>
    </property>

    <property>
        <name>yarn.scheduler.maximum-allocation-mb</name>
        <value>16384</value>
    </property>

    <!-- Needs to be explicitly set as part of a workaround for YARN-367.
         If changing this property, you must also change the
         hadoop.tmp.dir property in hdfs-site.xml. This location must always
         be a subdirectory of the location specified in hadoop.tmp.dir. This
         affects all versions of YARN 2.0.0 through 2.7.3+. -->
    <property>
        <name>yarn.nodemanager.local-dirs</name>
        <value>file:///tmp/hadoop/data/nm-local-dir</value>
    </property>

</configuration>
2 Replies

This is likely to be data cached by the NodeManager during "localization", the process of downloading data locally in order to launch processes. Generally that data is the classpath of the program, which can be pretty big. I think it's cached for a while so that repeated launches of the same worker process can skip the HDFS download and start faster.

Play with yarn.nodemanager.localizer.cache.cleanup.interval-ms to see if you can cut the cleanup interval down. There's also the property yarn.nodemanager.localizer.cache.target-size-mb, which sets the target size in MB for public and private files; the default is 10 * 1024, i.e. 10 GB. If you have a lot less space than that, make it smaller.


Thanks a lot for your answer stevel. I tried a few things but the disk is still filling up: I set the cleanup interval to 500 ms and the target size to 512 MB, but the disk still gets full. This is from the latest try:

    <property>
        <name>yarn.nodemanager.localizer.cache.cleanup.interval-ms</name>
        <value>500</value>
    </property>

    <property>
        <name>yarn.nodemanager.localizer.cache.target-size-mb</name>
        <value>512</value>
    </property>