<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>question Re: How to increase DFS space on existing cluster in Support Questions</title>
    <link>https://community.cloudera.com/t5/Support-Questions/How-to-increase-DFS-space-on-existing-cluster/m-p/187845#M149946</link>
    <description>&lt;P&gt;&lt;A href="https://community.hortonworks.com/users/18739/sysadmin.html"&gt;@sysadmin CreditVidya&lt;/A&gt; There are several approaches that might help:&lt;/P&gt;&lt;P&gt;1. It appears the MapReduce intermediate data is not being purged properly by Hadoop itself. You can manually delete files/folders under the directories configured in mapreduce.cluster.local.dir after MR jobs have completed, say anything older than 3 days; a cron job is a good fit for this.&lt;/P&gt;&lt;P&gt;2. Implement the cleanup() method in each mapper/reducer class so that local resources and aggregates are released before the task exits.&lt;/P&gt;&lt;P&gt;3. Run the HDFS balancer regularly, typically weekly or bi-weekly, so that no node ends up storing much more HDFS data than the others (MR jobs always try to use the local copy of the data first), and always keep an eye on 'disk usage' for each host in Ambari.&lt;/P&gt;&lt;P&gt;Hope that helps.&lt;/P&gt;</description>
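    <!-- For the cron-driven cleanup in point 1, a minimal sketch; the directory path below is an assumption, so substitute the actual value of mapreduce.cluster.local.dir from your mapred-site.xml: -->

```shell
#!/bin/sh
# Sketch of a cron-driven cleanup for MapReduce intermediate data.
# Assumption: /hadoop/mapred/local stands in for the real value of
# mapreduce.cluster.local.dir (check mapred-site.xml on each node).

purge_old_mr_local() {
    # $1: one local dir from mapreduce.cluster.local.dir
    # $2: age threshold in days
    if [ -d "$1" ]; then
        # Delete everything under the dir not modified for more than $2 days
        find "$1" -mindepth 1 -mtime +"$2" -delete
    fi
}

purge_old_mr_local /hadoop/mapred/local 3
```

    <!-- Saved as e.g. /usr/local/bin/purge_mr_local.sh, it could run from a daily cron entry such as: 0 3 * * * /usr/local/bin/purge_mr_local.sh -->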
    <pubDate>Wed, 19 Jul 2017 22:00:58 GMT</pubDate>
    <dc:creator>dsun</dc:creator>
    <dc:date>2017-07-19T22:00:58Z</dc:date>
  </channel>
</rss>

