<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>question Re: hdfs trash compaction in Support Questions</title>
    <link>https://community.cloudera.com/t5/Support-Questions/hdfs-trash-compaction/m-p/109968#M72816</link>
    <description>&lt;P&gt;Just in case you do want to manually clean the trash&lt;/P&gt;&lt;H2&gt;expunge&lt;/H2&gt;&lt;P&gt;Usage: hadoop fs -expunge&lt;/P&gt;&lt;P&gt;Empty the Trash. Refer to the &lt;A href="http://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-hdfs/HdfsDesign.html"&gt;HDFS Architecture Guide&lt;/A&gt; for more information on the Trash feature.&lt;/P&gt;</description>
    <pubDate>Sat, 23 Apr 2016 06:31:21 GMT</pubDate>
    <dc:creator>Jim_B</dc:creator>
    <dc:date>2016-04-23T06:31:21Z</dc:date>
    <item>
      <title>hdfs trash compaction</title>
      <link>https://community.cloudera.com/t5/Support-Questions/hdfs-trash-compaction/m-p/109964#M72812</link>
      <description>&lt;P&gt;By default, fs.trash.interval=0 and fs.trash.checkpoint.interval=0, which means the trash feature is disabled. What are the recommended values for production-like clusters? If these values are 0, what command can be used to empty all HDFS trash directories on a periodic basis?&lt;/P&gt;</description>
      <pubDate>Thu, 21 Apr 2016 21:27:21 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/hdfs-trash-compaction/m-p/109964#M72812</guid>
      <dc:creator>smayani</dc:creator>
      <dc:date>2016-04-21T21:27:21Z</dc:date>
    </item>
    <item>
      <title>Re: hdfs trash compaction</title>
      <link>https://community.cloudera.com/t5/Support-Questions/hdfs-trash-compaction/m-p/109965#M72813</link>
      <description>&lt;A rel="user" href="https://community.cloudera.com/users/220/smayani.html" nodeid="220"&gt;@Saumil Mayani&lt;/A&gt;&lt;P&gt;The default value for "fs.trash.interval" in HDP is the recommended 360 minutes, i.e. 6 hours.&lt;/P&gt;&lt;P&gt;Whether to change this value depends on how important the deleted data is. From past experience, I usually suggest setting it to 1 day, i.e. 1440 minutes.&lt;/P&gt;&lt;P&gt;"fs.trash.checkpoint.interval" should always be smaller than "fs.trash.interval".&lt;/P&gt;</description>
      <pubDate>Thu, 21 Apr 2016 21:47:37 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/hdfs-trash-compaction/m-p/109965#M72813</guid>
      <dc:creator>sshimpi</dc:creator>
      <dc:date>2016-04-21T21:47:37Z</dc:date>
    </item>
    <item>
      <title>Re: hdfs trash compaction</title>
      <link>https://community.cloudera.com/t5/Support-Questions/hdfs-trash-compaction/m-p/109966#M72814</link>
      <description>&lt;P&gt;Hi Saumil,&lt;/P&gt;&lt;P&gt;&lt;A href="https://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-hdfs/HdfsDesign.html#Space_Reclamation" target="_blank"&gt;https://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-hdfs/HdfsDesign.html#Space_Reclamation&lt;/A&gt;&lt;/P&gt;&lt;P&gt;As the documentation says, to enable trash collection for a certain period, set fs.trash.interval to a value greater than zero.&lt;/P&gt;&lt;P&gt;fs.trash.interval can be set to 360 minutes (6 hours) or 1440 minutes (24 hours), depending on how long you want to keep your trash. The downside of keeping more trash is that the NameNode cannot reclaim the blocks of those files until they are removed from the trash.&lt;/P&gt;&lt;P&gt;fs.trash.checkpoint.interval can be set to something smaller than fs.trash.interval (1 hour or 3 hours). The process that runs at this interval creates a new checkpoint and deletes any older checkpoints that have expired according to fs.trash.interval.&lt;/P&gt;&lt;P&gt;Hope this helps.&lt;/P&gt;</description>
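The checkpoint cycle described above can be sketched as follows. This is a minimal, simplified model and not Hadoop's implementation: EmptierPass and runPass are hypothetical names, each checkpoint is represented only by its creation time in minutes, and the sketch assumes that a checkpoint is removed once its age has reached fs.trash.interval and that the Current trash directory is non-empty, so every pass creates a new checkpoint.

```java
import java.util.Arrays;
import java.util.stream.LongStream;

// Hypothetical, simplified model of one pass of the trash emptier (not Hadoop code).
// Times are plain minutes; a checkpoint is represented only by its creation time.
public class EmptierPass {

    // One pass at time 'now': drop checkpoints whose age has reached fs.trash.interval,
    // then add a new checkpoint stamped 'now' (assumes the Current trash directory
    // is non-empty, so there is something to checkpoint).
    static long[] runPass(long[] checkpointTimes, long now, long trashIntervalMin) {
        LongStream survivors = Arrays.stream(checkpointTimes)
                .filter(created -> trashIntervalMin > now - created); // still within fs.trash.interval
        return LongStream.concat(survivors, LongStream.of(now)).toArray();
    }

    public static void main(String[] args) {
        long trashInterval = 360;           // fs.trash.interval = 6 hours
        long[] checkpoints = {0, 60, 300};  // creation times of existing checkpoints
        // Pass at minute 360: the checkpoint from minute 0 is now 6 hours old and is removed.
        long[] after = runPass(checkpoints, 360, trashInterval);
        System.out.println(Arrays.toString(after)); // [60, 300, 360]
    }
}
```

With fs.trash.interval at 360 minutes and a checkpoint every 60 minutes, a checkpoint in this model is removed at the sixth pass after it was created.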
      <pubDate>Thu, 21 Apr 2016 22:00:42 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/hdfs-trash-compaction/m-p/109966#M72814</guid>
      <dc:creator>vwunnava</dc:creator>
      <dc:date>2016-04-21T22:00:42Z</dc:date>
    </item>
    <item>
      <title>Re: hdfs trash compaction</title>
      <link>https://community.cloudera.com/t5/Support-Questions/hdfs-trash-compaction/m-p/109967#M72815</link>
      <description>&lt;P&gt;From past experience, set this to a high value, at least a week. While some accidental deletes are noticed immediately, in other cases we only find out about an accidental delete while debugging another issue downstream. If your cluster currently has plenty of free space, leave it at one or two weeks so you have enough time to recover deleted data.&lt;/P&gt;</description>
      <pubDate>Fri, 22 Apr 2016 00:27:44 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/hdfs-trash-compaction/m-p/109967#M72815</guid>
      <dc:creator>ravi1</dc:creator>
      <dc:date>2016-04-22T00:27:44Z</dc:date>
    </item>
    <item>
      <title>Re: hdfs trash compaction</title>
      <link>https://community.cloudera.com/t5/Support-Questions/hdfs-trash-compaction/m-p/109968#M72816</link>
      <description>&lt;P&gt;Just in case you do want to manually clean the trash&lt;/P&gt;&lt;H2&gt;expunge&lt;/H2&gt;&lt;P&gt;Usage: hadoop fs -expunge&lt;/P&gt;&lt;P&gt;Empty the Trash. Refer to the &lt;A href="http://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-hdfs/HdfsDesign.html"&gt;HDFS Architecture Guide&lt;/A&gt; for more information on the Trash feature.&lt;/P&gt;</description>
      <pubDate>Sat, 23 Apr 2016 06:31:21 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/hdfs-trash-compaction/m-p/109968#M72816</guid>
      <dc:creator>Jim_B</dc:creator>
      <dc:date>2016-04-23T06:31:21Z</dc:date>
    </item>
    <item>
      <title>Re: hdfs trash compaction</title>
      <link>https://community.cloudera.com/t5/Support-Questions/hdfs-trash-compaction/m-p/109969#M72817</link>
      <description>&lt;P&gt;What happens if fs.trash.interval=1440 and fs.trash.checkpoint.interval=0? Does this mean the trash feature is disabled?&lt;/P&gt;</description>
      <pubDate>Wed, 04 May 2016 18:09:53 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/hdfs-trash-compaction/m-p/109969#M72817</guid>
      <dc:creator>sean_creedon</dc:creator>
      <dc:date>2016-05-04T18:09:53Z</dc:date>
    </item>
    <item>
      <title>Re: hdfs trash compaction</title>
      <link>https://community.cloudera.com/t5/Support-Questions/hdfs-trash-compaction/m-p/109970#M72818</link>
      <description>&lt;P&gt;&lt;A rel="user" href="https://community.cloudera.com/users/220/smayani.html" nodeid="220"&gt;@Saumil Mayani&lt;/A&gt; &lt;A rel="user" href="https://community.cloudera.com/users/216/ravi.html" nodeid="216"&gt;@Ravi Mutyala&lt;/A&gt; I am trying to understand fs.trash.checkpoint.interval=0, the default setting. Say we set fs.trash.interval=&amp;lt;X minutes&amp;gt; and either leave fs.trash.checkpoint.interval=0 or do not set it at all: how does the trash feature work? Does the checkpoint interval default to the trash interval?&lt;/P&gt;</description>
      <pubDate>Fri, 06 May 2016 01:01:56 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/hdfs-trash-compaction/m-p/109970#M72818</guid>
      <dc:creator>arramachandran</dc:creator>
      <dc:date>2016-05-06T01:01:56Z</dc:date>
    </item>
    <item>
      <title>Re: hdfs trash compaction</title>
      <link>https://community.cloudera.com/t5/Support-Questions/hdfs-trash-compaction/m-p/109971#M72819</link>
      <description>&lt;P&gt;Consolidating the details from my other answer here.&lt;/P&gt;&lt;P&gt;Try to keep fs.trash.interval long (I prefer one week). fs.trash.checkpoint.interval is the interval at which the thread runs to clean up trash that is older than fs.trash.interval. Keep it shorter, e.g. twice a day or more often. If you leave it at 0 (with fs.trash.interval at one week), cleanup happens only every 7 days, so some files can stay in the trash for up to 14 days.&lt;/P&gt;&lt;P&gt;&lt;A rel="user" href="https://community.cloudera.com/users/6983/arramachandran.html" nodeid="6983"&gt;@Arul Ramachandran&lt;/A&gt; &lt;A rel="user" href="https://community.cloudera.com/users/10232/seancreedon.html" nodeid="10232"&gt;@Sean Creedon&lt;/A&gt; &lt;A rel="user" href="https://community.cloudera.com/users/220/smayani.html" nodeid="220"&gt;@Saumil Mayani&lt;/A&gt;&lt;/P&gt;&lt;P&gt;If fs.trash.checkpoint.interval &amp;gt; fs.trash.interval, or fs.trash.checkpoint.interval == 0, fs.trash.interval is used as the checkpoint interval. So you can leave it at the default of 0, as long as you are OK with some data staying in the trash longer.&lt;/P&gt;&lt;P&gt;You can take a look at the TrashPolicyDefault.java code for the details:&lt;/P&gt;&lt;PRE&gt;Emptier(Configuration conf, long emptierInterval) throws IOException {
  this.conf = conf;
  this.emptierInterval = emptierInterval;
  // Fall back to the deletion interval (fs.trash.interval) when the checkpoint
  // interval is 0 or larger than the deletion interval.
  if (emptierInterval &amp;gt; deletionInterval || emptierInterval == 0) {
    LOG.info("The configured checkpoint interval is " +
             (emptierInterval / MSECS_PER_MINUTE) + " minutes." +
             " Using an interval of " +
             (deletionInterval / MSECS_PER_MINUTE) +
             " minutes that is used for deletion instead");
    this.emptierInterval = deletionInterval;
  }
  // ... (rest of the constructor omitted)
}&lt;/PRE&gt;</description>
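To make the fallback rule and the "up to 14 days" arithmetic concrete, here is a minimal sketch in Java. TrashIntervals and its methods are hypothetical names, not Hadoop APIs. The fallback mirrors the condition in the Emptier constructor quoted above. The retention figure is only a rough upper bound under a simplified model, which assumes that a deleted file waits up to one emptier run in the Current trash directory, becomes a checkpoint at that run, and is removed at the first later run at which the checkpoint's age has reached fs.trash.interval.

```java
// Hypothetical helper, not part of Hadoop: mirrors the fallback rule from the quoted
// Emptier constructor, but works in minutes instead of milliseconds.
public class TrashIntervals {

    // Interval (minutes) at which the emptier runs, given fs.trash.interval and
    // fs.trash.checkpoint.interval (both in minutes).
    static long effectiveCheckpointMinutes(long trashIntervalMin, long checkpointIntervalMin) {
        // Same condition as the quoted code: a checkpoint interval of 0, or one larger
        // than the deletion interval, falls back to the deletion interval.
        if (checkpointIntervalMin > trashIntervalMin || checkpointIntervalMin == 0) {
            return trashIntervalMin;
        }
        return checkpointIntervalMin;
    }

    // Rough upper bound (minutes) on how long a deleted file stays in trash, under the
    // simplified model described above (an assumption, not Hadoop's exact behavior).
    static long worstCaseRetentionMinutes(long trashIntervalMin, long checkpointIntervalMin) {
        long e = effectiveCheckpointMinutes(trashIntervalMin, checkpointIntervalMin);
        if (e == 0) {
            return 0; // fs.trash.interval = 0: trash is disabled, files are deleted immediately
        }
        // Up to one run waiting in Current, then enough whole runs for the checkpoint
        // to reach fs.trash.interval (ceiling division).
        long runsUntilExpired = (trashIntervalMin + e - 1) / e;
        return e + runsUntilExpired * e;
    }

    public static void main(String[] args) {
        long week = 7 * 24 * 60; // 10080 minutes
        System.out.println(effectiveCheckpointMinutes(week, 0));           // 10080
        System.out.println(worstCaseRetentionMinutes(week, 0) / 1440);     // 14 (days)
        System.out.println(worstCaseRetentionMinutes(week, 720) / 1440.0); // 7.5 (days)
    }
}
```

With fs.trash.interval at one week, running the emptier twice a day (720 minutes) brings this rough bound down from 14 days to 7.5 days. This is why a shorter checkpoint interval is suggested.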
      <pubDate>Fri, 06 May 2016 02:06:43 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/hdfs-trash-compaction/m-p/109971#M72819</guid>
      <dc:creator>ravi1</dc:creator>
      <dc:date>2016-05-06T02:06:43Z</dc:date>
    </item>
    <item>
      <title>Re: hdfs trash compaction</title>
      <link>https://community.cloudera.com/t5/Support-Questions/hdfs-trash-compaction/m-p/109972#M72820</link>
      <description>&lt;P&gt;Yes, when fs.trash.checkpoint.interval=0 or fs.trash.checkpoint.interval is not set, fs.trash.interval is used as the checkpoint interval.&lt;/P&gt;&lt;P&gt;Also, fs.trash.checkpoint.interval should always be set smaller than fs.trash.interval. If it is larger, fs.trash.interval is used as the checkpoint interval, as in the case above.&lt;/P&gt;</description>
      <pubDate>Wed, 08 Jun 2016 04:01:55 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/hdfs-trash-compaction/m-p/109972#M72820</guid>
      <dc:creator>xyao</dc:creator>
      <dc:date>2016-06-08T04:01:55Z</dc:date>
    </item>
    <item>
      <title>Re: hdfs trash compaction</title>
      <link>https://community.cloudera.com/t5/Support-Questions/hdfs-trash-compaction/m-p/109973#M72821</link>
      <description>&lt;P&gt;For misconfigurations like the cases above, you will find an INFO-level log message like the one below:&lt;/P&gt;&lt;PRE&gt;"The configured checkpoint interval is 0 minutes. Using an interval of XX (e.g., 60) minutes that is used for deletion instead"&lt;/PRE&gt;</description>
      <pubDate>Wed, 08 Jun 2016 04:06:57 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/hdfs-trash-compaction/m-p/109973#M72821</guid>
      <dc:creator>xyao</dc:creator>
      <dc:date>2016-06-08T04:06:57Z</dc:date>
    </item>
  </channel>
</rss>

