<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>question disk space issue on local disk.. due to buffering of s3 data in Archives of Support Questions (Read Only)</title>
    <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/disk-space-issue-on-local-disk-due-to-buffering-of-s3-data/m-p/34357#M10968</link>
    <description>&lt;P&gt;I am running an EC2 cluster backed by S3. When I run a Hive query or a Hadoop command that operates on very large data, temporary files are written to the local disk on the nodes before/after the data is copied to/from S3. I know the location can be configured with the 'fs.s3.buffer.dir' property. These files should be deleted afterwards, and usually they are, but in some cases they are not, so .tmp files (gigabytes of them) accumulate on all the nodes and cause disk space issues.&lt;/P&gt;&lt;P&gt;Is there any way to avoid creating the .tmp files?&lt;/P&gt;&lt;P&gt;Or can we identify why the .tmp files are not deleted in some cases, and correct it?&lt;/P&gt;&lt;P&gt;Please suggest the best solution in this case.&lt;/P&gt;</description>
    <pubDate>Tue, 24 Nov 2015 05:55:46 GMT</pubDate>
    <dc:creator>LovekeshBansal</dc:creator>
    <dc:date>2015-11-24T05:55:46Z</dc:date>
    <item>
      <title>disk space issue on local disk.. due to buffering of s3 data</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/disk-space-issue-on-local-disk-due-to-buffering-of-s3-data/m-p/34357#M10968</link>
      <description>&lt;P&gt;I am running an EC2 cluster backed by S3. When I run a Hive query or a Hadoop command that operates on very large data, temporary files are written to the local disk on the nodes before/after the data is copied to/from S3. I know the location can be configured with the 'fs.s3.buffer.dir' property. These files should be deleted afterwards, and usually they are, but in some cases they are not, so .tmp files (gigabytes of them) accumulate on all the nodes and cause disk space issues.&lt;/P&gt;&lt;P&gt;Is there any way to avoid creating the .tmp files?&lt;/P&gt;&lt;P&gt;Or can we identify why the .tmp files are not deleted in some cases, and correct it?&lt;/P&gt;&lt;P&gt;Please suggest the best solution in this case.&lt;/P&gt;</description>
      <pubDate>Tue, 24 Nov 2015 05:55:46 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/disk-space-issue-on-local-disk-due-to-buffering-of-s3-data/m-p/34357#M10968</guid>
      <dc:creator>LovekeshBansal</dc:creator>
      <dc:date>2015-11-24T05:55:46Z</dc:date>
    </item>
    <item>
      <title>Re: disk space issue on local disk.. due to buffering of s3 data</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/disk-space-issue-on-local-disk-due-to-buffering-of-s3-data/m-p/34844#M10969</link>
      <description>If the JVM that's buffering in the local dir dies from a SIGKILL or a similar form of immediate interruption, the cleanup procedures never run.&lt;BR /&gt;&lt;BR /&gt;When running in MR mode, try setting the buffer directory to ./tmp (relative) so that the files are created under the task's working directory; they can then be deleted automatically when the TaskTracker/NodeManager cleans up the task's environment after the task is killed.&lt;BR /&gt;&lt;BR /&gt;Also, have you tried using S3A (s3a://) instead? It may function better than the older S3 FS, and does not utilise a buffer directory. S3A has been included in CDH5 for a while now.</description>
      <pubDate>Mon, 07 Dec 2015 06:00:44 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/disk-space-issue-on-local-disk-due-to-buffering-of-s3-data/m-p/34844#M10969</guid>
      <dc:creator>Harsh J</dc:creator>
      <dc:date>2015-12-07T06:00:44Z</dc:date>
    </item>
    <item>
      <title>Re: disk space issue on local disk.. due to buffering of s3 data</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/disk-space-issue-on-local-disk-due-to-buffering-of-s3-data/m-p/34894#M10970</link>
      <description>&lt;P&gt;Thanks for such an informative reply. I have already implemented s3a://, and yes, that is the only real solution.&lt;/P&gt;&lt;P&gt;The other one, i.e. changing the buffer directory to the ./tmp dir, is an intelligent workaround.&lt;/P&gt;</description>
      <pubDate>Tue, 08 Dec 2015 05:19:04 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/disk-space-issue-on-local-disk-due-to-buffering-of-s3-data/m-p/34894#M10970</guid>
      <dc:creator>LovekeshBansal</dc:creator>
      <dc:date>2015-12-08T05:19:04Z</dc:date>
    </item>
  </channel>
</rss>

