<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>question Re: How to limit jobcache folder size? in Archives of Support Questions (Read Only)</title>
    <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/How-to-limit-jobcache-foledr-size/m-p/2659#M393</link>
    <description>&lt;P&gt;Hey Markovich,&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;In CM, every module has many tunable properties, and the less common ones do not get their own options in the UI. &amp;nbsp;You can still set those properties by adding them to a safety valve. &amp;nbsp;To do this:&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;- Go to Services-&amp;gt;MapReduce-&amp;gt;Configuration (View and Edit).&lt;/P&gt;&lt;P&gt;- Expand Service-Wide and click Advanced.&lt;/P&gt;&lt;P&gt;- There you should see "&lt;SPAN&gt;MapReduce Service Configuration Safety Valve for mapred-site.xml". &amp;nbsp;Paste the following into it, adjusting the value to the number of cache directories you want:&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;&amp;lt;property&amp;gt;&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;&amp;nbsp; &amp;nbsp;&amp;lt;name&amp;gt;mapreduce.tasktracker.local.cache.numberdirectories&amp;lt;/name&amp;gt;&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;&amp;nbsp; &amp;nbsp;&amp;lt;value&amp;gt;5000&amp;lt;/value&amp;gt;&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;&amp;lt;/property&amp;gt;&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;- Then save the configuration and restart the MapReduce service.&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;The same approach works for every module. &amp;nbsp;If a search does not find a property, it probably has no dedicated setting in CM, but every module has an Advanced section with a "Safety Valve" where you can add properties when necessary.&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;Hope this helps.&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;Thanks&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;Chris&lt;/SPAN&gt;&lt;/P&gt;</description>
    <pubDate>Sun, 27 Oct 2013 15:36:09 GMT</pubDate>
    <dc:creator>cconner</dc:creator>
    <dc:date>2013-10-27T15:36:09Z</dc:date>
    <item>
      <title>How to limit jobcache folder size?</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/How-to-limit-jobcache-foledr-size/m-p/2361#M390</link>
      <description>&lt;P&gt;Hello Hadoop experts, I have a new problem and a new question for you.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I have a lot of small files (approximately 70,000), 40 GB in total.&lt;/P&gt;&lt;P&gt;I've developed a map-only Java program to analyze these files. It has no reducers and no output, only counters.&lt;BR /&gt;&lt;SPAN&gt;I have a single-node setup of CDH 4.4.0.&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;I have a 150 GB hard disk, and the whole Hadoop environment uses it (logs, libs, HDFS data files and so on).&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;After I put all the files into HDFS, I was left with a little less than 100 GB of free space.&lt;/P&gt;&lt;P&gt;I started my job successfully, but 10 hours later Hadoop crashed because there was not enough space left to write the logs.&lt;/P&gt;&lt;P&gt;I looked at my disk and found that the folder&amp;nbsp;&lt;STRONG&gt;mapred/local/taskTracker/hdfs/jobcache/job_xxxx_xxx&amp;nbsp;&lt;/STRONG&gt;was taking up all the available space.&lt;/P&gt;&lt;P&gt;Hadoop had processed approximately 10,000 files, so that folder contained approximately 10,000 subfolders, each holding a single job.xml file (8 MB). So 8 MB × 10,000 ≈&lt;STRONG&gt; 78 GB.&amp;nbsp;&lt;/STRONG&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;And here is my question: how can I process all 70,000 files?&lt;/P&gt;&lt;P&gt;(At this rate I would need approximately 550 GB of free space to process 40 GB of small files!)&lt;/P&gt;&lt;P&gt;Is it possible to configure Hadoop to clean up after every map task?&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Regards,&lt;/P&gt;&lt;P&gt;&amp;nbsp; &amp;nbsp; Markovich&lt;/P&gt;</description>
      <pubDate>Fri, 16 Sep 2022 08:49:05 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/How-to-limit-jobcache-foledr-size/m-p/2361#M390</guid>
      <dc:creator>Markovich</dc:creator>
      <dc:date>2022-09-16T08:49:05Z</dc:date>
    </item>
    <item>
      <title>Re: How to limit jobcache folder size?</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/How-to-limit-jobcache-foledr-size/m-p/2627#M391</link>
      <description>&lt;P&gt;Hey Markovich,&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;By default the jobcache maxes out at 10000 directories, so you should not go above the ~80 GB mark there. &amp;nbsp;However, this limit is configurable, and in your case 5000 or even 1000 directories may be enough. &amp;nbsp;You can set:&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P class="p1"&gt;mapreduce.tasktracker.local.cache.numberdirectories&lt;/P&gt;&lt;P class="p1"&gt;&amp;nbsp;&lt;/P&gt;&lt;P class="p1"&gt;to a lower value and see if that helps.&lt;/P&gt;&lt;P class="p1"&gt;&amp;nbsp;&lt;/P&gt;&lt;P class="p1"&gt;Hope this helps.&lt;/P&gt;&lt;P class="p1"&gt;&amp;nbsp;&lt;/P&gt;&lt;P class="p1"&gt;Thanks&lt;/P&gt;&lt;P class="p1"&gt;Chris&lt;/P&gt;</description>
      <pubDate>Fri, 25 Oct 2013 14:02:38 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/How-to-limit-jobcache-foledr-size/m-p/2627#M391</guid>
      <dc:creator>cconner</dc:creator>
      <dc:date>2013-10-25T14:02:38Z</dc:date>
    </item>
    <item>
      <title>Re: How to limit jobcache folder size?</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/How-to-limit-jobcache-foledr-size/m-p/2641#M392</link>
      <description>&lt;P&gt;Hey Chris,&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P class="p1"&gt;I went to Services&amp;gt;MapReduce&amp;gt;Configuration(View and Edit) and past in search &lt;STRONG&gt;mapreduce.tasktracker.local.cache.numberdirectories.&lt;/STRONG&gt;&lt;/P&gt;&lt;P class="p1"&gt;&lt;STRONG&gt;&lt;BR /&gt;&lt;/STRONG&gt;I found nothing.&lt;/P&gt;&lt;P class="p1"&gt;Also I typed only &lt;STRONG&gt;cache&lt;/STRONG&gt; and only &lt;STRONG&gt;local&lt;/STRONG&gt;, and also nothing.&lt;/P&gt;&lt;P class="p1"&gt;&amp;nbsp;&lt;/P&gt;&lt;P class="p1"&gt;Thanks&lt;/P&gt;&lt;P class="p1"&gt;Markovich&lt;/P&gt;</description>
      <pubDate>Sat, 26 Oct 2013 07:29:51 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/How-to-limit-jobcache-foledr-size/m-p/2641#M392</guid>
      <dc:creator>Markovich</dc:creator>
      <dc:date>2013-10-26T07:29:51Z</dc:date>
    </item>
    <item>
      <title>Re: How to limit jobcache folder size?</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/How-to-limit-jobcache-foledr-size/m-p/2659#M393</link>
      <description>&lt;P&gt;Hey Markovich,&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;In CM, every module has many tunable properties, and the less common ones do not get their own options in the UI. &amp;nbsp;You can still set those properties by adding them to a safety valve. &amp;nbsp;To do this:&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;- Go to Services-&amp;gt;MapReduce-&amp;gt;Configuration (View and Edit).&lt;/P&gt;&lt;P&gt;- Expand Service-Wide and click Advanced.&lt;/P&gt;&lt;P&gt;- There you should see "&lt;SPAN&gt;MapReduce Service Configuration Safety Valve for mapred-site.xml". &amp;nbsp;Paste the following into it, adjusting the value to the number of cache directories you want:&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;&amp;lt;property&amp;gt;&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;&amp;nbsp; &amp;nbsp;&amp;lt;name&amp;gt;mapreduce.tasktracker.local.cache.numberdirectories&amp;lt;/name&amp;gt;&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;&amp;nbsp; &amp;nbsp;&amp;lt;value&amp;gt;5000&amp;lt;/value&amp;gt;&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;&amp;lt;/property&amp;gt;&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;- Then save the configuration and restart the MapReduce service.&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;The same approach works for every module. &amp;nbsp;If a search does not find a property, it probably has no dedicated setting in CM, but every module has an Advanced section with a "Safety Valve" where you can add properties when necessary.&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;Hope this helps.&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;Thanks&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;Chris&lt;/SPAN&gt;&lt;/P&gt;</description>
      <pubDate>Sun, 27 Oct 2013 15:36:09 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/How-to-limit-jobcache-foledr-size/m-p/2659#M393</guid>
      <dc:creator>cconner</dc:creator>
      <dc:date>2013-10-27T15:36:09Z</dc:date>
    </item>
    <item>
      <title>Re: How to limit jobcache folder size?</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/How-to-limit-jobcache-foledr-size/m-p/2669#M394</link>
      <description>&lt;P&gt;Hey Chris,&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Thank you so much, it helped me!&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Thanks&lt;/P&gt;&lt;P&gt;Andrey&lt;/P&gt;</description>
      <pubDate>Mon, 28 Oct 2013 06:45:19 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/How-to-limit-jobcache-foledr-size/m-p/2669#M394</guid>
      <dc:creator>Markovich</dc:creator>
      <dc:date>2013-10-28T06:45:19Z</dc:date>
    </item>
    <item>
      <title>Re: How to limit jobcache folder size?</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/How-to-limit-jobcache-foledr-size/m-p/2689#M395</link>
      <description>&lt;P&gt;Hey Chris,&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I'm sorry, I was wrong. It did not help.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I set the&amp;nbsp;&lt;STRONG&gt;MapReduce Service Configuration Safety Valve for mapred-site.xml&lt;/STRONG&gt; to&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;lt;property&amp;gt;&lt;BR /&gt;&amp;lt;name&amp;gt;mapreduce.tasktracker.local.cache.numberdirectories&amp;lt;/name&amp;gt;&lt;BR /&gt;&amp;lt;value&amp;gt;2000&amp;lt;/value&amp;gt;&lt;BR /&gt;&amp;lt;/property&amp;gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;and now I see that I have no disk space left.&lt;/P&gt;&lt;P&gt;The folder&amp;nbsp;&lt;STRONG&gt;mapred/local/taskTracker/hdfs/jobcache/job_201310281050_0001&amp;nbsp;&lt;/STRONG&gt;contains more than 6000 files (more than 5000 on the other host).&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;P.S. I checked job.xml: it contains&amp;nbsp;mapreduce.tasktracker.local.cache.numberdirectories, and it is set to 2000.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;Thanks&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;Andrey&lt;/SPAN&gt;&lt;/P&gt;</description>
      <pubDate>Mon, 28 Oct 2013 15:36:03 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/How-to-limit-jobcache-foledr-size/m-p/2689#M395</guid>
      <dc:creator>Markovich</dc:creator>
      <dc:date>2013-10-28T15:36:03Z</dc:date>
    </item>
    <item>
      <title>Re: How to limit jobcache folder size?</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/How-to-limit-jobcache-foledr-size/m-p/2697#M396</link>
      <description>&lt;P&gt;Could this be caused by old cache files, left over from before you made the change, not being cleaned out? &amp;nbsp;I believe there is a mechanism for retiring old files and moving or deleting them, but I'm not sure whether it applies to the jobcache files themselves.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Maybe this blog post holds a clue?&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&lt;A href="http://blog.cloudera.com/blog/2010/11/hadoop-log-location-and-retention/" target="_blank"&gt;http://blog.cloudera.com/blog/2010/11/hadoop-log-location-and-retention/&lt;/A&gt;&lt;/P&gt;</description>
      <pubDate>Mon, 28 Oct 2013 16:23:31 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/How-to-limit-jobcache-foledr-size/m-p/2697#M396</guid>
      <dc:creator>Clint</dc:creator>
      <dc:date>2013-10-28T16:23:31Z</dc:date>
    </item>
    <item>
      <title>Re: How to limit jobcache folder size?</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/How-to-limit-jobcache-foledr-size/m-p/2701#M397</link>
      <description>&lt;P&gt;No, I stopped all jobs before changing the MapReduce service configuration, and restarted the whole cluster.&lt;/P&gt;&lt;P&gt;I also checked the&amp;nbsp;&lt;STRONG&gt;mapred/local/taskTracker/hdfs/jobcache&lt;/STRONG&gt;&amp;nbsp;folder, and I am sure it was empty.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Thank you for the link, but I found nothing in it about the&amp;nbsp;&lt;STRONG&gt;jobcache&amp;nbsp;&lt;/STRONG&gt;folder.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Also, after a job fails or completes, its folder is deleted from the&amp;nbsp;&lt;STRONG&gt;jobcache&amp;nbsp;&lt;/STRONG&gt;folder together with all of its&lt;STRONG&gt; attempt_xxxx&lt;/STRONG&gt; folders.&lt;/P&gt;</description>
      <pubDate>Mon, 28 Oct 2013 17:44:46 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/How-to-limit-jobcache-foledr-size/m-p/2701#M397</guid>
      <dc:creator>Markovich</dc:creator>
      <dc:date>2013-10-28T17:44:46Z</dc:date>
    </item>
  </channel>
</rss>

