<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>question Re: How to see Mapreduce Spill Disk Activity in Archives of Support Questions (Read Only)</title>
    <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/How-to-see-Mapreduce-Spill-Disk-Activity/m-p/51696#M55834</link>
    <description>Yes. That setting only puts a cap on how large a container can be. It does not mean that your containers will be this size. The yarn.scheduler.minimum-allocation-mb setting determines the container size if one is not provided by the user.</description>
    <pubDate>Thu, 02 Mar 2017 15:44:19 GMT</pubDate>
    <dc:creator>mbigelow</dc:creator>
    <dc:date>2017-03-02T15:44:19Z</dc:date>
    <item>
      <title>How to see Mapreduce Spill Disk Activity</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/How-to-see-Mapreduce-Spill-Disk-Activity/m-p/51605#M55825</link>
      <description>&lt;P&gt;Hi,&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I have just started learning Hadoop, and I do not know how to check whether a MapReduce job is spilling to disk. If it is, am I right that we have to increase the io.sort size? Please help me out with this.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;1. Also, which other parameters in the mapred-site.xml and hadoop-env.sh files need to be checked if there is too much spill?&lt;/P&gt;</description>
      <pubDate>Wed, 01 Mar 2017 03:57:22 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/How-to-see-Mapreduce-Spill-Disk-Activity/m-p/51605#M55825</guid>
      <dc:creator>matt123</dc:creator>
      <dc:date>2017-03-01T03:57:22Z</dc:date>
    </item>
    <item>
      <title>Re: How to see Mapreduce Spill Disk Activity</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/How-to-see-Mapreduce-Spill-Disk-Activity/m-p/51610#M55826</link>
      <description>&lt;P&gt;&lt;a href="https://community.cloudera.com/t5/user/viewprofilepage/user-id/20984"&gt;@matt123&lt;/a&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Go to http://ipaddress:8088 and check the Cluster Metrics for RAM, container, and vcore usage.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Also click on "Active Nodes" to see the same information per node.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Cloudera Manager -&amp;gt; HDFS -&amp;gt; Web UI -&amp;gt; NameNode UI -&amp;gt; see the complete metrics.&lt;/P&gt;</description>
      <pubDate>Wed, 01 Mar 2017 05:10:04 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/How-to-see-Mapreduce-Spill-Disk-Activity/m-p/51610#M55826</guid>
      <dc:creator>saranvisa</dc:creator>
      <dc:date>2017-03-01T05:10:04Z</dc:date>
    </item>
    <item>
      <title>Re: How to see Mapreduce Spill Disk Activity</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/How-to-see-Mapreduce-Spill-Disk-Activity/m-p/51620#M55827</link>
      <description>&lt;P&gt;Thanks for the information.&lt;/P&gt;&lt;P&gt;Are Hadoop metrics collected by default, or do we have to enable them? Could you please tell me?&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Also, one more quick clarification: if there is too much spill in a MapReduce job, does it mean we have to increase io.sort.mb? If so, what would an ideal value be? Can I start with 1000?&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;PRE&gt;mapreduce.task.io.sort.mb&lt;/PRE&gt;</description>
      <pubDate>Wed, 01 Mar 2017 07:05:53 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/How-to-see-Mapreduce-Spill-Disk-Activity/m-p/51620#M55827</guid>
      <dc:creator>matt123</dc:creator>
      <dc:date>2017-03-01T07:05:53Z</dc:date>
    </item>
    <item>
      <title>Re: How to see Mapreduce Spill Disk Activity</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/How-to-see-Mapreduce-Spill-Disk-Activity/m-p/51668#M55828</link>
      <description>The best indicators are the job counters. Take a look at "FILE: Number of bytes written" and "Spilled Records", especially in relation to "Map output records". If the spilled records are a large portion of the map output records, you are spilling a lot.&lt;BR /&gt;&lt;BR /&gt;The first setting below determines how much memory to use for the map-side sort buffer, and the spill percentage is the fraction of that buffer that must fill before it starts spilling to disk. You can tweak both to reduce the amount spilled. The io.sort.mb buffer is part of the map heap, so there isn't a clear-cut "it should be X". You can play around and test it for your job to see how much you can give it without slowing down your mappers. You could also increase the mapper memory as you increase the io.sort.mb.&lt;BR /&gt;&lt;BR /&gt;mapreduce.task.io.sort.mb&lt;BR /&gt;mapreduce.map.sort.spill.percent</description>
      <pubDate>Wed, 01 Mar 2017 20:33:35 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/How-to-see-Mapreduce-Spill-Disk-Activity/m-p/51668#M55828</guid>
      <dc:creator>mbigelow</dc:creator>
      <dc:date>2017-03-01T20:33:35Z</dc:date>
    </item>
    <item>
      <title>Re: How to see Mapreduce Spill Disk Activity</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/How-to-see-Mapreduce-Spill-Disk-Activity/m-p/51677#M55829</link>
      <description>&lt;P&gt;In mapred-site.xml:&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;PRE&gt;mapreduce.map.memory.mb = 
&amp;nbsp;
mapreduce.task.io.sort.mb =&lt;/PRE&gt;</description>
      <pubDate>Thu, 02 Mar 2017 05:54:22 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/How-to-see-Mapreduce-Spill-Disk-Activity/m-p/51677#M55829</guid>
      <dc:creator>csguna</dc:creator>
      <dc:date>2017-03-02T05:54:22Z</dc:date>
    </item>
    <item>
      <title>Re: How to see Mapreduce Spill Disk Activity</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/How-to-see-Mapreduce-Spill-Disk-Activity/m-p/51678#M55830</link>
      <description>&lt;P&gt;&lt;a href="https://community.cloudera.com/t5/user/viewprofilepage/user-id/18127"&gt;@mbigelow&lt;/a&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Could you please clarify this statement: "You could also increase the mapper memory as you increase the io.sort.mb."&lt;BR /&gt;&amp;nbsp;&lt;BR /&gt;1. Is it mandatory to increase the mapper memory as we increase io.sort.mb? Is there a dependency between them?&lt;BR /&gt;&amp;nbsp;&lt;BR /&gt;2. Say I increase the mapper memory. Do I then also have to increase yarn.scheduler.maximum-allocation-mb because of yarn.nodemanager.vmem-pmem-ratio = 2.1?&lt;BR /&gt;yarn.nodemanager.resource.memory-mb = 8192&lt;BR /&gt;&amp;nbsp;&lt;BR /&gt;mapreduce.map.java.opts = 2.5 GB&lt;BR /&gt;mapreduce.map.memory.mb = 3 GB&lt;BR /&gt;&amp;nbsp;&lt;BR /&gt;&lt;STRONG&gt;mapreduce.task.io.sort.mb = 4gb - can I do this?&lt;/STRONG&gt;&lt;BR /&gt;&amp;nbsp;&lt;BR /&gt;3. yarn.scheduler.maximum-allocation-mb = 8024 -&lt;STRONG&gt; Will I be able to increase this to more than 8 GB if I have enough RAM in my system?&lt;/STRONG&gt;&lt;BR /&gt;&amp;nbsp;&lt;BR /&gt;Thanks for the help.&lt;/P&gt;</description>
      <pubDate>Thu, 02 Mar 2017 05:58:50 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/How-to-see-Mapreduce-Spill-Disk-Activity/m-p/51678#M55830</guid>
      <dc:creator>matt123</dc:creator>
      <dc:date>2017-03-02T05:58:50Z</dc:date>
    </item>
    <item>
      <title>Re: How to see Mapreduce Spill Disk Activity</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/How-to-see-Mapreduce-Spill-Disk-Activity/m-p/51679#M55831</link>
      <description>&lt;P&gt;Thanks&lt;/P&gt;</description>
      <pubDate>Thu, 02 Mar 2017 05:57:45 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/How-to-see-Mapreduce-Spill-Disk-Activity/m-p/51679#M55831</guid>
      <dc:creator>matt123</dc:creator>
      <dc:date>2017-03-02T05:57:45Z</dc:date>
    </item>
    <item>
      <title>Re: How to see Mapreduce Spill Disk Activity</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/How-to-see-Mapreduce-Spill-Disk-Activity/m-p/51680#M55832</link>
      <description>&lt;P&gt;&lt;SPAN&gt;yarn.scheduler.maximum-allocation-mb - This is the maximum memory that a single container can get.&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;yarn.nodemanager.resource.memory-mb - This is how much memory per NodeManager (NM) is allocated for containers.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I always set&amp;nbsp;&lt;SPAN&gt;yarn.scheduler.maximum-allocation-mb equal to&amp;nbsp;yarn.nodemanager.resource.memory-mb, as the single largest container I could run on a host would be the amount of memory on that host allocated for YARN.&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;You can set&amp;nbsp;yarn.scheduler.maximum-allocation-mb to any value but, as mentioned, it should not exceed what you set for&amp;nbsp;yarn.nodemanager.resource.memory-mb. If it does, it won't harm anything until someone tries to get a container &amp;gt;&amp;nbsp;yarn.nodemanager.resource.memory-mb.&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;You might be able to set the configuration like this:&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;&lt;STRONG&gt;mapreduce.task.io.sort.mb = 4gb - I can do this.&lt;/STRONG&gt;&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;The issue is that the sort buffer is part of the heap of the mapper. For instance, with a mapper of 3 GB and a heap of 2.5 GB, the sort buffer could quickly fill up the 2.5 GB of heap available. You may not always hit OOM, but it is likely with such a poor configuration. In summary,&amp;nbsp;yarn.nodemanager.resource.memory-mb &amp;gt;&amp;nbsp;mapreduce.map.memory.mb &amp;gt; mapreduce.task.io.sort.mb. It is not mandatory to increase&amp;nbsp;mapreduce.map.memory.mb if you increase the sort buffer.&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;Let's use another example. Say you are using a 4 GB container with a heap of 3.2 GB.
&amp;nbsp;You are spilling a lot of records because you are still using the default sort buffer size, so you increase it to 1 GB. You just shrank the available memory of your heap from 3.1 GB (3.2 GB - 100 MB, roughly) to 2.2 GB (3.2 GB - 1 GB). To compensate, you could just increase your heap and, along with that, your mapper memory. In this example it would then look like a 5 GB container, a 4.2 GB heap, and a 1 GB sort buffer.&lt;/SPAN&gt;&lt;/P&gt;</description>
      <pubDate>Thu, 02 Mar 2017 06:09:42 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/How-to-see-Mapreduce-Spill-Disk-Activity/m-p/51680#M55832</guid>
      <dc:creator>mbigelow</dc:creator>
      <dc:date>2017-03-02T06:09:42Z</dc:date>
    </item>
    <item>
      <title>Re: How to see Mapreduce Spill Disk Activity</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/How-to-see-Mapreduce-Spill-Disk-Activity/m-p/51685#M55833</link>
      <description>&lt;P&gt;&lt;a href="https://community.cloudera.com/t5/user/viewprofilepage/user-id/18127"&gt;@mbigelow&lt;/a&gt;&lt;/P&gt;&lt;P&gt;Thanks for the explanation with the example. It's clear.&lt;/P&gt;&lt;P&gt;One last clarification:&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;The default -&amp;nbsp;yarn.scheduler.maximum-allocation-mb = 8024 -&lt;/SPAN&gt; &lt;U&gt;Will I be able to increase it to more than 8 GB if I have enough RAM in my system?&lt;/U&gt;&lt;/P&gt;</description>
      <pubDate>Thu, 02 Mar 2017 09:00:56 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/How-to-see-Mapreduce-Spill-Disk-Activity/m-p/51685#M55833</guid>
      <dc:creator>matt123</dc:creator>
      <dc:date>2017-03-02T09:00:56Z</dc:date>
    </item>
    <item>
      <title>Re: How to see Mapreduce Spill Disk Activity</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/How-to-see-Mapreduce-Spill-Disk-Activity/m-p/51696#M55834</link>
      <description>Yes. That setting only puts a cap on how large a container can be. It does not mean that your containers will be this size. The yarn.scheduler.minimum-allocation-mb setting determines the container size if one is not provided by the user.</description>
      <pubDate>Thu, 02 Mar 2017 15:44:19 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/How-to-see-Mapreduce-Spill-Disk-Activity/m-p/51696#M55834</guid>
      <dc:creator>mbigelow</dc:creator>
      <dc:date>2017-03-02T15:44:19Z</dc:date>
    </item>
    <item>
      <title>Re: How to see Mapreduce Spill Disk Activity</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/How-to-see-Mapreduce-Spill-Disk-Activity/m-p/51701#M55835</link>
      <description>&lt;P&gt;&lt;a href="https://community.cloudera.com/t5/user/viewprofilepage/user-id/18127"&gt;@mbigelow&lt;/a&gt;&amp;nbsp;My English is not that good, so I assume from your answer that I can set more than 8 GB in&amp;nbsp;yarn.scheduler.maximum-allocation-mb. Please correct me if I am wrong.&lt;/P&gt;</description>
      <pubDate>Thu, 02 Mar 2017 16:07:55 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/How-to-see-Mapreduce-Spill-Disk-Activity/m-p/51701#M55835</guid>
      <dc:creator>matt123</dc:creator>
      <dc:date>2017-03-02T16:07:55Z</dc:date>
    </item>
    <item>
      <title>Re: How to see Mapreduce Spill Disk Activity</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/How-to-see-Mapreduce-Spill-Disk-Activity/m-p/51734#M55836</link>
      <description>&lt;a href="https://community.cloudera.com/t5/user/viewprofilepage/user-id/20984"&gt;@matt123&lt;/a&gt; You got it.</description>
      <pubDate>Fri, 03 Mar 2017 02:52:59 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/How-to-see-Mapreduce-Spill-Disk-Activity/m-p/51734#M55836</guid>
      <dc:creator>mbigelow</dc:creator>
      <dc:date>2017-03-03T02:52:59Z</dc:date>
    </item>
    <item>
      <title>Re: How to see Mapreduce Spill Disk Activity</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/How-to-see-Mapreduce-Spill-Disk-Activity/m-p/51738#M55837</link>
      <description>&lt;P&gt;&lt;a href="https://community.cloudera.com/t5/user/viewprofilepage/user-id/18127"&gt;@mbigelow&lt;/a&gt;&amp;nbsp;Can't thank you enough, mate.&lt;/P&gt;</description>
      <pubDate>Fri, 03 Mar 2017 03:22:32 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/How-to-see-Mapreduce-Spill-Disk-Activity/m-p/51738#M55837</guid>
      <dc:creator>matt123</dc:creator>
      <dc:date>2017-03-03T03:22:32Z</dc:date>
    </item>
  </channel>
</rss>

