<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>question Re: Fine tune the PIg Job in Archives of Support Questions (Read Only)</title>
    <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Fine-tune-the-PIg-Job/m-p/161621#M41239</link>
    <description>&lt;P&gt;This is a good discussion on setting reducers: &lt;A href="https://community.hortonworks.com/questions/28073/how-do-you-force-the-number-of-reducers-in-a-map-r.html"&gt;https://community.hortonworks.com/questions/28073/how-do-you-force-the-number-of-reducers-in-a-map-r.html&lt;/A&gt;&lt;/P&gt;&lt;P&gt;As with all performance tuning, it is best to isolate one bottleneck and tune that, rather than changing many settings at once. So yes, among other tuning, set this and see if it helps. If not, move on to the next suspected bottleneck.&lt;/P&gt;</description>
    <pubDate>Fri, 23 Sep 2016 19:01:00 GMT</pubDate>
    <dc:creator>gkeys</dc:creator>
    <dc:date>2016-09-23T19:01:00Z</dc:date>
    <item>
      <title>Fine tune the PIg Job</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Fine-tune-the-PIg-Job/m-p/161618#M41236</link>
      <description>&lt;P&gt;I have a Pig job with 6 joins (5 small tables and 1 large table). The job spawns 49 map tasks and 13 reducers.&lt;/P&gt;&lt;P&gt;The job runs for more than 12 hours.&lt;/P&gt;&lt;P&gt;Is there a formula for setting the properties below?&lt;/P&gt;&lt;P&gt;set default_parallel 
set mapred.max.split.size 
set mapred.min.split.size 
set mapred.task.timeout 
set mapred.task.ping.timeout 
set mapred.map.child.java.opts -Xmx4096m;
set mapred.reduce.child.java.opts -Xmx4096m;
set pig.exec.reducers.bytes.per.reducer &lt;/P&gt;&lt;P&gt;I was given the settings above as leads for making the job faster.&lt;/P&gt;&lt;P&gt;However, I am not able to work out the exact values to use.&lt;/P&gt;</description>
      <pubDate>Tue, 20 Sep 2016 17:57:29 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Fine-tune-the-PIg-Job/m-p/161618#M41236</guid>
      <dc:creator>prklearning</dc:creator>
      <dc:date>2016-09-20T17:57:29Z</dc:date>
    </item>
    <item>
      <title>Re: Fine tune the PIg Job</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Fine-tune-the-PIg-Job/m-p/161619#M41237</link>
      <description>&lt;P&gt;Take these three steps, in order: &lt;/P&gt;&lt;OL&gt;
&lt;LI&gt;Minimize your data before the join (e.g. load only the columns needed for the join and the output, and filter before joining), then&lt;/LI&gt;&lt;LI&gt;optimize your joins, then&lt;/LI&gt;&lt;LI&gt;optimize settings (including compressing intermediate results)&lt;/LI&gt;&lt;/OL&gt;&lt;P&gt;For 1, see: &lt;A href="https://pig.apache.org/docs/r0.7.0/cookbook.html" target="_blank"&gt;https://pig.apache.org/docs/r0.7.0/cookbook.html&lt;/A&gt;&lt;/P&gt;&lt;P&gt;For 1 and 2, see: &lt;A href="https://pig.apache.org/docs/r0.9.1/perf.html" target="_blank"&gt;https://pig.apache.org/docs/r0.9.1/perf.html&lt;/A&gt;&lt;/P&gt;&lt;P&gt;After performing these optimizations, for 3 see: &lt;/P&gt;&lt;UL&gt;
&lt;LI&gt;&lt;A href="http://chimera.labs.oreilly.com/books/1234000001811/ch08.html#pig_tuning" target="_blank"&gt;http://chimera.labs.oreilly.com/books/1234000001811/ch08.html#pig_tuning&lt;/A&gt;&lt;/LI&gt;&lt;/UL&gt;&lt;P&gt;Also, be sure you are running Pig on Tez.&lt;/P&gt;</description>
      <pubDate>Tue, 20 Sep 2016 19:37:55 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Fine-tune-the-PIg-Job/m-p/161619#M41237</guid>
      <dc:creator>gkeys</dc:creator>
      <dc:date>2016-09-20T19:37:55Z</dc:date>
    </item>
    <item>
      <title>Re: Fine tune the PIg Job</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Fine-tune-the-PIg-Job/m-p/161620#M41238</link>
      <description>&lt;P&gt;Thank you. However, is there any way to calculate the appropriate number of reducers for a particular operation?&lt;/P&gt;&lt;P&gt;I have observed that in some cases increasing the number of reducers can also reduce performance.&lt;/P&gt;</description>
      <pubDate>Fri, 23 Sep 2016 13:08:59 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Fine-tune-the-PIg-Job/m-p/161620#M41238</guid>
      <dc:creator>prklearning</dc:creator>
      <dc:date>2016-09-23T13:08:59Z</dc:date>
    </item>
    <item>
      <title>Re: Fine tune the PIg Job</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Fine-tune-the-PIg-Job/m-p/161621#M41239</link>
      <description>&lt;P&gt;This is a good discussion on setting reducers: &lt;A href="https://community.hortonworks.com/questions/28073/how-do-you-force-the-number-of-reducers-in-a-map-r.html"&gt;https://community.hortonworks.com/questions/28073/how-do-you-force-the-number-of-reducers-in-a-map-r.html&lt;/A&gt;&lt;/P&gt;&lt;P&gt;As with all performance tuning, it is best to isolate one bottleneck and tune that, rather than changing many settings at once. So yes, among other tuning, set this and see if it helps. If not, move on to the next suspected bottleneck.&lt;/P&gt;</description>
      <pubDate>Fri, 23 Sep 2016 19:01:00 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Fine-tune-the-PIg-Job/m-p/161621#M41239</guid>
      <dc:creator>gkeys</dc:creator>
      <dc:date>2016-09-23T19:01:00Z</dc:date>
    </item>
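    <!--
    A rough sketch for the question above about calculating the number of reducers. When neither PARALLEL nor default_parallel is set, Pig estimates the reducer count from the input size, roughly as min(pig.exec.reducers.max, input bytes / pig.exec.reducers.bytes.per.reducer). The Python below mirrors that rule. The defaults used (about 1 GB per reducer, at most 999 reducers) and the round-up are assumptions based on Pig's documentation and may differ by version; the function name is hypothetical, and the 12.5 GB input is an illustrative figure, not a number taken from this thread.

```python
# Assumed Pig defaults for the reducer estimator (check your Pig version).
DEFAULT_BYTES_PER_REDUCER = 1_000_000_000  # pig.exec.reducers.bytes.per.reducer
DEFAULT_MAX_REDUCERS = 999                 # pig.exec.reducers.max


def estimate_reducers(input_bytes,
                      bytes_per_reducer=DEFAULT_BYTES_PER_REDUCER,
                      max_reducers=DEFAULT_MAX_REDUCERS):
    """Approximate Pig's reducer estimate when no parallelism is set explicitly."""
    if input_bytes < 0 or bytes_per_reducer <= 0 or max_reducers <= 0:
        raise ValueError("sizes and limits must be positive")
    # Integer ceiling division: one reducer per started chunk of input.
    wanted = (input_bytes + bytes_per_reducer - 1) // bytes_per_reducer
    # Always at least one reducer, never more than the configured cap.
    return max(1, min(max_reducers, wanted))


if __name__ == "__main__":
    hypothetical_input = 12_500_000_000  # illustrative input size in bytes
    print(estimate_reducers(hypothetical_input))
    # Lowering bytes-per-reducer raises the estimate for the same input.
    print(estimate_reducers(hypothetical_input, bytes_per_reducer=250_000_000))
```

    This only reproduces the starting estimate. More reducers also means more task startup and shuffle overhead, so change one setting at a time and measure the effect.
    -->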
  </channel>
</rss>

