<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>question Re: How are number of mappers determined for a query with hive on tez? in Archives of Support Questions (Read Only)</title>
    <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/How-are-number-of-mappers-determined-for-a-query-with-hive/m-p/94919#M8182</link>
    <description>&lt;P&gt;We created this write-up some time ago; it might be useful: &lt;A href="https://cwiki.apache.org/confluence/display/TEZ/How+initial+task+parallelism+works"&gt;https://cwiki.apache.org/confluence/display/TEZ/How+initial+task+parallelism+works&lt;/A&gt;&lt;/P&gt;</description>
    <pubDate>Thu, 08 Oct 2015 20:04:23 GMT</pubDate>
    <dc:creator>andrewg</dc:creator>
    <dc:date>2015-10-08T20:04:23Z</dc:date>
    <item>
      <title>How are number of mappers determined for a query with hive on tez?</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/How-are-number-of-mappers-determined-for-a-query-with-hive/m-p/94915#M8178</link>
      <description>&lt;P&gt;I am looking into a simple select count(*) query on a table backed by Avro. With MapReduce, I see around 50 mappers spawned for it. With Tez, I see 367 mappers being used. With more mappers, the overall query time increased from 55 seconds to 105 seconds.&lt;/P&gt;&lt;P&gt;What factors determine the number of mappers? What is the best way to reduce the number of mappers in this case? Could it be related to the table being in Avro format?&lt;/P&gt;</description>
      <pubDate>Tue, 06 Oct 2015 00:13:24 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/How-are-number-of-mappers-determined-for-a-query-with-hive/m-p/94915#M8178</guid>
      <dc:creator>ravi1</dc:creator>
      <dc:date>2015-10-06T00:13:24Z</dc:date>
    </item>
    <item>
      <title>Re: How are number of mappers determined for a query with hive on tez?</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/How-are-number-of-mappers-determined-for-a-query-with-hive/m-p/94916#M8179</link>
      <description>&lt;P&gt;The following parameters control the number of mappers for splittable formats with Tez:&lt;/P&gt;&lt;PRE&gt;set tez.grouping.min-size=16777216; -- 16 MB min split
set tez.grouping.max-size=1073741824; -- 1 GB max split&lt;/PRE&gt;&lt;P&gt;MapReduce uses the following:&lt;/P&gt;&lt;PRE&gt;set mapreduce.input.fileinputformat.split.minsize=16777216; -- 16 MB
set mapreduce.input.fileinputformat.split.maxsize=1073741824; -- 1 GB&lt;/PRE&gt;&lt;P&gt;Increase the min and max split sizes to reduce the number of mappers.&lt;/P&gt;</description>
      <pubDate>Tue, 06 Oct 2015 00:38:22 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/How-are-number-of-mappers-determined-for-a-query-with-hive/m-p/94916#M8179</guid>
      <dc:creator>jpp</dc:creator>
      <dc:date>2015-10-06T00:38:22Z</dc:date>
    </item>
    <item>
      <title>Re: How are number of mappers determined for a query with hive on tez?</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/How-are-number-of-mappers-determined-for-a-query-with-hive/m-p/94917#M8180</link>
      <description>&lt;P&gt;MRv2 uses CombineInputFormat, while Tez uses grouped splits.&lt;/P&gt;&lt;P&gt;I suspect the table has 367 files, which are not being grouped because the entire cluster has &amp;gt;215 slots. Can you confirm the total number of files in the table?&lt;/P&gt;&lt;P&gt;Also, it is a good idea to rebuild the statistics:&lt;/P&gt;&lt;PRE&gt;analyze table &amp;lt;tbl&amp;gt; compute statistics;&lt;/PRE&gt;</description>
      <pubDate>Tue, 06 Oct 2015 01:35:56 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/How-are-number-of-mappers-determined-for-a-query-with-hive/m-p/94917#M8180</guid>
      <dc:creator>gopalv</dc:creator>
      <dc:date>2015-10-06T01:35:56Z</dc:date>
    </item>
    <item>
      <title>Re: How are number of mappers determined for a query with hive on tez?</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/How-are-number-of-mappers-determined-for-a-query-with-hive/m-p/94918#M8181</link>
      <description>&lt;P&gt;I looked at the Avro table and it has 23 files. The grouping min and max are at their defaults, so 16 MB and 1 GB. There were 56 blocks on 24 files and a total size of 300 MB. Tez seems to have grouped the input into 16 MB splits since the queue is empty. However, the query ran longer with the smaller map tasks than when it ran with 50 mappers.&lt;/P&gt;</description>
      <pubDate>Thu, 08 Oct 2015 05:19:15 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/How-are-number-of-mappers-determined-for-a-query-with-hive/m-p/94918#M8181</guid>
      <dc:creator>ravi1</dc:creator>
      <dc:date>2015-10-08T05:19:15Z</dc:date>
    </item>
    <item>
      <title>Re: How are number of mappers determined for a query with hive on tez?</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/How-are-number-of-mappers-determined-for-a-query-with-hive/m-p/94919#M8182</link>
      <description>&lt;P&gt;We created this write-up some time ago; it might be useful: &lt;A href="https://cwiki.apache.org/confluence/display/TEZ/How+initial+task+parallelism+works"&gt;https://cwiki.apache.org/confluence/display/TEZ/How+initial+task+parallelism+works&lt;/A&gt;&lt;/P&gt;</description>
      <pubDate>Thu, 08 Oct 2015 20:04:23 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/How-are-number-of-mappers-determined-for-a-query-with-hive/m-p/94919#M8182</guid>
      <dc:creator>andrewg</dc:creator>
      <dc:date>2015-10-08T20:04:23Z</dc:date>
    </item>
    <item>
      <title>Re: How are number of mappers determined for a query with hive on tez?</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/How-are-number-of-mappers-determined-for-a-query-with-hive/m-p/94920#M8183</link>
      <description>&lt;P&gt;The easiest way to set the number of mappers to a desired value is:&lt;/P&gt;&lt;P&gt;set tez.grouping.split-count = YOUR-NUMBER-OF-TASKS;&lt;/P&gt;&lt;P&gt;As pointed out by Andrew Grande, this is documented here: &lt;A target="_blank" href="https://cwiki.apache.org/confluence/display/TEZ/How+initial+task+parallelism+works"&gt;https://cwiki.apache.org/confluence/display/TEZ/How+initial+task+parallelism+works&lt;/A&gt;&lt;/P&gt;</description>
      <pubDate>Sun, 15 Nov 2015 09:09:04 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/How-are-number-of-mappers-determined-for-a-query-with-hive/m-p/94920#M8183</guid>
      <dc:creator>gbraccialli3</dc:creator>
      <dc:date>2015-11-15T09:09:04Z</dc:date>
    </item>
    <item>
      <title>Re: How are number of mappers determined for a query with hive on tez?</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/How-are-number-of-mappers-determined-for-a-query-with-hive/m-p/94921#M8184</link>
      <description>&lt;P&gt;Hi,&lt;/P&gt;&lt;P&gt;I have set tez.grouping.split-count = &amp;lt;number of mappers&amp;gt;, but Hive still does not run that number of mappers when executing the query.&lt;/P&gt;&lt;P&gt;Is there any other property I need to set along with tez.grouping.split-count?&lt;/P&gt;</description>
      <pubDate>Mon, 02 May 2016 12:39:54 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/How-are-number-of-mappers-determined-for-a-query-with-hive/m-p/94921#M8184</guid>
      <dc:creator>mandar2174</dc:creator>
      <dc:date>2016-05-02T12:39:54Z</dc:date>
    </item>
  </channel>
</rss>

