<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>question Hive ACID Table count query launching too many mappers in Archives of Support Questions (Read Only)</title>
    <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Hive-ACID-Table-count-query-launching-too-many-mappers/m-p/177604#M70421</link>
    <description>&lt;P&gt;I have loaded only about 214 MB of data into a Hive ACID table partitioned by year, month, day and hour, using a merge query. However, when I run a simple count(*) query it launches &lt;STRONG&gt;3645&lt;/STRONG&gt; mappers, while the same data in a non-transactional Hive table takes just &lt;STRONG&gt;12&lt;/STRONG&gt; mappers. Is this expected behavior? I followed the steps from &lt;A href="https://hortonworks.com/tutorial/using-hive-acid-transactions-to-insert-update-and-delete-data/"&gt;here&lt;/A&gt;.&lt;/P&gt;</description>
    <pubDate>Sun, 29 Oct 2017 22:45:23 GMT</pubDate>
    <dc:creator>sushil416</dc:creator>
    <dc:date>2017-10-29T22:45:23Z</dc:date>
    <item>
      <title>Hive ACID Table count query launching too many mappers</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Hive-ACID-Table-count-query-launching-too-many-mappers/m-p/177604#M70421</link>
      <description>&lt;P&gt;I have loaded only about 214 MB of data into a Hive ACID table partitioned by year, month, day and hour, using a merge query. However, when I run a simple count(*) query it launches &lt;STRONG&gt;3645&lt;/STRONG&gt; mappers, while the same data in a non-transactional Hive table takes just &lt;STRONG&gt;12&lt;/STRONG&gt; mappers. Is this expected behavior? I followed the steps from &lt;A href="https://hortonworks.com/tutorial/using-hive-acid-transactions-to-insert-update-and-delete-data/"&gt;here&lt;/A&gt;.&lt;/P&gt;</description>
      <pubDate>Sun, 29 Oct 2017 22:45:23 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Hive-ACID-Table-count-query-launching-too-many-mappers/m-p/177604#M70421</guid>
      <dc:creator>sushil416</dc:creator>
      <dc:date>2017-10-29T22:45:23Z</dc:date>
    </item>
    <item>
      <title>Re: Hive ACID Table count query launching too many mappers</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Hive-ACID-Table-count-query-launching-too-many-mappers/m-p/177605#M70422</link>
      <description>&lt;A rel="user" href="https://community.cloudera.com/users/44595/sushil416.html" nodeid="44595"&gt;@Sushil Ks&lt;/A&gt;&lt;P&gt;Yes, that's expected because if you are having &lt;STRONG&gt;ACID properties&lt;/STRONG&gt; enabled on the table, then there will be lot of delta files(3645) in HDFS directory.&lt;/P&gt;&lt;P&gt;you can check files by using &lt;/P&gt;&lt;PRE&gt;bash# hadoop fs -count -v -t &amp;lt;table-location&amp;gt;&lt;/PRE&gt;&lt;P&gt;Each &lt;STRONG&gt;mapper&lt;/STRONG&gt; gets will&lt;STRONG&gt; load 1 file&lt;/STRONG&gt; so that is the reason why there are &lt;STRONG&gt;3645 mappers are launched.&lt;/STRONG&gt;&lt;/P&gt;&lt;P&gt;If there are&lt;STRONG&gt; lot of delta files&lt;/STRONG&gt; in the directory you need to run &lt;STRONG&gt;Major or minor compactions,&lt;/STRONG&gt; to &lt;STRONG&gt;reduce number of mappers &lt;/STRONG&gt;are&lt;STRONG&gt; launched. &lt;/STRONG&gt;&lt;B&gt;T&lt;/B&gt;hese compactions takes a set of existing delta files and rewrites them to a single delta file&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;&lt;U&gt;Types of Compactions in hive:-&lt;/U&gt;&lt;/STRONG&gt;&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;1.Minor Compaction:-&lt;/STRONG&gt;A ‘minor’ compaction will takes&lt;STRONG&gt; all the delta files and rewrites them to single delta file&lt;/STRONG&gt;. This compaction wont take much resources.&lt;/P&gt;&lt;PRE&gt;hive#alter table &amp;lt;table-name&amp;gt; partition(&amp;lt;partition-name&amp;gt;,&amp;lt;nested-partition-name&amp;gt;,..) compact 'minor';&lt;/PRE&gt;&lt;P&gt;&lt;STRONG&gt;&lt;U&gt;Example:-&lt;/U&gt;&lt;/STRONG&gt;&lt;/P&gt;&lt;P&gt;Here &lt;STRONG&gt;par_buk&lt;/STRONG&gt; is the &lt;STRONG&gt;table name&lt;/STRONG&gt; having&lt;STRONG&gt; dat &lt;/STRONG&gt;is the &lt;STRONG&gt;partition column&lt;/STRONG&gt; into &lt;STRONG&gt;10 buckets &lt;/STRONG&gt;and having 1 base file and 3 delta files.&lt;/P&gt;&lt;PRE&gt;bash# hadoop fs -ls /apps/hive/warehouse/par_buk/dat=2017-10-09_12/
Found 4 items
drwxrwxrwx   -  hdfs          0 2017-10-29 14:14 /apps/hive/warehouse/par_buk/dat=2017-10-09_12/base_314724388
drwxr-xr-x   -  hdfs          0 2017-10-29 14:19 /apps/hive/warehouse/par_buk/dat=2017-10-09_12/delta_314724389_314724389
drwxr-xr-x   -  hdfs          0 2017-10-29 14:19 /apps/hive/warehouse/par_buk/dat=2017-10-09_12/delta_314724390_314724390
drwxr-xr-x   -  hdfs          0 2017-10-29 14:19 /apps/hive/warehouse/par_buk/dat=2017-10-09_12/delta_314724391_31472439&lt;/PRE&gt;&lt;PRE&gt;hive# alter table par_buk partition(dat='2017-10-09_12') compact 'minor'; -- minor compaction takes all the delta files and rewrites them into a single delta file&lt;/PRE&gt;&lt;PRE&gt;bash# hadoop fs -ls /apps/hive/warehouse/par_buk/dat=2017-10-09_12/
Found 2 items
drwxrwxrwx   -  hdfs          0 2017-10-29 14:14 /apps/hive/warehouse/par_buk/dat=2017-10-09_12/base_314724388
drwxrwxrwx   -  hdfs          0 2017-10-29 14:20 /apps/hive/warehouse/par_buk/dat=2017-10-09_12/delta_314724389_314724391&lt;/PRE&gt;&lt;P&gt;As you can see, all &lt;STRONG&gt;delta files are rewritten into a single delta file by the minor compaction&lt;/STRONG&gt;.&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;2. Major Compaction:-&lt;/STRONG&gt; A ‘major’ compaction takes &lt;STRONG&gt;one or more delta files&lt;/STRONG&gt; (same as a minor compaction) and the &lt;STRONG&gt;base file&lt;/STRONG&gt; for the bucket and rewrites them into a &lt;STRONG&gt;new base file per bucket&lt;/STRONG&gt;. Major compaction is more expensive but more effective.&lt;/P&gt;&lt;P&gt;This compaction can take minutes to hours and can consume a lot of disk, network, memory and CPU resources, so it should be invoked carefully.&lt;/P&gt;&lt;PRE&gt;hive# alter table &amp;lt;table-name&amp;gt; partition(&amp;lt;partition-name&amp;gt;,&amp;lt;nested-partition-name&amp;gt;,..) compact 'major';&lt;/PRE&gt;&lt;P&gt;Example:-&lt;/P&gt;&lt;PRE&gt;bash# hadoop fs -ls /apps/hive/warehouse/par_buk/dat=2017-10-09_12/
Found 2 items
drwxrwxrwx   -  hdfs          0 2017-10-29 14:14 /apps/hive/warehouse/par_buk/dat=2017-10-09_12/base_314724388
drwxrwxrwx   -  hdfs          0 2017-10-29 14:20 /apps/hive/warehouse/par_buk/dat=2017-10-09_12/delta_314724389_314724391&lt;/PRE&gt;&lt;PRE&gt;hive# alter table par_buk partition(dat='2017-10-09_12') compact 'major'; -- major compaction takes all the delta files and the base file and rewrites them into a single new base file&lt;/PRE&gt;&lt;PRE&gt;bash# hadoop fs -ls /apps/hive/warehouse/par_buk/dat=2017-10-09_12/
Found 1 items
drwxrwxrwx   -  hdfs          0 2017-10-29 14:34 /apps/hive/warehouse/par_buk/dat=2017-10-09_12/base_314724391&lt;/PRE&gt;&lt;P&gt;As you can see, the major compaction has rewritten the base file and the delta file into a new base file per bucket.&lt;/P&gt;&lt;P&gt;If you want to see the status of compactions, you can use&lt;/P&gt;&lt;PRE&gt;hive# show compactions;&lt;/PRE&gt;&lt;P&gt;Once you run &lt;STRONG&gt;compactions&lt;/STRONG&gt;, the delta files are rewritten into a single file, so fewer mappers are launched. Compactions &lt;B&gt;help&lt;/B&gt; you &lt;B&gt;significantly&lt;/B&gt; &lt;B&gt;increase query performance.&lt;/B&gt;&lt;/P&gt;</description>
      <pubDate>Mon, 30 Oct 2017 01:53:33 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Hive-ACID-Table-count-query-launching-too-many-mappers/m-p/177605#M70422</guid>
      <dc:creator>Shu_ashu</dc:creator>
      <dc:date>2017-10-30T01:53:33Z</dc:date>
    </item>
    <item>
      <title>Re: Hive ACID Table count query launching too many mappers</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Hive-ACID-Table-count-query-launching-too-many-mappers/m-p/177606#M70423</link>
      <description>&lt;P&gt;Thanks a lot for your time.&lt;/P&gt;</description>
      <pubDate>Mon, 30 Oct 2017 14:02:45 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Hive-ACID-Table-count-query-launching-too-many-mappers/m-p/177606#M70423</guid>
      <dc:creator>sushil416</dc:creator>
      <dc:date>2017-10-30T14:02:45Z</dc:date>
    </item>
    <item>
      <title>Re: Hive ACID Table count query launching too many mappers</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Hive-ACID-Table-count-query-launching-too-many-mappers/m-p/177607#M70424</link>
      <description>&lt;P&gt;&lt;A rel="user" href="https://community.cloudera.com/users/18929/yaswanthmuppireddy.html" nodeid="18929"&gt;@Shu&lt;BR /&gt;&lt;/A&gt;&lt;/P&gt;&lt;P&gt;Wonderful explaination !!!.&lt;BR /&gt;&lt;A rel="user" href="https://community.cloudera.com/users/18929/yaswanthmuppireddy.html" nodeid="18929"&gt;&lt;/A&gt; &lt;/P&gt;</description>
      <pubDate>Mon, 30 Oct 2017 14:04:53 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Hive-ACID-Table-count-query-launching-too-many-mappers/m-p/177607#M70424</guid>
      <dc:creator>jsensharma</dc:creator>
      <dc:date>2017-10-30T14:04:53Z</dc:date>
    </item>
  </channel>
</rss>

