<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>question Re: Why  'count()' query of  hive external table  has different performance compared to   internal table ? in Support Questions</title>
    <link>https://community.cloudera.com/t5/Support-Questions/Why-count-query-of-hive-external-table-has-different/m-p/398202#M250133</link>
    <description>&lt;P&gt;&lt;a href="https://community.cloudera.com/t5/user/viewprofilepage/user-id/122043"&gt;@forhive&lt;/a&gt;&amp;nbsp;Welcome to the Cloudera Community!&lt;BR /&gt;&lt;BR /&gt;To help you get the best possible solution, I have tagged our Hive experts&amp;nbsp;&lt;a href="https://community.cloudera.com/t5/user/viewprofilepage/user-id/92016"&gt;@ggangadharan&lt;/a&gt;&amp;nbsp;&lt;a href="https://community.cloudera.com/t5/user/viewprofilepage/user-id/45798"&gt;@james_jones&lt;/a&gt;&amp;nbsp;&amp;nbsp;who may be able to assist you further.&lt;BR /&gt;&lt;BR /&gt;Please keep us updated on your post, and we hope you find a satisfactory solution to your query.&lt;/P&gt;</description>
    <pubDate>Thu, 28 Nov 2024 17:47:37 GMT</pubDate>
    <dc:creator>DianaTorres</dc:creator>
    <dc:date>2024-11-28T17:47:37Z</dc:date>
    <item>
      <title>Why  'count()' query of  hive external table  has different performance compared to   internal table ?</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Why-count-query-of-hive-external-table-has-different/m-p/398200#M250131</link>
      <description>&lt;P&gt;&amp;nbsp;I have create two partitioned tables&amp;nbsp; with same ddl,&amp;nbsp; except for that one is&amp;nbsp;internal table&amp;nbsp; , another is&amp;nbsp;external table 。Both has same file location (oss://xxxx/xxx )。&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;LI-CODE lang="java"&gt;create table test_analyze( id int, name varchar(20)) partitioned by (dt string) ; 
create table test_analyze_external( id int, name varchar(20)) partitioned by (dt string) ;


msck repair table test_analyze;
analyze table  test_analyze partition(dt='20241121') compute statistics; 

msck repair table test_analyze_external;
analyze table  test_analyze_external partition(dt='20241121') compute statistics; &lt;/LI-CODE&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;when I query&amp;nbsp; count() for table 'test_analyze'， then quickly get&amp;nbsp; the result ,&amp;nbsp; it seems fetch result from stats。 but when I send the same query to table 'test_analyze_external',&amp;nbsp; it can't return stats result , and active a MR job。And&amp;nbsp; it's&amp;nbsp; sql-explain below：&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;LI-CODE lang="markup"&gt;+----------------------------------------------------+
|                      Explain                       |
+----------------------------------------------------+
| STAGE DEPENDENCIES:                                |
|   Stage-1 is a root stage                          |
|   Stage-0 depends on stages: Stage-1               |
|                                                    |
| STAGE PLANS:                                       |
|   Stage: Stage-1                                   |
|     Map Reduce                                     |
|       Map Operator Tree:                           |
|           TableScan                                |
|             alias: test_analyze_external           |
|             Statistics: Num rows: 1 Data size: 3 Basic stats: COMPLETE Column stats: NONE |
|             Select Operator                        |
|               Statistics: Num rows: 1 Data size: 3 Basic stats: COMPLETE Column stats: NONE |
|               Group By Operator                    |
|                 aggregations: count()              |
|                 mode: hash                         |
|                 outputColumnNames: _col0           |
|                 Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: NONE |
|                 File Output Operator               |
|                   compressed: false                |
|                   table:                           |
|                       input format: org.apache.hadoop.mapred.SequenceFileInputFormat |
|                       output format: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat |
|                       serde: org.apache.hadoop.hive.serde2.lazybinary.LazyBinarySerDe |
|                                                    |
|   Stage: Stage-0                                   |
|     Fetch Operator                                 |
|       limit: -1                                    |
|       Processor Tree:                              |
|         Group By Operator                          |
|           aggregations: count(_col0)               |
|           mode: mergepartial                       |
|           outputColumnNames: _col0                 |
|           Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: NONE |
|           ListSink                                 |
|                                                    |&lt;/LI-CODE&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;And&amp;nbsp; I'm sure&amp;nbsp; of having the config " set hive.compute.query.using.stats = true&amp;nbsp; "，so, why???&lt;/P&gt;&lt;P&gt;Who can explain？&lt;/P&gt;</description>
      <pubDate>Thu, 28 Nov 2024 17:13:37 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Why-count-query-of-hive-external-table-has-different/m-p/398200#M250131</guid>
      <dc:creator>forhive</dc:creator>
      <dc:date>2024-11-28T17:13:37Z</dc:date>
    </item>
    <item>
      <title>Re: Why  'count()' query of  hive external table  has different performance compared to   internal table ?</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Why-count-query-of-hive-external-table-has-different/m-p/398202#M250133</link>
      <description>&lt;P&gt;&lt;a href="https://community.cloudera.com/t5/user/viewprofilepage/user-id/122043"&gt;@forhive&lt;/a&gt;&amp;nbsp;Welcome to the Cloudera Community!&lt;BR /&gt;&lt;BR /&gt;To help you get the best possible solution, I have tagged our Hive experts&amp;nbsp;&lt;a href="https://community.cloudera.com/t5/user/viewprofilepage/user-id/92016"&gt;@ggangadharan&lt;/a&gt;&amp;nbsp;&lt;a href="https://community.cloudera.com/t5/user/viewprofilepage/user-id/45798"&gt;@james_jones&lt;/a&gt;&amp;nbsp;&amp;nbsp;who may be able to assist you further.&lt;BR /&gt;&lt;BR /&gt;Please keep us updated on your post, and we hope you find a satisfactory solution to your query.&lt;/P&gt;</description>
      <pubDate>Thu, 28 Nov 2024 17:47:37 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Why-count-query-of-hive-external-table-has-different/m-p/398202#M250133</guid>
      <dc:creator>DianaTorres</dc:creator>
      <dc:date>2024-11-28T17:47:37Z</dc:date>
    </item>
    <item>
      <title>Re: Why  'count()' query of  hive external table  has different performance compared to   internal table ?</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Why-count-query-of-hive-external-table-has-different/m-p/398222#M250136</link>
      <description>&lt;UL&gt;&lt;LI&gt;First of all ,&amp;nbsp;It is not recommended to use the same location for both internal and external tables.&lt;/LI&gt;&lt;LI&gt;Internal tables in Hive are native tables that are fully controlled by Hive itself.&lt;/LI&gt;&lt;LI&gt;&lt;SPAN&gt;External tables, on the other hand, can be accessed by other components such as Spark, Impala, and File system operations,.....etc.&lt;/SPAN&gt;&lt;/LI&gt;&lt;LI&gt;Since External tables are used by other components, their corresponding locations need to be relied upon.&lt;/LI&gt;&lt;LI&gt;To read the files and obtain the count, Hive launches a MapReduce job for external tables.&lt;/LI&gt;&lt;LI&gt;&lt;SPAN&gt;It is recommended to use Managed tables if other components are not utilizing the corresponding table.&lt;/SPAN&gt;&amp;nbsp;&lt;/LI&gt;&lt;/UL&gt;</description>
      <pubDate>Fri, 29 Nov 2024 07:47:10 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Why-count-query-of-hive-external-table-has-different/m-p/398222#M250136</guid>
      <dc:creator>ggangadharan</dc:creator>
      <dc:date>2024-11-29T07:47:10Z</dc:date>
    </item>
    <item>
      <title>Re: Why  'count()' query of  hive external table  has different performance compared to   internal table ?</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Why-count-query-of-hive-external-table-has-different/m-p/398256#M250151</link>
      <description>&lt;P&gt;&lt;SPAN&gt;Regarding the fact that both external tables and internal tables use the same file location,&amp;nbsp; and please forgive my improper usage.&amp;nbsp;&amp;nbsp;&lt;/SPAN&gt;&lt;SPAN&gt;But I'm just testing. &lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;The part that confuses me is that after using the same &lt;/SPAN&gt;MSCK REPAIR TABLE&lt;SPAN&gt; + &lt;/SPAN&gt;ANALYZE TABLE&lt;SPAN&gt; commands on both tables, when I run the same &lt;/SPAN&gt;COUNT()&lt;SPAN&gt; query, why does Hive apply query optimization for one but not for the other? What is the underlying principle behind Hive's internal design that leads to this distinction?&lt;/SPAN&gt;&lt;/P&gt;</description>
      <pubDate>Fri, 29 Nov 2024 14:51:21 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Why-count-query-of-hive-external-table-has-different/m-p/398256#M250151</guid>
      <dc:creator>forhive</dc:creator>
      <dc:date>2024-11-29T14:51:21Z</dc:date>
    </item>
    <item>
      <title>Re: Why  'count()' query of  hive external table  has different performance compared to   internal table ?</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Why-count-query-of-hive-external-table-has-different/m-p/398257#M250152</link>
      <description>&lt;P&gt;many thanks&amp;nbsp; !&lt;/P&gt;</description>
      <pubDate>Fri, 29 Nov 2024 14:51:58 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Why-count-query-of-hive-external-table-has-different/m-p/398257#M250152</guid>
      <dc:creator>forhive</dc:creator>
      <dc:date>2024-11-29T14:51:58Z</dc:date>
    </item>
  </channel>
</rss>

