<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>question Re: How does hive decide on the insert query plan in Archives of Support Questions (Read Only)</title>
    <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/How-does-hive-decide-on-the-insert-query-plan/m-p/133466#M55960</link>
    <description>&lt;P&gt;@&lt;A href="https://community.hortonworks.com/users/16367/liorhadaya.html"&gt;Lior Hadaya&lt;/A&gt;&lt;/P&gt;&lt;P&gt;Hive chooses the plan using the CBO (cost-based optimizer) together with the statistics collected on your tables.&lt;/P&gt;&lt;P&gt;You may have the settings described here set to true: &lt;A href="https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.3.2/bk_performance_tuning/content/hive_perf_best_pract_use_col_stats_cost_base_opt.html" target="_blank"&gt;https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.3.2/bk_performance_tuning/content/hive_perf_best_pract_use_col_stats_cost_base_opt.html&lt;/A&gt;&lt;/P&gt;&lt;P&gt;Because the optimizer relies on those statistics, the chosen plan can change over time as the statistics change. You can also explicitly compute statistics for a specific table, or even for individual columns, with &lt;CODE&gt;ANALYZE TABLE ... COMPUTE STATISTICS&lt;/CODE&gt; (adding &lt;CODE&gt;FOR COLUMNS&lt;/CODE&gt; for column statistics).&lt;/P&gt;</description>
    <pubDate>Tue, 07 Mar 2017 10:33:44 GMT</pubDate>
    <dc:creator>cstanca</dc:creator>
    <dc:date>2017-03-07T10:33:44Z</dc:date>
    <item>
      <title>How does hive decide on the insert query plan</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/How-does-hive-decide-on-the-insert-query-plan/m-p/133465#M55959</link>
      <description>&lt;P&gt;We're working with Hive 1.3.1 and running an &lt;CODE&gt;INSERT&lt;/CODE&gt; statement to load data into Hive from an external table.&lt;/P&gt;&lt;P&gt;I noticed that the execution plan for the same table changed between yesterday and today.&lt;/P&gt;&lt;P&gt;Yesterday the plan resulted in an M/R job with 341 mappers and 359 reducers, while today it resulted in an M/R job with only mappers and no reducers.&lt;/P&gt;&lt;P&gt;This is the query:&lt;/P&gt;&lt;PRE&gt;EXPLAIN INSERT OVERWRITE TABLE managed_table PARTITION(col1) SELECT [columns] FROM external_table
&lt;/PRE&gt;&lt;P&gt;How does Hive decide how to execute the query?&lt;/P&gt;&lt;P&gt;How does it translate the INSERT ... SELECT into MapReduce?&lt;/P&gt;&lt;P&gt;What would cause a plan to change?&lt;/P&gt;&lt;P&gt;This is the first plan (I'm omitting the column lists because the table has over 300 columns):&lt;/P&gt;&lt;PRE&gt;STAGE DEPENDENCIES:

  Stage-1 is a root stage
  Stage-0 depends on stages: Stage-1
  Stage-2 depends on stages: Stage-0

STAGE PLANS:
  Stage: Stage-1
    Map Reduce
      Map Operator Tree:
          TableScan
            alias: externalevents
            Statistics: Num rows: 479391 Data size: 91824480256 Basic stats: COMPLETE Column stats: NONE
            Select Operator
              expressions: [columns]
              outputColumnNames: [columns]
              Statistics: Num rows: 479391 Data size: 91824480256 Basic stats: COMPLETE Column stats: NONE
              Reduce Output Operator
                key expressions: _col394 (type: bigint)
                sort order: +
                Map-reduce partition columns: _col394 (type: bigint)
                Statistics: Num rows: 479391 Data size: 91824480256 Basic stats: COMPLETE Column stats: NONE
                value expressions: [columns]
      Reduce Operator Tree:
        Select Operator
          expressions: [columns]
          outputColumnNames: [columns]
          Statistics: Num rows: 479391 Data size: 91824480256 Basic stats: COMPLETE Column stats: NONE
          File Output Operator
            compressed: false
            Statistics: Num rows: 479391 Data size: 91824480256 Basic stats: COMPLETE Column stats: NONE
            table:
                input format: org.apache.hadoop.hive.ql.io.orc.OrcInputFormat
                output format: org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat
                serde: org.apache.hadoop.hive.ql.io.orc.OrcSerde
                name: ceazip.events_test_hive

  Stage: Stage-0
    Move Operator
      tables:
          partition:
            evtf_first_date_id
          replace: true
          table:
              input format: org.apache.hadoop.hive.ql.io.orc.OrcInputFormat
              output format: org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat
              serde: org.apache.hadoop.hive.ql.io.orc.OrcSerde
              name: ceazip.events_test_hive

  Stage: Stage-2
    Stats-Aggr Operator&lt;/PRE&gt;

&lt;P&gt;And the second plan:&lt;/P&gt;

&lt;PRE&gt;STAGE PLANS:
  Stage: Stage-1
    Map Reduce
      Map Operator Tree:
          TableScan
            alias: events_2017_02_21_11_13_18
            Statistics: Num rows: 468680 Data size: 89772957696 Basic stats: COMPLETE Column stats: NONE
            Select Operator
              expressions: [columns]
              outputColumnNames: [columns]
              Statistics: Num rows: 468680 Data size: 89772957696 Basic stats: COMPLETE Column stats: NONE
              File Output Operator
                compressed: false
                Statistics: Num rows: 468680 Data size: 89772957696 Basic stats: COMPLETE Column stats: NONE
                table:
                    input format: org.apache.hadoop.hive.ql.io.orc.OrcInputFormat
                    output format: org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat
                    serde: org.apache.hadoop.hive.ql.io.orc.OrcSerde
                    name: default.events_test1
  Stage: Stage-7
    Conditional Operator
  Stage: Stage-4
    Move Operator
      files:
          hdfs directory: true
          destination: hdfs://...../apps/hive/warehouse/events_test1/.hive-staging_hive_2017-03-01_07-35-18_776_4958999242494325333-1/-ext-10000
  Stage: Stage-0
    Move Operator
      tables:
          partition:
            evtf_first_date_id
          replace: true
          table:
              input format: org.apache.hadoop.hive.ql.io.orc.OrcInputFormat
              output format: org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat
              serde: org.apache.hadoop.hive.ql.io.orc.OrcSerde
              name: default.events_test1
  Stage: Stage-2
    Stats-Aggr Operator
  Stage: Stage-3
    Merge File Operator
      Map Operator Tree:
          ORC File Merge Operator
      merge level: stripe
      input format: org.apache.hadoop.hive.ql.io.orc.OrcInputFormat
  Stage: Stage-5
    Merge File Operator
      Map Operator Tree:
          ORC File Merge Operator
      merge level: stripe
      input format: org.apache.hadoop.hive.ql.io.orc.OrcInputFormat
  Stage: Stage-6
    Move Operator
      files:
          hdfs directory: true
          destination: hdfs://isr-r0-aps-nam-1.lab.il.nice.com:8020/apps/hive/warehouse/events_test1/.hive-staging_hive_2017-03-01_07-35-18_776_4958999242494325333-1/-ext-10000
&lt;/PRE&gt;
&lt;PRE&gt;Thanks,
Lior&lt;/PRE&gt;</description>
      <pubDate>Thu, 02 Mar 2017 15:23:54 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/How-does-hive-decide-on-the-insert-query-plan/m-p/133465#M55959</guid>
      <dc:creator>LH</dc:creator>
      <dc:date>2017-03-02T15:23:54Z</dc:date>
    </item>
    <item>
      <title>Re: How does hive decide on the insert query plan</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/How-does-hive-decide-on-the-insert-query-plan/m-p/133466#M55960</link>
      <description>&lt;P&gt;@&lt;A href="https://community.hortonworks.com/users/16367/liorhadaya.html"&gt;Lior Hadaya&lt;/A&gt;&lt;/P&gt;&lt;P&gt;Hive chooses the plan using the CBO (cost-based optimizer) together with the statistics collected on your tables.&lt;/P&gt;&lt;P&gt;You may have the settings described here set to true: &lt;A href="https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.3.2/bk_performance_tuning/content/hive_perf_best_pract_use_col_stats_cost_base_opt.html" target="_blank"&gt;https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.3.2/bk_performance_tuning/content/hive_perf_best_pract_use_col_stats_cost_base_opt.html&lt;/A&gt;&lt;/P&gt;&lt;P&gt;Because the optimizer relies on those statistics, the chosen plan can change over time as the statistics change. You can also explicitly compute statistics for a specific table, or even for individual columns, with &lt;CODE&gt;ANALYZE TABLE ... COMPUTE STATISTICS&lt;/CODE&gt; (adding &lt;CODE&gt;FOR COLUMNS&lt;/CODE&gt; for column statistics).&lt;/P&gt;</description>
      <pubDate>Tue, 07 Mar 2017 10:33:44 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/How-does-hive-decide-on-the-insert-query-plan/m-p/133466#M55960</guid>
      <dc:creator>cstanca</dc:creator>
      <dc:date>2017-03-07T10:33:44Z</dc:date>
    </item>
  </channel>
</rss>

