Created 03-02-2017 07:23 AM
We're working with Hive 1.3.1, and running an INSERT
statement to upload data into Hive from an external table.
I noticed that the execution plan has changed on the same table from yesterday compared to today.
Yesterday the plan resulted in a M/R job with 341 mappers and 359 reducers, while today the plan resulted in a M/R job with only mappers and no reducers
This is the query:
explain insert OVERWRITE table managed_table PARTITION(col1) select [columns] from external_table
How does hive decide how to execute the query?
How does it translate the insert select into map reduce?
What would cause a plan to change?
This is the first plan (omitting columns lists because the table has over 300 columns)
<code>STAGE DEPENDENCIES: Stage-1 is a root stage Stage-0 depends on stages: Stage-1 Stage-2 depends on stages: Stage-0 STAGE PLANS: Stage: Stage-1 Map Reduce Map Operator Tree: TableScan alias: externalevents Statistics: Num rows: 479391 Data size: 91824480256 Basic stats: COMPLETE Column stats: NONE Select Operator [columns] outputColumnNames: [columns] Statistics: Num rows: 479391 Data size: 91824480256 Basic stats: COMPLETE Column stats: NONE Reduce Output Operator key expressions: _col394 (type: bigint) sort order: + Map-reduce partition columns: _col394 (type: bigint) Statistics: Num rows: 479391 Data size: 91824480256 Basic stats: COMPLETE Column stats: NONE value expressions: [columns] Reduce Operator Tree: Select Operator expressions: [columns] outputColumnNames: [columns] Statistics: Num rows: 479391 Data size: 91824480256 Basic stats: COMPLETE Column stats: NONE File Output Operator compressed: false Statistics: Num rows: 479391 Data size: 91824480256 Basic stats: COMPLETE Column stats: NONE table: input format: org.apache.hadoop.hive.ql.io.orc.OrcInputFormat output format: org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat serde: org.apache.hadoop.hive.ql.io.orc.OrcSerde name: ceazip.events_test_hive Stage: Stage-0 Move Operator tables: partition: evtf_first_date_id replace: true table: input format: org.apache.hadoop.hive.ql.io.orc.OrcInputFormat output format: org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat serde: org.apache.hadoop.hive.ql.io.orc.OrcSerde name: ceazip.events_test_hive Stage: Stage-2 Stats-Aggr Operator
<code>And the 2nd plan:
<code>STAGE PLANS: Stage: Stage-1 Map Reduce Map Operator Tree: TableScan alias: events_2017_02_21_11_13_18 Statistics: Num rows: 468680 Data size: 89772957696 Basic stats: COMPLETE Column stats: NONE Select Operator expressions: [columns] outputColumnNames: [columns] Statistics: Num rows: 468680 Data size: 89772957696 Basic stats: COMPLETE Column stats: NONE File Output Operator compressed: false Statistics: Num rows: 468680 Data size: 89772957696 Basic stats: COMPLETE Column stats: NONE table: input format: org.apache.hadoop.hive.ql.io.orc.OrcInputFormat output format: org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat serde: org.apache.hadoop.hive.ql.io.orc.OrcSerde name: default.events_test1 Stage: Stage-7 Conditional Operator Stage: Stage-4 Move Operator files: hdfs directory: true destination: hdfs://...../apps/hive/warehouse/events_test1/.hive-staging_hive_2017-03-01_07-35-18_776_4958999242494325333-1/-ext-10000 Stage: Stage-0 Move Operator tables: partition: evtf_first_date_id replace: true table: input format: org.apache.hadoop.hive.ql.io.orc.OrcInputFormat output format: org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat serde: org.apache.hadoop.hive.ql.io.orc.OrcSerde name: default.events_test1 Stage: Stage-2 Stats-Aggr Operator Stage: Stage-3 Merge File Operator Map Operator Tree: ORC File Merge Operator merge level: stripe input format: org.apache.hadoop.hive.ql.io.orc.OrcInputFormat Stage: Stage-5 Merge File Operator Map Operator Tree: ORC File Merge Operator merge level: stripe input format: org.apache.hadoop.hive.ql.io.orc.OrcInputFormat Stage: Stage-6 Move Operator files: hdfs directory: true destination: hdfs://isr-r0-aps-nam-1.lab.il.nice.com:8020/apps/hive/warehouse/events_test1/.hive-staging_hive_2017-03-01_07-35-18_776_4958999242494325333-1/-ext-10000
Thanks, Lior
Created 03-07-2017 02:33 AM
CBO (cost based optimizer) and statistics collected on your tables.
You may have the settings mentioned here set to true: https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.3.2/bk_performance_tuning/content/hive_perf_bes...
As such, the behavior can change over time. You could also force stats on a specific table or even column.
Created 03-07-2017 02:33 AM
CBO (cost based optimizer) and statistics collected on your tables.
You may have the settings mentioned here set to true: https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.3.2/bk_performance_tuning/content/hive_perf_bes...
As such, the behavior can change over time. You could also force stats on a specific table or even column.