Created on 12-27-2023 09:48 AM - edited 12-27-2023 09:51 AM
Hive 3.0 introduced query re-execution: when a query fails, Hive can automatically run it again. A retry only makes sense if whatever caused the first failure is addressed before the next attempt. This article covers how to configure that once, so that no manual intervention is needed after each failure.
The following Hive properties enable query re-execution and select which strategies are used. Re-execution is enabled by default.
hive.query.reexecution.enabled=true;
hive.query.reexecution.strategies=overlay,reoptimize,recompile_without_cbo,reexecute_lost_am;
With the overlay strategy, we can set a Hive property that is applied only on re-execution. It works by adding a configuration subtree (reexec.overlay.*) as an overlay on top of the actual Hive settings.
set reexec.overlay.{hive_property}=new_value
Every Hive setting prefixed with "reexec.overlay." is applied to all re-executions.
For example, if Hive queries fail with OOM errors while performing map joins, which can happen when the table statistics are incorrect, we can disable hive.auto.convert.join for the next attempt:
set reexec.overlay.hive.auto.convert.join=false;
set hive.query.reexecution.strategies=overlay;
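Put together, a session using the overlay might look like the sketch below. The table and column names (orders, customers) are hypothetical and only serve to show a join that could be converted to a map join; the properties are the ones discussed above.

```sql
-- Hypothetical session: table and column names are illustrative assumptions.
set hive.query.reexecution.enabled=true;
set hive.query.reexecution.strategies=overlay;

-- Takes effect only if the first attempt fails and the query is re-executed.
set reexec.overlay.hive.auto.convert.join=false;

-- The first attempt may be planned with a map join; a retry would be
-- planned with hive.auto.convert.join=false.
SELECT o.order_id, c.name
FROM orders o
JOIN customers c ON o.customer_id = c.customer_id;
```

The first attempt still runs with the normal session settings, so a query that succeeds is not affected by the overlay.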
With the reoptimize strategy, Hive records the actual number of rows passing through each operator while a query runs. This information is used when the query is re-planned, which may produce a better query plan.
This is most useful in cases such as:
- Absence of statistics.
- Inaccurate statistics.
- Scenarios involving numerous joins.
In order to enable this, use:
set hive.query.reexecution.strategies=overlay,reoptimize;
set hive.query.reexecution.stats.persist.scope=query;
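hive.query.reexecution.stats.persist.scope controls the level at which the runtime stats are persisted:
- query: runtime stats are used only during re-execution of the same query.
- hiveserver: runtime stats are kept in HiveServer2 and shared by all sessions.
- metastore: runtime stats are also persisted in the metastore.
Avoid setting it to "metastore" due to the bug discussed in HIVE-26978.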
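Before relying on re-execution, it can help to check what the session will actually use. A minimal sketch, assuming a Beeline or Hive CLI session, where `set` followed by a property name and no value prints the current setting instead of assigning one:

```sql
-- Print the current values (no "=value" means "show", not "assign").
set hive.query.reexecution.enabled;
set hive.query.reexecution.strategies;
set hive.query.reexecution.stats.persist.scope;
```

If the scope prints as metastore, switching it to query or hiveserver for the session avoids the issue referenced in HIVE-26978.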
When CBO fails during the compilation phase, Hive falls back to the legacy optimizer, but in many cases it is unable to correctly recreate the AST. The recompile_without_cbo strategy (HIVE-25792) recompiles the query without CBO when that happens.
The reexecute_lost_am strategy re-executes the query if it failed because the node running the Tez AM was decommissioned.
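Both of these strategies are selected by name in the strategies list. A sketch of enabling them together with the other two, which mirrors the list shown at the start of the article; it assumes the session is permitted to override this property:

```sql
-- Enable all four strategies discussed in this article for the session.
set hive.query.reexecution.strategies=overlay,reoptimize,recompile_without_cbo,reexecute_lost_am;
```

Removing a name from the list disables that strategy for the session without affecting the others.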
Configuration | Default | Description |
hive.query.reexecution.always.collect.operator.stats | false | Enable to gather runtime statistics on all queries. |
hive.query.reexecution.enabled | true | Enables the re-execution feature. |
hive.query.reexecution.max.count | 1 | Maximum number of re-executions that may happen. |
hive.query.reexecution.stats.cache.batch.size | -1 | If runtime stats are stored in the metastore: the maximum batch size per round during load. |
hive.query.reexecution.stats.cache.size | 100000 | Size of the runtime statistics cache, in OperatorStat entries; a query plan consists of ~100. |
hive.query.reexecution.stats.persist.scope | query | Scope at which runtime statistics are persisted: query, hiveserver, or metastore. |
hive.query.reexecution.strategies | overlay,reoptimize,recompile_without_cbo,reexecute_lost_am | Comma-separated list of re-execution strategies (plugins) to use. |
runtime.stats.clean.frequency | 3600s | Frequency at which the timer task runs to remove outdated runtime stat entries. |
runtime.stats.max.age | 3days | Stat entries older than this are removed. |
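As an illustration of tuning the properties in the table, the sketch below allows a second retry and collects runtime statistics for every query. The values are assumptions chosen for the example, not recommendations; more retries also mean a failing query takes longer to report its final error.

```sql
-- Illustrative values only; the defaults are 1 and false respectively.
set hive.query.reexecution.max.count=2;
set hive.query.reexecution.always.collect.operator.stats=true;
```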