Created on 12-27-2023 09:48 AM - edited 12-27-2023 09:51 AM
Hive 3.0 introduced the option to automatically re-execute a query when its first run fails. Re-execution only makes sense if whatever caused the previous failure is addressed before the next attempt. This article discusses ways to configure that once, without having to intervene after each failure.
The following Hive property enables query re-execution. It should already be enabled out of the box:
set hive.query.reexecution.enabled=true;
Overlay
Using this strategy, we can set Hive properties that are applied only on re-execution. It works by adding a configuration subtree (reexec.overlay.*) as an overlay on top of the actual Hive settings.
set reexec.overlay.{hive_property}=new_value;
Every Hive setting prefixed with "reexec.overlay." is applied to all re-executions, with the prefix removed.
For example, if our Hive queries fail with OOM while performing map joins, which can happen when table statistics are missing or incorrect, we could disable hive.auto.convert.join for the next attempt:
set reexec.overlay.hive.auto.convert.join=false;
set hive.query.reexecution.strategies=overlay;
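As a further sketch, several overlay properties can be combined in one session. The second overlay here (hive.vectorized.execution.enabled) is only an illustrative assumption and not something this article prescribes. hive.query.reexecution.max.count is assumed to be available in your Hive version as the cap on re-execution attempts.

```sql
-- Applied only when a failed query is re-executed; the first attempt is unaffected.
set reexec.overlay.hive.auto.convert.join=false;
-- Hypothetical second overlay: fall back to non-vectorized execution on the retry.
set reexec.overlay.hive.vectorized.execution.enabled=false;
-- Cap the number of re-execution attempts per query.
set hive.query.reexecution.max.count=1;
set hive.query.reexecution.strategies=overlay;
```

Keeping the overlay list short makes it easier to tell which setting actually rescued the retried query.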
Reoptimize
During query execution, Hive records the actual number of rows passing through each operator. If the query has to be re-planned, these runtime statistics are fed back to the planner, which can lead to a better query plan.
Cases where this helps include:
- Absence of statistics.
- Inaccurate statistics.
- Scenarios involving numerous joins.
To enable this, use:
set hive.query.reexecution.strategies=overlay,reoptimize;
set hive.query.reexecution.stats.persist.scope=query;
hive.query.reexecution.stats.persist.scope controls the level at which the runtime stats are persisted:
- query - only used during the re-execution
- hiveserver2 - persisted in HS2 until it is restarted
- metastore - persisted in the metastore and loaded on HiveServer2 startup
Avoid setting it to "metastore" due to the bug discussed in HIVE-26978.
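For example, to keep the collected runtime stats available within the same HiveServer2 instance rather than only for the re-execution of the failing query, the scope could be widened as sketched below. This only combines the settings described above. Whether the wider scope is worthwhile depends on the workload.

```sql
set hive.query.reexecution.strategies=overlay,reoptimize;
-- Keep collected runtime stats in HiveServer2 until it is restarted.
set hive.query.reexecution.stats.persist.scope=hiveserver2;
```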
recompile_without_cbo
When CBO fails during the compilation phase, Hive falls back to the legacy optimizer, but in many cases it is unable to correctly recreate the AST. HIVE-25792 adds this strategy, which recompiles the query without CBO when CBO fails.
reexecute_lost_am
Re-executes the query if it failed because the node running the Tez AM was decommissioned.
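The strategies covered in this article can be listed together in a single setting. The sketch below assumes a Hive build that includes HIVE-25792 (otherwise recompile_without_cbo is not available). The default value of this property differs between versions and distributions, so it is worth printing the current value first.

```sql
-- Print the strategies currently in effect for this session.
set hive.query.reexecution.strategies;
-- Enable all four strategies described in this article.
set hive.query.reexecution.strategies=overlay,reoptimize,recompile_without_cbo,reexecute_lost_am;
```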