Community Articles

smruti · ‎12-27-2023

Hive 3.0 introduced an option to re-attempt a failed Hive query, in case the first run fails. It would only make sense if we fixed whatever was the issue in the previous run. We'll discuss the ways to configure this once without having to intervene after each failure event.

The following Hive property enables query re-execution. This should be enabled out of the box.

hive.query.reexecution.enabled=true;
hive.query.reexecution.strategies=overlay,reoptimize,recompile_without_cbo,reexecute_lost_am;

Re-execution strategies:

Overlay

Using this method, we can set a Hive property that should be applied on the re-execution. It works by adding a configuration subtree as an overlay to the actual hive settings(reexec.overlay.*).

set reexec.overlay.{hive_property}=new_value

Every hive setting which has a prefix of "reexec.overlay" will be set for all re-executions.

e.g.

In case our Hive queries fail with OOM while performing Map Joins, which could occur when we do not have correct stats for the tables, we could try disabling hive.auto.convert.join for the next attempt:

set reexec.overlay.hive.auto.convert.join=false;
set hive.query.reexecution.strategies=overlay;

Reoptimize

Throughout the execution of a query, the system actively monitors the real count of rows passing through each operator. This recorded information is leveraged in subsequent re-planning stages, potentially leading to the generation of a more optimized query plan.

Instances where this becomes essential include:

- Absence of statistics.

- Inaccurate statistics.

- Scenarios involving numerous joins.

In order to enable this, use:

set hive.query.reexecution.strategies=overlay,reoptimize
set hive.query.reexecution.stats.persist.scope=query

hive.query.reexecution.stats.persist.scope provides an option to persists the runtime stats at different levels:

query - only used during the reexecution
hiveserver2 - persisted in the HS2 until restarted
metastore - persisted in the metastore; and loaded on hive server startup.

Avoid setting it to "metastore" due to the bug discussed in HIVE-26978

recompile_without_cbo

When CBO fails during compilation phase, it falls back to legacy optimizer, but in many cases the it is unable to correctly recreate the AST. HIVE-25792 helps recompile the query without CBO in case it fails.

reexecute_lost_am

Re-executes query if it failed due to tez am node gets decommissioned.

Some relevant configurations :

Configuration	default
hive.query.reexecution.always.collect.operator.stats	false	Enable to gather runtime statistics on all queries.
hive.query.reexecution.enabled	true	Feature enabler
hive.query.reexecution.max.count	1	number of reexecution that may happen
hive.query.reexecution.stats.cache.batch.size	-1	If runtime stats are stored in metastore; the maximal batch size per round during load.
hive.query.reexecution.stats.cache.size	100 000	Size of the runtime statistics cache. Unit is: OperatorStat entry; a query plan consist ~100.
hive.query.reexecution.stats.persist.scope	query	runtime statistics can be persisted: query: - only used during the reexecution hiveserver: persisted during the lifetime of the hiveserver metastore: persisted in the metastore; and loaded on hiveserver startu[
hive.query.reexecution.strategies	overlay,reoptimize,recompile_without_cbo,reexecute_lost_am	reexecution plugins; currently overlay and reoptimize is supported
runtime.stats.clean.frequency	3600s	Frequency at which timer task runs to remove outdated runtime stat entries.
runtime.stats.max.age	3days	Stat entries which are older than this are removed.

Cloudera Community

Community Articles

Hive Query Recovery Tactics: Handling Failures through Effective Re-Execution Strategies

Apache Hive

Cloudera Data Platform Private Cloud (CDP-Private)

Cloudera Data Warehouse (CDW)

Re-execution strategies:

Some relevant configurations :