Community Articles

Find and share helpful community-sourced technical articles.
avatar
Master Collaborator

Hive 3.0 introduced an option to re-attempt a failed Hive query, in case the first run fails. It would only make sense if we fixed whatever was the issue in the previous run. We'll discuss the ways to configure this once without having to intervene after each failure event.

The following Hive property enables query re-execution. This should be enabled out of the box.

hive.query.reexecution.enabled=true;
hive.query.reexecution.strategies=overlay,reoptimize,recompile_without_cbo,reexecute_lost_am;
Re-execution strategies:
  • Overlay

Using this method, we can set a Hive property that should be applied on the re-execution. It works by adding a configuration subtree as an overlay to the actual hive settings(reexec.overlay.*).

set reexec.overlay.{hive_property}=new_value

Every hive setting which has a prefix of "reexec.overlay" will be set for all re-executions.

e.g.

In case our Hive queries fail with OOM while performing Map Joins, which could occur when we do not have correct stats for the tables, we could try disabling hive.auto.convert.join for the next attempt:

 

set reexec.overlay.hive.auto.convert.join=false;
set hive.query.reexecution.strategies=overlay;

 

  • Reoptimize

Throughout the execution of a query, the system actively monitors the real count of rows passing through each operator. This recorded information is leveraged in subsequent re-planning stages, potentially leading to the generation of a more optimized query plan.

Instances where this becomes essential include:

- Absence of statistics.

- Inaccurate statistics.

- Scenarios involving numerous joins.

In order to enable this, use:

 

set hive.query.reexecution.strategies=overlay,reoptimize
set hive.query.reexecution.stats.persist.scope=query

 

hive.query.reexecution.stats.persist.scope provides an option to persists the runtime stats at different levels:

  1. query - only used during the reexecution
  2. hiveserver2 - persisted in the HS2 until restarted
  3. metastore - persisted in the metastore; and loaded on hive server startup.

Avoid setting it to "metastore" due to the bug discussed in HIVE-26978

  • recompile_without_cbo

When CBO fails during compilation phase, it falls back to legacy optimizer, but in many cases the it is unable to correctly recreate the AST. HIVE-25792 helps recompile the query without CBO in case it fails.

  • reexecute_lost_am

Re-executes query if it failed due to tez am node gets decommissioned.

Some relevant configurations :

 

Configuration

default

 

hive.query.reexecution.always.collect.operator.stats

false

Enable to gather runtime statistics on all queries.

hive.query.reexecution.enabled

true

Feature enabler

hive.query.reexecution.max.count

1

number of reexecution that may happen

hive.query.reexecution.stats.cache.batch.size

-1

If runtime stats are stored in metastore; the maximal batch size per round during load.

hive.query.reexecution.stats.cache.size

100 000

Size of the runtime statistics cache. Unit is: OperatorStat entry; a query plan consist ~100.

hive.query.reexecution.stats.persist.scope

query

runtime statistics can be persisted:

  • query: - only used during the reexecution
  • hiveserver: persisted during the lifetime of the hiveserver
  • metastore: persisted in the metastore; and loaded on hiveserver startu[

hive.query.reexecution.strategies

overlay,reoptimize,recompile_without_cbo,reexecute_lost_am

reexecution plugins; currently overlay and reoptimize is supported

runtime.stats.clean.frequency

3600s

Frequency at which timer task runs to remove outdated runtime stat entries.

runtime.stats.max.age

3days

Stat entries which are older than this are removed.

 

1,683 Views