Support Questions
Find answers, ask questions, and share your expertise

Hive ANALYZE query changes the LOCATION of table partitions

Hive ANALYZE query changes the LOCATION of table partitions

New Contributor

We have encountered a behavior in Hive where ANALYZE queries change the locations of table partitions to those of another table.

ANALYZE log:

INFO  [HiveServer2-Background-Pool: <thread_id>]: ql.Driver (Driver.java:execute) - Executing command(queryId=<query_id>): ANALYZE TABLE <table_A_name> PARTITION(<partition_info>) COMPUTE STATISTICS
....
INFO  [HiveServer2-Background-Pool: <thread_id>]: metastore.HiveMetaStore (HiveMetaStore.java:logInfo) - 94048: alter_partitions : db=<db_name> tbl=<table_A_name>
INFO  [HiveServer2-Background-Pool: <thread_id>]: HiveMetaStore.audit (HiveMetaStore.java:logAuditEvent) - ugi=<ugi>      ip=unknown-ip-addr      cmd=alter_partitions : db=<db_name> tbl=<table_A_name>
INFO  [HiveServer2-Background-Pool: <thread_id>]: metastore.HiveMetaStore (HiveMetaStore.java:alter_partitions) - New partition values:[<partition_value>]
INFO  [HiveServer2-Background-Pool: <thread_id>]: metastore.HiveMetaStore (HiveMetaStore.java:alter_partitions) - New partition values:[<partition_value>]
INFO  [HiveServer2-Background-Pool: <thread_id>]: metastore.HiveMetaStore (HiveMetaStore.java:alter_partitions) - New partition values:[<partition_value>]

Results:

hive> select distinct(INPUT__FILE__NAME) from <table_A_name> where <partition_column> BETWEEN <value_A> AND <value_B>;
hdfs://.../<db_name>/<table_A_name>/<partition>=<value>/xxxxxx_x
...
hdfs://.../<db_name>/<table_A_name>/<partition>=<value>/xxxxxx_x
...
hdfs://.../<db_name>/<table_B_name>/<partition>=<value>/xxxxxx_x
...
hdfs://.../<db_name>/<table_B_name>/<partition>=<value>/xxxxxx_x

Is this a known issue with Hive? We would appreciate workarounds if any are available.

Hive Version: 2.0.0

4 REPLIES 4

Re: Hive ANALYZE query changes the LOCATION of table partitions

Cloudera Employee

I'm not sure what causes the problem; I've tried to reproduce you issue without success:

drop table if exists t;
create table t (a int) partitioned by (p string);
insert into t partition(p=1) values(1);
-- runs ok
analyze table t partition(p=1) compute statistics;
-- throws exception for me right now
analyze table t partition(p=2) compute statistics;

Could you please help extend this minimal sample case into a testcase which reproduces your issue? Could you give some more precise information about the hive you are using (full version number)... - if you are using Apache Hive 2.0.0; I would like recommend to try out either 2.3.3 or 3.0.0

Re: Hive ANALYZE query changes the LOCATION of table partitions

New Contributor

@zhaindrich

Thank you for your response.

Could you please help extend this minimal sample caseinto a testcase which reproduces your issue?

We are currently investigating the problem, and so far:

  1. We observed that the locations of some partitions of a table are pointing to another table
  2. The behavior is not 100% replicable, so we cannot provide concrete replication steps
  3. We have not found the root cause yet, but based on the log, we saw that alter table occurred after an ANALYZE query

A similar observation can be found here

if you are using Apache Hive 2.0.0; I would like recommend to try out either 2.3.3 or 3.0.0

A version upgrade is not an immediate option because it will take time to implement.

We would like to know whether this is a current issue or has a JIRA ticket?

Re: Hive ANALYZE query changes the LOCATION of table partitions

Cloudera Employee

the "alter table" which is happening should be okay; since that call stores some results of the analyze
I don't know about any tickets which records that it could change the location by mistake..

But according to the article you've found; this seems to have happened to other people as well...

All I can think about right now is that something is not fully thread safe.
Is there by any chance multiple analyze statements running at the same time?
Are you using standalone hive-cli or beeline/jdbc with hiveserver2 ?

Re: Hive ANALYZE query changes the LOCATION of table partitions

New Contributor

@zaindrich

Apologies for our delayed response.

Are you using standalone hive-cli or beeline/jdbc with hiveserver2

hiveserver2 was used for these queries.

Is there by any chance multiple analyze statements running at the same time

We investigated the logs and found other threads running analyze statements.

One thread was of interest. This was running at the same time as the thread in my original post:

INFO  [HiveServer2-Background-Pool: <Thread-2>]: ql.Driver (Driver.java:execute(1515)) - Executing command(queryId=<query_id>): ANALYZE TABLE <table_B_name> PARTITION(<partition_info>) COMPUTE STATISTICS
...
INFO  [HiveServer2-Background-Pool: <Thread-2>]: metastore.HiveMetaStore (HiveMetaStore.java:logInfo(669)) - 94045: get_table : db=<db_name> tbl=<table_B_name>
...
INFO  [HiveServer2-Background-Pool: <Thread-2>]: HiveMetaStore.audit (HiveMetaStore.java:logAuditEvent(280)) - ugi=<ugi>  ip=unknown-ip-addr  cmd=get_table : db=<table_B_name>
...
INFO  [HiveServer2-Background-Pool: <Thread-2>]: metastore.HiveMetaStore (HiveMetaStore.java:logInfo(669)) - 94045: alter_partitions : db=<db_name> tbl=<table_A_name>
INFO  [HiveServer2-Background-Pool: <Thread-2>]: HiveMetaStore.audit (HiveMetaStore.java:logAuditEvent(280)) - ugi=<ugi>  ip=unknown-ip-addr  cmd=alter_partitions : db=<db_name> tbl=<table_A_name>
INFO  [HiveServer2-Background-Pool: <Thread-2>]: metastore.HiveMetaStore (HiveMetaStore.java:alter_partitions(3246)) - New partition values:[<partition_value>]
INFO  [HiveServer2-Background-Pool: <Thread-2>]: metastore.HiveMetaStore (HiveMetaStore.java:alter_partitions(3246)) - New partition values:[<partition_value>]
INFO  [HiveServer2-Background-Pool: <Thread-2>]: metastore.HiveMetaStore (HiveMetaStore.java:alter_partitions(3246)) - New partition values:[<partition_value>]

In Thread-2, ANALYZE was run on Table B, but alter_partitions was done on Table A.

We had to redact some information but:

  • Tables A and B in Thread-2 are the same as Tables A and B in the original thread.