File has an invalid version number. This could be due to stale metadata

New Contributor

I am running CDH 5.6.0 with Impala 2.4.0.

I have some Parquet files stored in HDFS, and I created an Impala external table over them. When I run the following query, all the files are listed successfully:

[cloudera-impala-dn0.eastus.cloudapp.azure.com:21000] > show files in parquettable;

The table metadata also looks correct (I checked it with describe parquettable).

The table stats are:

[cloudera-impala-dn0.eastus.cloudapp.azure.com:21000] > show table stats parquettable;

Rows | Files | Size     | Bytes Cached | Cache Replication | Format  | Incremental stats | Location
-1   | 838   | 249.64GB | NOT CACHED   | NOT CACHED        | PARQUET | false             | hdfs://cloudera-impala-mn0.eastus.cloudapp.azure.com:8020/user/root/big_data

When I execute the following query:

[cloudera-impala-dn0.eastus.cloudapp.azure.com:21000] > select count(*) from parquettable;

it returns no result and no error, only the following warning:

File 'hdfs://cloudera-impala-mn0.eastus.cloudapp.azure.com:8020/user/root/big_data/part-r-00001-7c29b85c-bd1f-420e-8834-96300076a92d.gz.parquet' has an invalid version number: ▒.F/ This could be due to stale metadata. Try running "refresh default.parquettable".

Running refresh default.parquettable, as the warning suggests, had no effect.
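One way to narrow this down is to check whether the file really is a Parquet file. Per the Parquet format specification, a Parquet file begins and ends with the 4-byte magic string PAR1. My understanding is that Impala reads the last 4 bytes of each file and prints them in this warning when they are not PAR1, so a value like ▒.F/ suggests the file's footer is not Parquet at all. The sketch below assumes you have first copied the file locally (for example with hdfs dfs -get); the script name check_parquet_magic.py is hypothetical, and this is not an official Cloudera or Impala tool.

```python
import os
import sys

# Parquet files start and end with these 4 magic bytes (Parquet format spec).
PARQUET_MAGIC = b"PAR1"


def read_magic(path):
    """Return (first 4 bytes, last 4 bytes) of the file at `path`."""
    size = os.path.getsize(path)
    with open(path, "rb") as f:
        head = f.read(4)
        # Jump to the last 4 bytes; files shorter than 4 bytes are read from the start.
        f.seek(max(size - 4, 0))
        tail = f.read(4)
    return head, tail


def looks_like_parquet(path):
    """True if the file begins and ends with the Parquet magic bytes."""
    head, tail = read_magic(path)
    return head == PARQUET_MAGIC and tail == PARQUET_MAGIC


if __name__ == "__main__":
    # Hypothetical usage: python check_parquet_magic.py FILE [FILE ...]
    # Each FILE is assumed to be a local copy fetched from HDFS beforehand.
    for p in sys.argv[1:]:
        head, tail = read_magic(p)
        status = "OK" if looks_like_parquet(p) else "NOT PARQUET"
        print(f"{status}: {p} head={head!r} tail={tail!r}")
```

If the tail printed for a file matches the bytes shown in the Impala warning and is not b'PAR1', the file itself is probably not valid Parquet (for example, a text file sitting in a table declared as Parquet), and refreshing metadata would not be expected to help.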

New Contributor

Hi,

I am getting the same issue with the following versions:

CDH: 6.2
Hive: 2.1.1-cdh6.2.1
Impala: 3.2.0-cdh6.2.1

I run compute stats <db_name.table_name> after running invalidate metadata <db_name.table_name> and refresh <db_name.table_name>, but I still get the same error:

ERROR: File 'hdfs://name_node/abc/xyz/000001_0' has an invalid version number: 2-11
This could be due to stale metadata. Try running "refresh <db_name.table_name>".

My process runs over more than 100 tables. The error occurs for only 5-6 random tables, and only sometimes.

Contributor

I encountered the same problem. Are there any solutions?