Created on 04-08-2016 06:12 AM - last edited on 02-17-2020 03:22 PM by ask_bill_brooks
I am running a CDH distribution (version 5.6.0) with Impala (version 2.4.0).
I have some Parquet files stored in HDFS, which I have loaded into an Impala external table. Executing the following query lists all of the files successfully:
[cloudera-impala-dn0.eastus.cloudapp.azure.com:21000] > show files in parquettable;
Also, the metadata is correct (checked by executing describe parquettable).
The stats of the table are:
[cloudera-impala-dn0.eastus.cloudapp.azure.com:21000] > show table stats parquettable;
Rows | Files | Size | Bytes Cached | Cache Replication | Format | Incremental stats | Location
-1 | 838 | 249.64GB | NOT CACHED | NOT CACHED | PARQUET | false | hdfs://cloudera-impala-mn0.eastus.cloudapp.azure.com:8020/user/root/big_data
Executing the following query:
[cloudera-impala-dn0.eastus.cloudapp.azure.com:21000] > select count(*) from parquettable;
returns the following WARNING, with no query result and no error:
File 'hdfs://cloudera-impala-mn0.eastus.cloudapp.azure.com:8020/user/root/big_data/part-r-00001-7c29b85c-bd1f-420e-8834-96300076a92d.gz.parquet' has an invalid version number: ▒.F/ This could be due to stale metadata. Try running "refresh default.parquettable".
Running refresh default.parquettable did not have any effect.
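One check that can be done outside Impala: as far as I understand, a valid Parquet file begins and ends with the 4-byte magic string PAR1, and the "invalid version number" in the warning appears to be the bytes Impala found at the end of the file instead. Below is a minimal sketch for testing a single file, assuming it has first been copied to local disk (for example with hdfs dfs -get). The helper name has_parquet_magic and the local file name are hypothetical, not part of Impala or any Parquet library.

```python
import os

# Magic bytes that a Parquet file is expected to carry at both ends.
PARQUET_MAGIC = b"PAR1"


def has_parquet_magic(path):
    """Return True if the local file starts and ends with b"PAR1".

    Hypothetical helper for illustration. It inspects only the first and
    last 4 bytes; it does not parse or validate the footer metadata.
    """
    size = os.path.getsize(path)
    # Smallest plausible layout: header magic (4 bytes) +
    # footer length (4 bytes) + trailing magic (4 bytes).
    if size < 12:
        return False
    with open(path, "rb") as f:
        head = f.read(4)
        f.seek(-4, os.SEEK_END)
        tail = f.read(4)
    return head == PARQUET_MAGIC and tail == PARQUET_MAGIC


# Example call (assumed local copy of the file named in the warning):
# has_parquet_magic("part-r-00001-7c29b85c-bd1f-420e-8834-96300076a92d.gz.parquet")
```

If this returns False for the file named in the warning, the file itself is probably truncated or was not written as plain Parquet, which would explain why the refresh makes no difference.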