I am running a CDH distribution (version 5.6.0) with Impala (version 2.4.0). I have some Parquet files stored in HDFS, and I have loaded these files into an Impala external table.

Executing the following query lists all the files successfully:

[cloudera-impala-dn0.eastus.cloudapp.azure.com:21000] > show files in parquettable;

The metadata is also correct (checked by executing describe parquettable). The stats of the table are:

[cloudera-impala-dn0.eastus.cloudapp.azure.com:21000] > show table stats parquettable;

Rows | Files | Size | Bytes Cached | Cache Replication | Format | Incremental stats | Location
-1 | 838 | 249.64GB | NOT CACHED | NOT CACHED | PARQUET | false | hdfs://cloudera-impala-mn0.eastus.cloudapp.azure.com:8020/user/root/big_data

Executing the following query:

[cloudera-impala-dn0.eastus.cloudapp.azure.com:21000] > select count(*) from parquettable;

produces the following WARNING, with no result and no error:

File 'hdfs://cloudera-impala-mn0.eastus.cloudapp.azure.com:8020/user/root/big_data/part-r-00001-7c29b85c-bd1f-420e-8834-96300076a92d.gz.parquet' has an invalid version number: ▒.F/
This could be due to stale metadata. Try running "refresh default.parquettable".

Running refresh default.parquettable did not have any effect.
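For anyone trying to narrow this down: the Parquet format requires a file to begin and end with the 4-byte marker PAR1. The warning above may come from that check failing on the file's last bytes, but that is an assumption about Impala's behavior, not something confirmed here. Below is a minimal Python sketch that tests the marker on a local copy of the file (for example, one fetched with hdfs dfs -get). The function name check_parquet_magic and the local path are hypothetical.

```python
import os

# 4-byte marker that the Parquet format places at the start and end of a file.
PARQUET_MAGIC = b"PAR1"


def check_parquet_magic(path):
    """Return (header_ok, footer_ok, tail_bytes) for a local file.

    header_ok / footer_ok say whether the first / last 4 bytes equal PAR1.
    tail_bytes holds the raw last 4 bytes so they can be compared with
    whatever a reader reports as the "version number".
    """
    size = os.path.getsize(path)
    # Smallest conceivable layout: magic + 4-byte footer length + magic.
    if size < 12:
        return (False, False, b"")
    with open(path, "rb") as f:
        head = f.read(4)
        f.seek(-4, os.SEEK_END)
        tail = f.read(4)
    return (head == PARQUET_MAGIC, tail == PARQUET_MAGIC, tail)


if __name__ == "__main__":
    import sys

    # Usage (hypothetical local path): python check_magic.py part-r-00001.gz.parquet
    for p in sys.argv[1:]:
        print(p, check_parquet_magic(p))
```

If the trailing bytes are not PAR1, the file may be truncated or may not be a Parquet file at all, in which case refreshing the table metadata would not be expected to help.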