Created on 11-11-2015 11:22 AM - edited 09-16-2022 02:49 AM
CDH 5.4.5
Impala 2.2.0-cdh5.4.5
According to the documentation page "Using HDFS Caching with Impala (CDH 5.1 or higher only)": "When files are added to a table or partition whose contents are cached, Impala automatically detects those changes and performs a REFRESH automatically once the relevant data is cached." In practice, however, enabling HDFS caching does not enable automatic detection of file changes. REFRESH still has to be issued explicitly before the new data becomes visible.
Reproduction, following the instructions in "Using HDFS Caching with Impala (CDH 5.1 or higher only)":
hdfs cacheadmin -addPool test_pool -owner impala -limit 1048576
alter table test_table set cached in 'test_pool';
hdfs cacheadmin -listPools -stats
# impala-shell:
show table stats test_table;
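The steps above stop before any file is added, so here is a minimal sketch of how the rest of the check might look: count the rows, add a file to the cached table's directory, count again, then REFRESH and count once more. The table location (TABLE_DIR), the local file name (NEW_FILE), the DRY_RUN switch, and the helper function names are all assumptions for illustration and are not part of my actual setup. With DRY_RUN=1 (the default here) the commands are only printed; set DRY_RUN=0 to run them against a real cluster.

```shell
#!/bin/sh
# Sketch only: TABLE_DIR and NEW_FILE are hypothetical placeholders.
# With DRY_RUN=1 (the default) commands are printed instead of executed.
DRY_RUN="${DRY_RUN:-1}"
TABLE="test_table"
POOL="test_pool"
TABLE_DIR="${TABLE_DIR:-/user/hive/warehouse/test_table}"  # assumed table location
NEW_FILE="${NEW_FILE:-new_rows.csv}"                       # assumed local data file

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# Row count as currently seen by Impala.
count_rows() {
    run impala-shell -q "select count(*) from $TABLE"
}

check_refresh_needed() {
    count_rows                                                # baseline count
    run hdfs dfs -put "$NEW_FILE" "$TABLE_DIR/"               # add a file outside of Impala
    run hdfs cacheadmin -listDirectives -stats -pool "$POOL"  # inspect cached bytes for the pool
    count_rows                                                # same as baseline if the file was not detected
    run impala-shell -q "refresh $TABLE"                      # explicit REFRESH
    count_rows                                                # should now include the new rows
}

check_refresh_needed
```

If the second count matches the baseline and only the third count changes, the new file was picked up by the explicit REFRESH rather than by automatic detection.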
Question: Is the statement in the documentation correct that Impala can automatically detect file changes when HDFS caching is used?
I have tried so far: