Created 02-07-2018 01:33 AM
This question concerns Cloudera Hadoop 5.4.4.
We've noticed a delay between the moment Hive inserts data into Hadoop and the moment Impala can show that data.
Is that delay configurable? Is there any way to influence it?
Many thanks, and looking forward to your assistance,
Avi Vainshtein
Created 02-07-2018 01:01 PM
If Impala has already loaded the table, its cached copy of the table's metadata won't be updated automatically. If you added new data to the table from outside of Impala (for example, with a Hive insert), you need to use REFRESH <table name>: https://www.cloudera.com/documentation/enterprise/latest/topics/impala_refresh.html#refresh. If you changed other metadata, you may need to use INVALIDATE METADATA <table name>: https://www.cloudera.com/documentation/enterprise/latest/topics/impala_invalidate_metadata.html
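As a rough illustration of the two cases above, here is a minimal Python sketch that picks the statement to send to Impala after an outside change. The helper names (`metadata_statement`, `sync_table`), the example table `sales.orders`, and the table-name check are all hypothetical and not part of any Impala API; `cursor` is assumed to be any Python DB-API cursor already connected to Impala.

```python
import re

# Conservative check for "table" or "db.table" (an assumption of this sketch,
# not Impala's full identifier grammar); it keeps arbitrary text out of the SQL.
_TABLE_NAME = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*(\.[A-Za-z_][A-Za-z0-9_]*)?$")


def metadata_statement(table, data_only=True):
    """Build the Impala statement that makes outside changes to `table` visible.

    data_only=True  -> new data was added outside Impala (e.g. a Hive insert).
    data_only=False -> other metadata was changed outside Impala.
    """
    if not _TABLE_NAME.match(table):
        raise ValueError(f"unexpected table name: {table!r}")
    if data_only:
        return f"REFRESH {table}"
    return f"INVALIDATE METADATA {table}"


def sync_table(cursor, table, data_only=True):
    """Run the statement on an existing DB-API cursor and return what was sent."""
    statement = metadata_statement(table, data_only)
    cursor.execute(statement)
    return statement


if __name__ == "__main__":
    # Hypothetical table name, used only to show the generated statements.
    print(metadata_statement("sales.orders"))
    print(metadata_statement("sales.orders", data_only=False))
```

One way to use this would be to call `sync_table(cursor, "sales.orders")` as the last step of the job that runs the Hive insert, so the new data is visible to Impala as soon as the job finishes.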
Created 02-08-2018 12:57 AM
Many thanks, Tim.
Is there any way to configure an automatic REFRESH in Impala?
Also, we've noticed that after some time the data in Impala does catch up with the earlier Hive inserts, apparently without an explicit REFRESH.
How can that be explained?