This question concerns Cloudera Hadoop 5.4.4.
We've noticed a delay between the moment Hive inserts data into Hadoop and the moment Impala can show that data.
Is that delay configurable? Is there any way for us to influence it?
Many thanks, and looking forward to your assistance.
If Impala has already loaded the table, its cached copy of the table's metadata won't be updated automatically. If you added new data to the table from outside of Impala (for example, through Hive), you need to run REFRESH <table name>: https://www.cloudera.com/documentation/enterprise/latest/topics/impala_refresh.html#refresh. If you changed other metadata, you may need to run INVALIDATE METADATA <table name>: https://www.cloudera.com/documentation/enterprise/latest/topics/impala_invalidate_metadata.html
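In case it helps, here is a minimal sketch of how a load job could issue the statement right after the Hive insert finishes, instead of waiting. It assumes impala-shell is on the PATH and that an impalad is reachable at localhost:21000; the table name sales_db.orders and both function names are hypothetical, so adjust them to your setup.

```python
import subprocess


def build_refresh_command(table, impalad="localhost:21000", full_invalidate=False):
    """Return the impala-shell argument list that reloads metadata for one table.

    REFRESH picks up new data files for a table Impala already knows about;
    INVALIDATE METADATA discards the cached metadata so it is reloaded later.
    """
    keyword = "INVALIDATE METADATA" if full_invalidate else "REFRESH"
    statement = "{} {}".format(keyword, table)
    # -i selects the impalad to connect to, -q runs a single query and exits.
    return ["impala-shell", "-i", impalad, "-q", statement]


def refresh_table(table, impalad="localhost:21000", full_invalidate=False):
    """Run the statement; raises CalledProcessError if impala-shell exits non-zero."""
    subprocess.check_call(build_refresh_command(table, impalad, full_invalidate))
```

Calling refresh_table("sales_db.orders") as the last step of the Hive load would run REFRESH sales_db.orders through impala-shell. Passing full_invalidate=True switches to INVALIDATE METADATA, which is heavier and only worth using when more than the data files changed.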
Many thanks, Tim.
Is there any way to set up an automatic REFRESH in Impala?
Also, we've noticed that after some time the data in Impala does catch up with the earlier Hive inserts, apparently without an explicit REFRESH.
How can that be explained?