When a refresh command is run on a table in Impala, the table gets refreshed and the following is logged in catalogd:
Refreshing table metadata: XXXXXX
I0517 09:11:17.238642 13238 HdfsTable.java:1194] Incrementally loading table metadata for: XXXXXX
I0517 09:11:17.251282 13238 HdfsTable.java:835] Loading file and block metadata for 1050 paths for table XXXXXX using a thread pool of size 5
I0517 09:11:17.345129 13238 HdfsTable.java:875] Loaded file and block metadata for XXXXXX
I0517 09:11:17.345355 13238 HdfsTable.java:1204] Incrementally loaded table metadata for: XXXXXX
I0517 09:11:17.345435 13238 CatalogServiceCatalog.java:1019] Refreshed table metadata: XXXXXX
. . other metadata activities on other tables . . .
I0517 09:15:53.448909 106386 catalog-server.cc:324]Publishing update: TABLE:XXXXX@104793
Notice that there is a significant delay (roughly four and a half minutes, from 09:11:17 to 09:15:53) before the refreshed table gets published to the daemons. Queries that try to access the table in the meantime, before the publish update is fired, fail with the following error:
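To put a number on that gap, the timestamps of the "Refreshed table metadata" and "Publishing update" lines can be subtracted. This is only a minimal Python sketch: it assumes both lines come from the same day and carry the usual glog-style prefix seen in the log above, and the helper name `glog_time` is my own, not anything from Impala.

```python
from datetime import datetime


def glog_time(line: str) -> datetime:
    """Parse the wall-clock time from a glog-style line.

    Assumes the prefix looks like "I0517 09:11:17.345435 13238 file:line] msg",
    so the second whitespace-separated field is HH:MM:SS.microseconds.
    The date part is ignored, so both lines must be from the same day.
    """
    return datetime.strptime(line.split()[1], "%H:%M:%S.%f")


# The two lines of interest, copied from the catalogd log above.
refreshed = ("I0517 09:11:17.345435 13238 CatalogServiceCatalog.java:1019] "
             "Refreshed table metadata: XXXXXX")
published = ("I0517 09:15:53.448909 106386 catalog-server.cc:324]"
             "Publishing update: TABLE:XXXXX@104793")

# Time between the refresh finishing and the update being published.
delay = glog_time(published) - glog_time(refreshed)
print(f"refresh-to-publish delay: {delay.total_seconds():.3f} s")
```

For the log above this comes to a bit over 276 seconds.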
File 'hdfs://XXXXXXXXXXXXX.parquet' has an invalid version number: This could be due to stale metadata. Try running "refresh XXXXX".
Is there a reason behind this, or am I understanding it wrong?
Also, I notice that the publish statements are, more often than not, grouped together as follows: