I am running Impala queries with query options and analyzing the resulting query attributes.
I have observed that the first run of a query takes longer than the second and subsequent runs.
What is the reason for this difference in time?
Does this happen only when table statistics are not available, or does it happen all the time?
Is my observation right?
Impala caches all table metadata, so planning is generally faster once the table has been referenced by a previous query. You can look at the "Planner Timeline" in the Impala query profile to get a time breakdown of planning, including metadata loading.
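If you want to put numbers on the first-run versus later-run difference from the client side, here is a minimal Python sketch. It is not Impala-specific: `run_query` is a hypothetical zero-argument callable that you would supply yourself (for example, a function that executes your statement through whatever client you use and fetches all rows), and `time_runs` and `cold_vs_warm` are names made up for this illustration. It measures only wall-clock time, so it cannot tell you how much of the gap is metadata loading; the "Planner Timeline" in the profile is still the place to look for that.

```python
import time
from typing import Callable, Dict, List


def time_runs(run_query: Callable[[], object], repeats: int = 3) -> List[float]:
    """Run `run_query` `repeats` times in a row and return each wall-clock duration in seconds."""
    durations = []
    for _ in range(repeats):
        start = time.perf_counter()
        run_query()  # hypothetical callable: executes the query and fetches the results
        durations.append(time.perf_counter() - start)
    return durations


def cold_vs_warm(durations: List[float]) -> Dict[str, float]:
    """Separate the first (cold) run from the mean of the later (warm) runs."""
    if len(durations) < 2:
        raise ValueError("need at least two runs to compare")
    warm = durations[1:]
    return {"cold": durations[0], "warm_mean": sum(warm) / len(warm)}
```

A large gap between `cold` and `warm_mean` that disappears on repeated runs is consistent with the metadata caching described above, but other effects (such as OS or HDFS caching) can contribute as well, so treat it as a hint rather than proof.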
How long does Impala keep this metadata cached?
If statistics for the tables participating in the query are not available, will they be available after the first run?
If I run the query again after a long interval, will the metadata still be in the cache?
What if my cluster or Impala is restarted?
If statistics are not available, does Impala do anything on the first run to gather statistics for all participating tables and store them in the metastore or somewhere else in a database?