I ran an Impala query and noticed that the planning phase took a long time, but I couldn't figure out where the time was spent. Consider the following Planner Timeline and Query Timeline from the query profile. From the code, I can see that the Planner Timeline falls between the "Query submitted" and "Planning finished" milestones in the Query Timeline. However, the Planner Timeline totals about 1.4s, whereas the time between "Query submitted" and "Planning finished" is about 56s. Does anyone have any idea what might cause this discrepancy?
In my investigation I came across this thread: https://community.cloudera.com/t5/Interactive-Short-cycle-SQL/Impala-Performance-Issue-Diagnosis-Hel..., which mentioned that "a very long 'planning time' often indicates that the query is bottlenecked on loading/refreshing the table metadata." Are there any KPIs in the query profile or elsewhere that would indicate that loading/refreshing table metadata is the bottleneck? And if so, are there best practices (e.g., tuning some config parameters) to mitigate this issue?
Planner Timeline
Analysis finished: 96,331,949
Equivalence classes computed: 917,936,038
Single node plan created: 1,251,725,380
Runtime filters computed: 1,255,365,766
Distributed plan created: 1,266,640,723
Lineage info computed: 1,272,454,499
Planning finished: 1,406,644,473
Query Timeline
Query submitted: 56,292
Planning finished: 56,509,844,912
Submit for admission: 56,522,498,928
Completed admission: 56,522,824,260
Ready to start 109 fragment instances: 56,530,852,572
All 109 fragment instances started: 57,739,824,848
Rows available: 101,318,335,128
First row fetched: 101,441,023,332
Unregister query: 165,161,832,544
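
To put a number on the gap, here is a small Python sketch that computes the per-step durations of the Planner Timeline and the time between "Query submitted" and "Planning finished" that the Planner Timeline does not account for. It assumes the timeline values are cumulative nanoseconds, which is consistent with the ~1.4s and ~56s figures above; the function and variable names are just for illustration.

```python
# Timeline values copied from the profile above.
# Assumption: values are cumulative timestamps in nanoseconds.
NS_PER_S = 1_000_000_000

planner_timeline = [
    ("Analysis finished", 96_331_949),
    ("Equivalence classes computed", 917_936_038),
    ("Single node plan created", 1_251_725_380),
    ("Runtime filters computed", 1_255_365_766),
    ("Distributed plan created", 1_266_640_723),
    ("Lineage info computed", 1_272_454_499),
    ("Planning finished", 1_406_644_473),
]

# From the Query Timeline.
query_submitted_ns = 56_292
query_planning_finished_ns = 56_509_844_912


def step_durations(timeline):
    """Return (label, duration_ns) per milestone, relative to the previous one."""
    durations = []
    previous = 0
    for label, timestamp in timeline:
        durations.append((label, timestamp - previous))
        previous = timestamp
    return durations


# Total time covered by the Planner Timeline.
planner_total_ns = planner_timeline[-1][1]
# Time between the two Query Timeline milestones.
wall_ns = query_planning_finished_ns - query_submitted_ns
# Time inside that window that the Planner Timeline does not explain.
unaccounted_ns = wall_ns - planner_total_ns

for label, duration in step_durations(planner_timeline):
    print(f"{label:32s} {duration / NS_PER_S:8.3f}s")

print(f"Planner Timeline total:          {planner_total_ns / NS_PER_S:8.3f}s")
print(f"Submitted -> Planning finished:  {wall_ns / NS_PER_S:8.3f}s")
print(f"Unaccounted:                     {unaccounted_ns / NS_PER_S:8.3f}s")
```

Under that assumption, roughly 55.1s of the 56.5s between the two milestones is spent outside the steps recorded in the Planner Timeline, and that is the part I am trying to explain.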