Hi friends, I just want to know the phases of Impala. MapReduce has a mapper phase, shuffling, and a reducer phase. I want to capture Impala's resource utilization at each phase. Does Impala have phases, so that we can observe its behaviour while a query is running?
https://www.cloudera.com/documentation/enterprise/latest/topics/impala_explain_plan.html is a good starting point for understanding Impala query plans and where time is spent. Impala's architecture is closer to a parallel database than to MapReduce, and query execution is in-memory and streaming by default (i.e. intermediate results aren't materialised to disk at each stage).
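To make the streaming point concrete, here is a small Python sketch. It is only an analogy, not Impala code: the operators `scan` and `filter_even` and the two runner functions are hypothetical names made up for illustration. It contrasts a staged run, where the full intermediate result is built before the next step starts (roughly the MapReduce model), with a streaming run, where each row flows through all operators before the next row is read.

```python
from typing import Iterable, Iterator, List, Tuple


def scan(rows: Iterable[int], log: List[str]) -> Iterator[int]:
    """Hypothetical scan operator: emits rows one at a time."""
    for r in rows:
        log.append(f"scan {r}")
        yield r


def filter_even(rows: Iterable[int], log: List[str]) -> Iterator[int]:
    """Hypothetical filter operator: passes through even rows only."""
    for r in rows:
        if r % 2 == 0:
            log.append(f"filter {r}")
            yield r


def run_staged(data: List[int]) -> Tuple[List[int], List[str]]:
    """Staged execution: the scan output is fully materialised first."""
    log: List[str] = []
    scanned = list(scan(data, log))  # whole intermediate result is built here
    result = list(filter_even(scanned, log))
    return result, log


def run_streaming(data: List[int]) -> Tuple[List[int], List[str]]:
    """Streaming execution: operators are chained, rows flow through lazily."""
    log: List[str] = []
    result = list(filter_even(scan(data, log), log))
    return result, log


if __name__ == "__main__":
    for runner in (run_staged, run_streaming):
        result, log = runner([1, 2, 3, 4])
        print(runner.__name__, result, log)
```

Both runs return the same rows, but the logs differ: in the staged run all scan entries come before any filter entry, while in the streaming run they interleave. That interleaving is why a streaming engine has no clean sequence of phases to measure one after another, and why per-operator timings in the query plan and profile are the more natural place to look.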