According to this documentation, when a query is run from a DataNode via impala-shell, the Impala daemon on that node acts as the coordinator for that query, and in theory all nodes running Impala daemons work in parallel and transmit partial results back to it.
In our cluster, though, this does not seem to be working properly: Impala uses only 2% CPU and queries take a long time to complete.
Also, since CDH 5.10 the Llama role is deprecated, so what is the right way to manage Impala resources? Changing CPU shares in the configuration seems to have no effect.
Go to Cloudera Manager -> Host (select each host in turn) -> Resource (menu) and check the CPU and memory allocation for each service. You can customize these values, but make sure the resource allocations of the different services do not overlap.
We can see that Approximate CPU for Impala is set to 1.0 on the cluster nodes, while the same setting for other services such as YARN is 14.0, but we cannot find a way to edit this value for the Impala Daemon role.
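To put the two figures side by side, here is a minimal Python sketch of the arithmetic. It assumes that Approximate CPU can be read as a number of cores, and that the node has 16 cores; the core count is a made-up value for illustration, and the function name `cpu_allocation_report` is hypothetical, not part of any Cloudera tool. Only the 1.0 and 14.0 values come from what we see in Cloudera Manager.

```python
def cpu_allocation_report(host_cores, approx_cpu_by_service):
    """Return each service's fraction of the host CPU and the unallocated cores.

    Raises ValueError if the services together are given more cores than the
    host has, i.e. the allocations overlap.
    """
    total = sum(approx_cpu_by_service.values())
    if total > host_cores:
        raise ValueError(
            f"allocations overlap: {total} cores requested, {host_cores} available"
        )
    shares = {
        name: cpu / host_cores for name, cpu in approx_cpu_by_service.items()
    }
    return shares, host_cores - total


# ASSUMPTION: 16 cores per node is a hypothetical value, not from our cluster.
HOST_CORES = 16

# Approximate CPU values as shown in Cloudera Manager on our nodes.
allocations = {"impala": 1.0, "yarn": 14.0}

shares, unallocated = cpu_allocation_report(HOST_CORES, allocations)
for name, share in shares.items():
    print(f"{name}: {share:.1%} of host CPU")
print(f"unallocated: {unallocated} cores")
```

Under these assumptions Impala would get only a small fraction of each node compared with YARN, which might be related to the low CPU usage we observe, although we have not confirmed that this setting is the cause.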