My cluster configuration is as follows:
All 3 nodes run a Kudu master, a Kudu tablet server (T-Server), and an Impala daemon; one of the nodes also runs the Impala Catalog Server and the Impala StateStore.
My issues are as follows:
1) I'm having a hard time getting dynamic resource pools in Impala to work for concurrent queries. I've tried setting mem_limit, with no luck. I've also tried a static service pool, but that did not give me the required concurrency either. Even with admission control enabled, I could not reach the required concurrency.
I) A single query takes 500-800 ms.
II) With 10 concurrent queries, the time grows to 3-6 s per query.
III) With more than 20 concurrent queries, the time exceeds 10 s per query.
2) One of my cluster nodes does not take any load after a query is submitted; I verified this from the query summary. I've tried setting NUM_NODES to 0 and to 1 on the node that is not taking load, but the summary still shows that node doing no work.
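For reference, the per-query timings in issue 1 can be reproduced with a small harness along these lines. This is only a minimal sketch: `run_query` is a hypothetical placeholder (here it just sleeps) and would need to be replaced with a real call through whatever Impala client is in use, and `SELECT 1` stands in for the actual multi-join queries.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from statistics import mean


def run_query(sql):
    """Hypothetical placeholder: replace with a real Impala client call."""
    time.sleep(0.01)  # simulates query work; not a real query
    return sql


def timed_call(sql):
    """Return the wall-clock seconds that one query took."""
    start = time.perf_counter()
    run_query(sql)
    return time.perf_counter() - start


def measure(sql, concurrency):
    """Submit `concurrency` copies of `sql` at once; return per-query latencies."""
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        return list(pool.map(timed_call, [sql] * concurrency))


if __name__ == "__main__":
    for n in (1, 10, 20):
        latencies = measure("SELECT 1", n)
        print(f"{n:>2} concurrent: mean {mean(latencies):.3f}s, "
              f"max {max(latencies):.3f}s")
```

Comparing the mean and maximum latency at each concurrency level makes it easier to see whether queries are being queued or are simply slowing each other down.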
Q) Are you load balancing the queries across Impala daemons?
Ans) How do I load balance the queries across Impala daemons?
Q) If just two Impala daemons are working on the query, it means you are running queries on small data (i.e. the blocks are on just two nodes). What kind of queries are you running (are there just scans, or broadcasts)?
Ans) The queries contain multi-way joins across various tables. I've tried a bigger query that takes around 10-15 s, but the query still does not go to that specific node. Is there any way to check why the load is not being distributed to that node?
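On the load-balancing question above: the daemon a client connects to acts as the coordinator for that query, so one common approach is to rotate client connections across the daemons (or to put a proxy in front of them). A minimal client-side sketch of that rotation, assuming three hypothetical host names and leaving out the actual connection code:

```python
from itertools import cycle

# Hypothetical host names; substitute the real Impala daemon hosts.
IMPALAD_HOSTS = ["node1.example.com", "node2.example.com", "node3.example.com"]


def make_coordinator_picker(hosts):
    """Return a function that yields hosts in round-robin order."""
    rotation = cycle(hosts)
    return lambda: next(rotation)


pick_coordinator = make_coordinator_picker(IMPALAD_HOSTS)

# Each new query would connect to the next host in the rotation.
assignments = [pick_coordinator() for _ in range(6)]
print(assignments)
```

This only spreads the coordinator role; which daemons execute the scan and join fragments is still decided by the planner.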
According to the Cloudera documentation, NUM_NODES only accepts the values 0 (meaning all nodes) or 1 (meaning all work is done on the coordinator node). See the NUM_NODES documentation.
Even after setting NUM_NODES to 1 on that specific node, the query still goes to one of the other nodes.