I am checking Impala performance on multiple systems with different hardware configurations. While testing it on an 8-core system, I see that only 1 core reaches 100% CPU while the remaining cores sit idle when executing a SELECT query. I am not able to use the entire hardware, so how can I make all cores work for Impala?
I used htop to check CPU usage.
Currently my cluster is a single node with 12 GB RAM and an 8-core CPU.
I am using the latest Cloudera VM.
I checked the Impala and HDFS configs, but the only related setting I found was Cgroup CPU Shares = 1024. Is there some parameter that defines the number of cores to be used?
@punshi this is very dependent on the specific queries and the workload as a whole (i.e. concurrent queries). Some operators in Impala - mainly scans - are always parallelised, so if the query is mostly scan-intensive you should see multiple cores in use. Joins and aggregates are not parallelised within a node, so if those are the bottleneck for your queries and you are only running one query at a time, then you may only see a single core utilised. On production workloads with concurrent queries we typically see CPU saturated, so production clusters usually have no issue using all cores.
We have some long-term plans to run all operators with a configurable degree of parallelism.
I'll note that this is pretty standard for analytical databases - most systems won't let a single query use all the system resources, either by default or in the configurations typically used for production.
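If you want to see the effect of concurrency on a test box, one rough approach is to submit several queries at once and watch htop while they run. The following is only a minimal sketch, not an official tool: it assumes `impala-shell` is on the PATH and that the impalad is reachable at `localhost:21000`, and the helper names (`build_command`, `run_query`, `run_concurrently`) are made up for illustration.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def build_command(query, impalad="localhost:21000"):
    """Build an impala-shell invocation for a single query.

    The host:port default is an assumption; adjust it for your cluster.
    """
    return ["impala-shell", "-i", impalad, "-q", query]


def run_query(query):
    """Run one query through impala-shell and return its exit code."""
    result = subprocess.run(build_command(query), capture_output=True)
    return result.returncode


def run_concurrently(queries, runner=run_query):
    """Submit all queries at the same time, one worker thread per query.

    Returns the runner's results in the same order as the input queries.
    """
    workers = max(1, len(queries))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(runner, queries))
```

For example, `run_concurrently(["SELECT COUNT(*) FROM my_table"] * 8)` (with `my_table` replaced by a real table) would issue eight queries in parallel. If the bottleneck is joins or aggregates, you would expect more cores to become busy as the number of concurrent queries grows, although the exact behaviour depends on the queries and on admission control settings.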