Created on 05-04-2018 03:50 AM - edited 09-16-2022 06:10 AM
Hi!
We have a problem with YARN performance that manifests itself in very high CPU utilization (up to 100%) and growing load on all nodes. There are at most 16 containers running on each node, with 20 physical CPUs available, which gives 40 available vCores (thanks to hyperthreading). However, the Hadoop jobs seem to take more resources than they should - sometimes 500-700% CPU per container in 'top' output.

The YARN configuration seems quite reasonable: yarn.scheduler.minimum-allocation-vcores=1, yarn.scheduler.increment-allocation-vcores=1, yarn.scheduler.maximum-allocation-vcores=16, and all offending jobs request only 1 vCore when checked in the Resource Manager's UI. Despite that configuration, one can see more than 20 threads open for every container, whereas I'd expect no more than 2.

Does anyone know why a single container uses so much CPU time? Is it possible to control the number of threads per container with some MapReduce configuration or Java settings? Or maybe this problem comes from flaws in the MapReduce program itself? Would cgroups be a good approach to solve this issue?

I'm using CDH 5.9.3 with computation running on CentOS 7.2. Thanks for all answers!
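Edit: in case it's relevant, one thing I suspect (an assumption on my part, not something I've confirmed) is that each task JVM sizes its parallel GC thread pool from the 40 hyperthreads of the box, which alone would explain well over 20 threads per container. If that turns out to be the cause, something along these lines in mapred-site.xml (or as per-job -D overrides) should cap it; the heap sizes and thread count below are only placeholders:

<property>
  <name>mapreduce.map.java.opts</name>
  <!-- keep your existing heap setting; the added flag caps the GC worker threads -->
  <value>-Xmx1638m -XX:ParallelGCThreads=2</value>
</property>
<property>
  <name>mapreduce.reduce.java.opts</name>
  <value>-Xmx3276m -XX:ParallelGCThreads=2</value>
</property>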
Created 05-06-2018 11:18 PM
Created on 05-08-2018 04:18 AM - edited 05-08-2018 04:41 AM
Hi @Harsh J, thanks a million for such a thorough and elaborate answer. I haven't solved the problem yet; I will probably apply the cgroups configuration as suggested. I hope it's going to work, but the reason why a single JVM uses so much CPU is still mysterious to me. I understand that YARN treats a vCore as a rough forecast of how much CPU time will be used, and that we could probably mitigate the problem by requesting more vCores in the job's application, or by otherwise reducing the number of containers running on a node, but we still wouldn't be guaranteed that some containers wouldn't use even more CPU, up to the total capacity of the server. It looks like containers running many threads, with a CPU share of more than 100% per container, undermine the way YARN dispatches tasks to nodes. I've also come across this tutorial: https://hortonworks.com/blog/apache-hadoop-yarn-in-hdp-2-2-isolation-of-cpu-resources-in-your-hadoop... which includes: "how do we ensure that containers don’t exceed their vcore allocation? What’s stopping an errant container from spawning a bunch of threads and consume all the CPU on the node?"
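To be concrete about the vCore mitigation I mentioned above, I mean something like the following per-job (or mapred-site.xml) settings - the values are only illustrative and assume plain MapReduce jobs:

<property>
  <name>mapreduce.map.cpu.vcores</name>
  <value>2</value>
</property>
<property>
  <name>mapreduce.reduce.cpu.vcores</name>
  <value>2</value>
</property>

This would make the scheduler place fewer containers per node, but as said, the vCore request is only a forecast and doesn't stop a single JVM from bursting above its share.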
It appears that in the past the rule of thumb of 1 vCore per 1 real core worked (I saw it in several older tutorials), but now we have a different workload pattern (less I/O-bound, more CPU-intensive) and this rule doesn't work very well anymore. So effectively cgroups seem to be the only way to ensure containers don't exceed their vcore allocation. Let me know if you agree or see other solutions. Thanks a million!
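For anyone landing on this thread later, the cgroups configuration I'm planning to try looks roughly like this in yarn-site.xml - a sketch based on the upstream documentation, assuming NodeManagers run under the 'yarn' group and cgroups are already mounted by systemd on CentOS 7 (in Cloudera Manager most of this is exposed through the YARN service settings rather than raw XML):

<property>
  <name>yarn.nodemanager.container-executor.class</name>
  <value>org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor</value>
</property>
<property>
  <name>yarn.nodemanager.linux-container-executor.group</name>
  <value>yarn</value>
</property>
<property>
  <name>yarn.nodemanager.linux-container-executor.resources-handler.class</name>
  <value>org.apache.hadoop.yarn.server.nodemanager.util.CgroupsLCEResourcesHandler</value>
</property>
<property>
  <name>yarn.nodemanager.linux-container-executor.cgroups.hierarchy</name>
  <value>/hadoop-yarn</value>
</property>
<property>
  <!-- false because systemd already mounts the cgroup hierarchy on CentOS 7 -->
  <name>yarn.nodemanager.linux-container-executor.cgroups.mount</name>
  <value>false</value>
</property>
<property>
  <!-- hard-cap each container at its vcore share instead of only enforcing it under contention -->
  <name>yarn.nodemanager.linux-container-executor.cgroups.strict-resource-usage</name>
  <value>true</value>
</property>
<property>
  <!-- leave some CPU headroom for the OS and other daemons -->
  <name>yarn.nodemanager.resource.percentage-physical-cpu-limit</name>
  <value>90</value>
</property>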