Support Questions

lizard · ‎05-04-2018

Hi!

We have some problem with YARN performance that manifest itself in very high utilization of CPU (up to 100%) and growing workloads on all nodes. There are max 16 containers running on each node with 20 CPUs available, which gives 40 available vCores (thanks to hyperthreading). However, the hadoop jobs seem to take more resources than they should - sometimes 500-700% CPU in 'top' command output per one container. YARN configuration seem to be quite reasonable: yarn.scheduler.minimum-allocation-vcores=1, yarn.scheduler.increment-allocation-vcores=1, yarn.scheduler.maximum-allocation-vcores=16 and all offending jobs request only 1 vCore when checked in their review on Resource Manager's UI. However despite that configuration one can see more than 20 threads open for every container, whereas I'd expect no more than 2. Does anyone know why a single container uses so much CPU time? Is it possible to control number of threads per container with some MapRed configuration or java settings? Or maybe this problem comes from flaws in the MapRed program itself? Would cgroups be a good aproach to solve this issue? I'm using CDH 5.9.3 with computation running on Centos 7.2. Thanks for all answers!

Harsh J · ‎05-06-2018

One common misconception with YARN is that it 'preallocates' resources. It
does not. The memory requests and the CPU requests are not 'limited' in any
pre-reserved manner.

For memory checks, the NodeManagers run a simple monitor that periodically
checks if the container (child process) it spawned is exceeding the request
granted to it. If the pmem usage is higher, it is enforced with a kill sent
to the process.

The same is not done for other resources yet - only memory.

You're correct that CGroups can help in enforcing CPUs/etc. See
https://www.cloudera.com/documentation/enterprise/latest/topics/cm_mc_service_pools.html

> However despite that configuration one can see more than 20 threads open
for every container, whereas I'd expect no more than 2

The thread count depends on what the code runs. For ex. HDFS client work
typically runs up 3-4 threads by itself, because it is architected that
way. However, this does not indicate load by itself - that is driven by how
the code uses the HDFS client. Thread count alone is not a good indicator,
but active thread count over time may be a better metric.

> Does anyone know why a single container uses so much CPU time?

Difficult to say. For straight-forward record-by-record work (such as map
tasks) I'd expect ~100% CPU most of the task's life, except parts that
benefit from parallelism (sort, etc.) within the framework. Outside of
this, the user code is also typically free to run however it wants in the
container JVM, and it may decide to run some work concurrently depending on
what its goals are (once the data is read, or when writing the data).

> Is it possible to control number of threads per container with some
MapRed configuration or java settings?

Some of the thread counts such as a Reducer's parallel fetcher thread pool
can be configured. Others (such as the Responder/Reader/Caller thread arch.
of a HDFS client) are not configurable since they are designed to work that
way.

> Or maybe this problem comes from flaws in the MapRed program itself?

This could be very possible if its only affecting a partial set of your
overall container workload. A jstack can help you see which threads are
working on code from your own organization vs. the framework itself.

> Would cgroups be a good approach to solve this issue?

Yes, Linux can help 'enforce' the CPU shares. Following
https://www.cloudera.com/documentation/enterprise/latest/topics/cm_mc_service_pools.html
may
help with that.

However, its worth negotiating with the developers of the jobs on
requesting higher vcore counts for their CPU-heavy work before you take
this step, as a visibly large slowdown may not be very favorable.

lizard · ‎05-08-2018

Hi @Harsh J, thanks a million for such thorough and elaborate answer. I haven't solved the problem yet, probably I will apply cgroups configuration as suggested. I hope it's going to work however the reason why single JVM uses so much CPU is misterious to me. I understand that yarn treats vcore as rough forecaster of how much CPU time will be used and probably we could mitigate problem by putting more vcores in job's application or reducing number of running containers on the node in other way, but still we wouldn't be guaranteed that some of the containers wouldn't use even more CPU up to total capacity of the server. It looks like having containers running many threads resulting in CPU share more than 100% per container undermines the concept how yarn dispatch tasks to the nodes. I've also come across this tutorial: https://hortonworks.com/blog/apache-hadoop-yarn-in-hdp-2-2-isolation-of-cpu-resources-in-your-hadoop... which includes: " how do we ensure that containers don’t exceed their vcore allocation? What’s stopping an errant container from spawning a bunch of threads and consume all the CPU on the node?"

It appears that in the past the rule of thumb 1+vcore/1real core worked ( I saw it several older tutorials) but now we have different patterns of the workload (less IO dependant/more CPU consuming) and this rule doesn't work very well now So effectively cgroups seem to be the only solution to ensure containers don't exceed their vcore allocation. Let me know if you agree or see other solutions. Thanks a million!

Cloudera Community

Support Questions

yarn containers are using too much CPU time