Explorer
Posts: 13
Registered: 12-13-2013

Spark executor with excessive GC disrespects allocation and pegs box - YARN bug?

We spent almost two weeks diagnosing impalad and HBase RegionServer crashes, and finally figured out the cause was a Spark job that had been changed and was intermittently failing with excessive GC.

 

The failing executors/containers were using 2,000% CPU instead of 100% or less (the Spark job was launched with 1 core per executor). The box's CPU usage was through the roof, even though CM didn't pick it up in the per-role graph (which has lines for NodeManager cpu_system_rate and cpu_user_rate, but I guess doesn't include the containers?):

 

[Screenshot: Screen Shot 2017-04-13 at 4.32.48 PM cropped.png]
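
For reference, the 1 core per executor corresponds to spark.executor.cores (--executor-cores on spark-submit). A rough sketch of the relevant settings (the instance count and memory size below are made-up illustrations, not our actual job):

    import org.apache.spark.SparkConf

    // Rough sketch only: how the job would request 1 vcore per executor from YARN.
    // From what we saw, the vcore count is what YARN schedules against; it did not
    // hard-cap what the executor JVM actually used on the box.
    val conf = new SparkConf()
      .set("spark.executor.instances", "10")  // hypothetical executor count
      .set("spark.executor.cores", "1")       // matches the --cores 1 seen in ps
      .set("spark.executor.memory", "4g")     // hypothetical heap size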

 

Before the container died we were able to capture this, which I guess means YARN killed it because of the excessive GC (or did the JVM kill itself? it was started with -XX:OnOutOfMemoryError=kill):

 

[Screenshot: Screen Shot 2017-04-14 at 10.39.38 AM.png]
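
For what it's worth, "GC overhead limit exceeded" is thrown as an OutOfMemoryError, so the -XX:OnOutOfMemoryError kill can fire even though the heap never strictly fills up; YARN itself wouldn't kill a container for GC or CPU use, only for things like exceeding its memory limit. One way to see the GC thrashing in the container logs before the process dies is to add GC logging to the executor JVM options. A rough sketch (the flags are examples, not what our job ran with):

    import org.apache.spark.SparkConf

    // Sketch: enable GC logging on the executors so excessive GC shows up in the
    // container stderr before the JVM is killed. Example flags only.
    val conf = new SparkConf()
      .set("spark.executor.extraJavaOptions",
           "-verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps")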

 

Here's top showing > 2,000% CPU on a java process:

 

[Screenshot: Screen Shot 2017-04-13 at 4.24.30 PM cropped.png]

 

Here we can see it was the container for this executor, and the launch command specifically shows the --cores 1 argument:

 

[Screenshot: Screen Shot 2017-04-14 at 10.13.35 AM.png]

 

 

We're on CDH 5.10 and are using both static and dynamic resource pools. Cgroup I/O Weight for the YARN NodeManager is 35%. Container Virtual CPU Cores is set to 12 (out of 32 cores total on the box). For dynamic pools, everything uses the DRF scheduler, and the pool this job runs in has 130 cores out of 336 total.
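
To put numbers on it: with Container Virtual CPU Cores at 12, YARN should be scheduling at most 12 vcores' worth of containers on this NodeManager, yet a single 1-vcore executor at roughly 2,000% CPU was burning about 20 of the box's 32 hardware threads on its own, more than the entire NodeManager allocation, which is presumably why impalad and the HBase RegionServer on the same host suffered.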

 

So is this total breakdown of the resource allocation model a YARN bug, or a limitation one must be aware of and watch carefully?

 

Posts: 582
Topics: 3
Kudos: 82
Solutions: 54
Registered: 08-16-2016

Re: Spark executor with excessive GC disrespects allocation and pegs box - YARN bug?

I couldn't find anything concrete and haven't reviewed the source code, but I suspect it isn't a 1-to-1 mapping to actual physical cores, so there is nothing preventing the executor from using more. In Spark 2.0 there does seem to be a new setting for a maximum, although it applies across all nodes for the job. Anyway, I view this as a symptom of a constraint on the heap and excessive time spent in GC: all that work trying to reclaim space chewed up the CPU. Addressing that, and keeping an eye on it going forward, is the best route.
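
If it is heap pressure, the usual Spark-side levers are executor memory, memory overhead, and the GC options. A rough sketch of where those knobs live (the numbers are placeholders, not a recommendation for your cluster):

    import org.apache.spark.SparkConf

    // Placeholder values: the point is which settings to look at, not the sizes.
    val conf = new SparkConf()
      .set("spark.executor.memory", "8g")                 // bigger heap per executor
      .set("spark.yarn.executor.memoryOverhead", "1024")  // off-heap headroom in MB
      // Capping parallel GC threads is one way to keep a GC storm on a 1-core
      // executor from fanning out across every core on the host.
      .set("spark.executor.extraJavaOptions", "-XX:ParallelGCThreads=2")

Whether capping GC threads makes sense depends on the job; the main thing is to relieve the heap pressure so the collector isn't running flat out.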