Created 10-15-2015 07:49 PM
Seeing aggressive PSYoungGen GC activity on the YARN ResourceManager (RM) nodes every 8-10 seconds:
6171.971: [GC [PSYoungGen: 136875K->512K(144896K)] 252995K->116647K(1542656K), 0.0070630 secs] [Times: user=0.05 sys=0.00, real=0.01 secs]
6180.179: [GC [PSYoungGen: 115578K->480K(142336K)] 231714K->116703K(1540096K), 0.0065030 secs] [Times: user=0.04 sys=0.01, real=0.00 secs]
6190.316: [GC [PSYoungGen: 141974K->736K(142336K)] 258198K->116975K(1540096K), 0.0059270 secs] [Times: user=0.03 sys=0.01, real=0.01 secs]
6200.145: [GC [PSYoungGen: 137784K->672K(140800K)] 254024K->116935K(1538560K), 0.0059910 secs] [Times: user=0.04 sys=0.00, real=0.01 secs]
6209.983: [GC [PSYoungGen: 138483K->704K(139776K)] 254746K->117079K(1537536K), 0.0060980 secs] [Times: user=0.04 sys=0.00, real=0.01 secs]
6220.037: [GC [PSYoungGen: 137973K->544K(138752K)] 254349K->117157K(1536512K), 0.0036380 secs] [Times: user=0.02 sys=0.00, real=0.00 secs]
6228.739: [GC [PSYoungGen: 117752K->768K(137728K)] 234366K->117437K(1535488K), 0.0061930 secs] [Times: user=0.03 sys=0.02, real=0.00 secs]
6237.006: [GC [PSYoungGen: 113053K->768K(136704K)] 229723K->117493K(1534464K), 0.0037210 secs] [Times: user=0.03 sys=0.00, real=0.01 secs]
6244.956: [GC [PSYoungGen: 107217K->672K(135680K)] 223942K->117589K(1533440K), 0.0080520 secs] [Times: user=0.03 sys=0.03, real=0.01 secs]
6255.009: [GC [PSYoungGen: 133601K->512K(134656K)] 250518K->117517K(1532416K), 0.0070160 secs] [Times: user=0.04 sys=0.00, real=0.01 secs]
6265.007: [GC [PSYoungGen: 131612K->672K(134144K)] 248618K->117821K(1531904K), 0.0039770 secs] [Times: user=0.04 sys=0.00, real=0.00 secs]
6274.880: [GC [PSYoungGen: 129679K->815K(132608K)] 246828K->118132K(1530368K), 0.0036740 secs] [Times: user=0.03 sys=0.00, real=0.01 secs]
6284.782: [GC [PSYoungGen: 129427K->608K(132096K)] 246744K->118164K(1529856K), 0.0039850 secs] [Times: user=0.04 sys=0.00, real=0.01 secs]
6294.809: [GC [PSYoungGen: 129803K->576K(130560K)] 247360K->118260K(1528320K), 0.0037790 secs] [Times: user=0.03 sys=0.00, real=0.00 secs]
6297.736: [GC [PSYoungGen: 39607K->544K(129536K)] 157291K->118316K(1527296K), 0.0035240 secs] [Times: user=0.02 sys=0.00, real=0.00 secs]
Is this normal?
Is there a way to tune it properly?
It looks like the young generation (new) size is set to about 144 MB; should it be increased?
Thank you !
Created 10-15-2015 08:10 PM
Are you running any long-running, I/O-intensive, or memory-intensive jobs?
Please post the output of free -g
Created 10-15-2015 08:14 PM
Yes, but it varies heavily depending on the day and time; at times it can be intensive. I am not sure it is heavy at this moment, but let me get that output.
Created 10-15-2015 09:48 PM
[root@hadm0301 ~]# free -c 5
             total       used       free     shared    buffers     cached
Mem:     148308580  121816216   26492364          0   24726180   75655352
-/+ buffers/cache:   21434684  126873896
Swap:      2097144          0    2097144
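For what it's worth, that output is in kilobytes rather than gigabytes. A quick sanity-check conversion (a sketch with the figures hard-coded from the paste above) shows the box has plenty of headroom:

```python
# Quick sanity check: the `free` output above is in kilobytes, so convert
# the interesting figures to gigabytes. Numbers are hard-coded from the paste.
kb = {
    "total": 148308580,
    "used": 121816216,
    "buffers": 24726180,
    "cached": 75655352,
}

def to_gb(kilobytes: int) -> float:
    return kilobytes / 1024 ** 2  # kB -> GB

# "Real" usage excludes buffers/cache, which the kernel reclaims on demand;
# this matches the -/+ buffers/cache "used" column (21434684 kB).
real_used_kb = kb["used"] - kb["buffers"] - kb["cached"]

print(f"total RAM: {to_gb(kb['total']):.1f} GB")   # ~141 GB
print(f"real used: {to_gb(real_used_kb):.1f} GB")  # ~20 GB
```

So only about 20 GB of the roughly 141 GB is genuinely in use once buffers/cache are excluded.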
Created 10-19-2015 12:12 PM
What's the output of free -g during the job execution? Do you see warnings during idle time?
Were you able to see any error messages?
Created 10-19-2015 06:06 PM
Thanks for the replies, all. This was a non-issue.
Ambari is reading JMX, and JMX is incorrectly aggregating GC times.
Similar to: https://issues.apache.org/jira/browse/AMBARI-3178
Thanks for the help
Created 10-19-2015 06:24 PM
IMO, this doesn't look bad at all. While you could tune the young generation size a bit higher to make these collections less frequent, the amount of time spent in GC is quite low, so it's unlikely to have any impact on long-term performance. We'd also need to see entries for the perm gen and old gen to determine what impact increasing the young gen would have.
Let's break it down:
This is a young generation collection, also known as a minor collection. The total space reserved for the young generation hovers around 135 MB, which aligns with your setting. The amount of young gen in use before each GC hovers around 130 MB (sometimes less, since allocation pressure determines exactly when a GC is triggered). After GC, less than 1 MB remains, meaning cleanup went very well and most objects in this application are short-lived.
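To make those numbers concrete, here is a short script (the regex is a sketch matched to this particular verbose GC log format, not a general-purpose parser) that pulls the before/after sizes and pause time out of the first entry in the log:

```python
import re

# First minor-GC entry from the log posted above.
line = ("6171.971: [GC [PSYoungGen: 136875K->512K(144896K)] "
        "252995K->116647K(1542656K), 0.0070630 secs] "
        "[Times: user=0.05 sys=0.00, real=0.01 secs]")

# before -> after (capacity) for the young gen, then the pause duration.
m = re.search(r"PSYoungGen: (\d+)K->(\d+)K\((\d+)K\).*?([\d.]+) secs", line)
before_kb, after_kb, cap_kb = (int(m.group(i)) for i in range(1, 4))
pause_s = float(m.group(4))

print(f"young gen capacity: {cap_kb / 1024:.1f} MB")       # 141.5 MB here
print(f"in use before GC:   {before_kb / 1024:.1f} MB")
print(f"surviving after GC: {after_kb / 1024:.1f} MB")     # well under 1 MB
print(f"pause:              {pause_s * 1000:.1f} ms")

# Collections arrive roughly every 8 seconds, so the GC overhead is tiny:
print(f"GC overhead: {pause_s / 8.0:.4%} of wall-clock time")
```

The same pattern applies to every entry in the log: ~130 MB collected down to under 1 MB in a few milliseconds.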
Ultimately, these collections took about 5 ms each (0.005 seconds) every 8 seconds or so (8,000 ms), which is roughly 0.06% of the application's wall-clock time, which is perfectly fine.
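If you did still want to experiment with a larger young generation, a hedged sketch follows. The file location and variable follow common Hadoop conventions but vary by distribution, and the 256m value is purely illustrative, not a recommendation derived from this log:

```shell
# Sketch only: in yarn-env.sh (location varies by distribution), append a
# fixed young-generation size to the ResourceManager's JVM options.
# -Xmn sets both the initial and maximum young gen size; keep it well
# below the total heap (-Xmx), or the old gen will be starved.
export YARN_RESOURCEMANAGER_OPTS="${YARN_RESOURCEMANAGER_OPTS} -Xmn256m"
```

Given the measurements above, though, the default is already behaving well, so I would leave it alone.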