Support Questions

cjervis · ‎11-11-2019

Hi,

I am running a Spark Job as below:

/spark-submit --master yarn --deploy-mode client --name sparktest --num-executors 500 --executor-memory 30g --driver-memory 16g --queue apps_aup --conf spark.yarn.executor.memoryOverhead=2000 --conf spark.driver.maxResultSize=12g --conf spark.executor.cores=2 --conf spark.task.cpus=2 --conf spark.network.timeout=600 --conf spark.blacklist.enabled=true --conf "spark.executor.extraJavaOptions=-verbose:gc -XX:+UseParallelGC -XX:+UseParallelOldGC -XX:ParallelGCThreads=8 -XX:+PrintGCDetails -XX:+PrintGCTimeStamps"

The Job runs successfully, but everytime it runs there are some of the tasks which run into long GC Pauses.

I could see following when troubleshooting GC:

A.)

Observation for GC - Executor ID 2

1. GC options Used:

-verbose:gc -XX:+UseParallelGC -XX:+UseParallelOldGC -XX:ParallelGCThreads=8 -XX:+PrintGCDetails -XX:+PrintGCTimeStamps

2. Executor Details :

spark.executor.memory = 30GB
Storage Memory of Executor = 17GB
spark.executor.instances = 500

Executor ID

2

3. Task Details :

Task Time = 19 min , GC Time = 16 min

Input Data Size = 217.9 MB

Shuffle Write Size = 651.8 MB

The above shows that the Total GC time on Executor ID 2, was very high. it was around 16 minutes.

B.) GC Analysis for Sample Executor ID 2

Full GC time:

The Full GC Count happened 7 times.
The total time for these 7  Full GC was 10 min 38 sec 190 ms.
Hence, the  average Full GC time was 1 min 31 sec 170 ms.
The Average Interval between Full GC Interval was 1 min 59 sec 524 ms.

Full GC Size :

The reclaimed data size  from the full GC was 1.56 gb

Minor GC :

The Total Minor GC count was 20

The total time for Minor GC was 5 min 32 sec 380 ms

The average time for Minor GC was 16 sec 619 ms 

Each of these Minor GC happened at an interval for approximately 1 min 631 ms 

During these 20 Minor GC the Total size of Data reclaimed was 33.65 GB

C.)

The Total objects created by application added to be around 45.15 GB.
The Object creation rate by the application was around 40 MB/s.
The Total Objects which got promoted from youngGen to oldGen in each GC cycle was around 8 GB. These objects were promoted on an average rate of 7 MB/s

There was around 18 Allocation Failures, which indicated that the object creation rate was higher than the free heap space in the younGen. This lead to 18 Minor GC cycles (out of 20 total minor GC) to free up younGen and promote objects from younGen to OldGen.

D.) GC Time :

GC Time - real, user, sys

The longest Full GC was around 5 minutes :

722.599: [Full GC (Ergonomics) [PSYoungGen: 1304054K->0K(7388672K)] [ParOldGen: 9194816K->10418042K(14044160K)] 10498870K->10418042K(21432832K), [Metaspace: 56010K->56010K(1099776K)], 303.5027565 secs]

Here I see -

[Times: user=711.36 sys=35.16, real=303.50 secs]

The time spend for GC for system was 35.16 second.
The user time was 711.36 seconds. This is the time added by each GC thread.
The User + Sys = 746.52 seconds.

The Real Time was only 303.50 seconds (5.05 Minutes). This was the actual time which was spend in GC. This looks good from the Cluster Perspective since the system time was only 35.16. The system looks good (CPU cycles free) to perform the GC.

The GC log analysis is from the GCeasy link :

https://gceasy.io/my-gc-report.jsp?p=c2hhcmVkLzIwMTkvMTEvMTEvLS1nY19zcGFya18zLnR4dC0tMTktNTItNQ==&ch...

The GC logs shows:

426.481: [Full GC (Ergonomics) [PSYoungGen: 1520503K->200302K(5950464K)] [ParOldGen: 6657880K->7368200K(10468864K)] 8178383K->7568503K(16419328K), [Metaspace: 55722K->55722K(1097728K)], 142.3782071 secs] [Times: user=329.05 sys=12.63, real=142.38 secs]
572.138: [GC (Allocation Failure) [PSYoungGen: 3949678K->1623296K(6054400K)] 11317879K->9192449K(16523264K), 40.8130588 secs] [Times: user=99.07 sys=2.21, real=40.81 secs]
633.942: [GC (Allocation Failure) [PSYoungGen: 5372672K->1304054K(7388672K)] 12941825K->10498870K(17857536K), 88.6569824 secs] [Times: user=212.32 sys=6.40, real=88.66 secs]
722.599: [Full GC (Ergonomics) [PSYoungGen: 1304054K->0K(7388672K)] [ParOldGen: 9194816K->10418042K(14044160K)] 10498870K->10418042K(21432832K), [Metaspace: 56010K->56010K(1099776K)], 303.5027565 secs] [Times: user=711.36 sys=35.16, real=303.50 secs]
1060.380: [GC (Allocation Failure) [PSYoungGen: 4968960K->208K(7355392K)] 15387002K->10418258K(21399552K), 0.1422739 secs] [Times: user=0.23 sys=0.50, real=0.14 secs]
1090.725: [GC (Allocation Failure) [PSYoungGen: 4969168K->352K(8035840K)] 15387218K->10418410K(22080000K), 0.0677690 secs] [Times: user=0.19 sys=0.34, real=0.07 secs]
1124.474: [GC (Allocation Failure) [PSYoungGen: 5606752K->256K(5606912K)] 16024810K->10418322K(19651072K), 0.0516994 secs] [Times: user=0.20 sys=0.17, real=0.05 secs]
1157.440: [GC (Allocation Failure) [PSYoungGen: 5606656K->368K(8077312K)] 16024722K->10418434K(22121472K), 0.0649818 secs] [Times: user=0.18 sys=0.33, real=0.07 secs]

The Spark UI shows that the Input Data Size is 217 MB: Screen Shot 2019-11-11 at 10.10.39 PM.png

Why is the GC Pause so high for a Input data of only 217 MB?

What does the Input refer here?

PARTOMIA09 · ‎11-12-2019

I am adding few more details for further insight into this issue :

w@leed · ‎11-12-2019

Hi @PARTOMIA09

One suggestion off the bat is to possibly consider moving to a G1GC policy instead given that you have relatively large heap sizes (30 GB for executors and 16 GB for the driver). Typically the G1GC policy was developed to be better performant for larger heaps (> 8 GB). Try the following and see if that helps:

--conf "spark.executor.extraJavaOptions=-XX:+UseG1GC -XX:MaxGCPauseMillis=20 -XX:InitiatingHeapOccupancyPercent=35"

--conf "spark.driver.extraJavaOptions=-XX:+UseG1GC -XX:MaxGCPauseMillis=20 -XX:InitiatingHeapOccupancyPercent=35"

"The Garbage-First Garbage Collector":
https://www.oracle.com/technetwork/java/javase/tech/g1-intro-jsp-135488.html

PARTOMIA09 · ‎11-12-2019

Hi w@leed

Thanks for Replying.

I did test the Job with all the three Collectors - ParallelGC, CMS and G1GC:

I has tested following options with the G1GC:

-XX:+UseG1GC -XX:+PrintGCDetails -XX:+PrintGCTimeStamps

and with CMS:

-XX:+UseConcMarkSweepGC -XX:+PrintGCTimeStamps -XX:CMSInitiatingOccupancyFraction=70 -XX:+UseParNewGC -XX:+CMSConcurrentMTEnabled -XX:ParallelCMSThreads=10 -XX:ConcGCThreads=8 -XX:ParallelGCThreads=16

With G1GC defaults, I could see following:

Desired survivor size 1041235968 bytes, new threshold 5 (max 15)
[PSYoungGen: 1515304K->782022K(3053056K)] 2750361K->2017087K(6371840K), 1.5875321 secs] [Times: user=4.72 sys=0.74, real=1.59 secs]
Heap after GC invocations=9 (full 3):
PSYoungGen total 3053056K, used 782022K [0x0000000580000000, 0x000000068ef80000, 0x0000000800000000)
eden space 2270720K, 0% used [0x0000000580000000,0x0000000580000000,0x000000060a980000)
from space 782336K, 99% used [0x000000065f380000,0x000000068ef31ab0,0x000000068ef80000)
to space 1016832K, 0% used [0x0000000612d80000,0x0000000612d80000,0x0000000650e80000)
ParOldGen total 3318784K, used 1235064K [0x0000000080000000, 0x000000014a900000, 0x0000000580000000)
object space 3318784K, 37% used [0x0000000080000000,0x00000000cb61e318,0x000000014a900000)
Metaspace used 55055K, capacity 55638K, committed 55896K, reserved 1097728K
class space used 7049K, capacity 7207K, committed 7256K, reserved 1048576K
}
{Heap before GC invocations=10 (full 3):
PSYoungGen total 3053056K, used 3052742K [0x0000000580000000, 0x000000068ef80000, 0x0000000800000000)
eden space 2270720K, 100% used [0x0000000580000000,0x000000060a980000,0x000000060a980000)
from space 782336K, 99% used [0x000000065f380000,0x000000068ef31ab0,0x000000068ef80000)
to space 1016832K, 0% used [0x0000000612d80000,0x0000000612d80000,0x0000000650e80000)
ParOldGen total 3318784K, used 1235064K [0x0000000080000000, 0x000000014a900000, 0x0000000580000000)
object space 3318784K, 37% used [0x0000000080000000,0x00000000cb61e318,0x000000014a900000)
Metaspace used 55108K, capacity 55702K, committed 55896K, reserved 1097728K
class space used 7049K, capacity 7207K, committed 7256K, reserved 1048576K
42.412: [GC (Allocation Failure)
Desired survivor size 1653080064 bytes, new threshold 4 (max 15)
[PSYoungGen: 3052742K->1016800K(3422720K)] 4287807K->2985385K(6741504K), 4.0304873 secs] [Times: user=11.87 sys=1.77, real=4.03 secs]
Heap after GC invocations=10 (full 3):
PSYoungGen total 3422720K, used 1016800K [0x0000000580000000, 0x0000000727a80000, 0x0000000800000000)
eden space 2405888K, 0% used [0x0000000580000000,0x0000000580000000,0x0000000612d80000)
from space 1016832K, 99% used [0x0000000612d80000,0x0000000650e78240,0x0000000650e80000)
to space 1614336K, 0% used [0x00000006c5200000,0x00000006c5200000,0x0000000727a80000)
ParOldGen total 3318784K, used 1968584K [0x0000000080000000, 0x000000014a900000, 0x0000000580000000)
object space 3318784K, 59% used [0x0000000080000000,0x00000000f8272318,0x000000014a900000)
Metaspace used 55108K, capacity 55702K, committed 55896K, reserved 1097728K
class space used 7049K, capacity 7207K, committed 7256K, reserved 1048576K

With all the Collectors only difference I could see was that, a delayed full GC.

I am considering to changing the YoungGen now. Will update if I do see a difference.

On a parallel note -

1. I did also see that there are some of the objects in the memory which remain persistent across GC cycles - for example : scala.Tuple2 and java.lang.Long

2. These are Java RDD's

Regards

Cloudera Community

Support Questions

Spark Job long GC pauses