Issue: Facing errors while running some PySpark jobs:
GC overhead limit exceeded
java.lang.OutOfMemoryError: Java heap space
Please find below the information I have collected from the current yarn-site.xml and mapred-site.xml:
yarn.scheduler.maximum-allocation-mb - 65536
yarn.scheduler.minimum-allocation-mb - 4096
mapreduce.map.java.opts = -Xmx2048m
mapreduce.map.memory.mb = 2560
mapreduce.reduce.java.opts = -Xmx4096m
mapreduce.reduce.memory.mb = 5120
Are you getting the out-of-memory error in the executors? If yes, you can increase spark.executor.memory, or pass the --executor-memory argument when launching the PySpark shell. One thing I would like to point out: map-side aggregation, caching, and in-memory shuffling all take memory away from what is needed for computation, so tuning your job according to your computation will fix this issue. The link below has more details on Spark tuning.
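As a sketch of the suggestion above, executor memory can be raised at launch time; the 4g/2g values and the my_job.py filename here are illustrative, not recommendations. Note that the executor memory plus its overhead must still fit inside a single YARN container, i.e. under yarn.scheduler.maximum-allocation-mb (65536 in your config):

```shell
# Submitting a job with a larger executor heap
# (sizes are examples; tune them to your data and container limits)
spark-submit \
  --master yarn \
  --executor-memory 4g \
  --driver-memory 2g \
  my_job.py

# Equivalent when launching the interactive PySpark shell
pyspark --master yarn --executor-memory 4g
```

These settings must be supplied before the JVMs start (on the command line or in spark-defaults.conf); setting spark.executor.memory from inside an already-running session has no effect on executors that are already launched.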