Issue: Facing errors while running some PySpark jobs:
GC overhead limit exceeded
java.lang.OutOfMemoryError: Java heap space
Please find below the information I have collected from the current yarn-site.xml and mapred-site.xml:
yarn.scheduler.maximum-allocation-mb - 65536
yarn.scheduler.minimum-allocation-mb - 4096
mapreduce.map.java.opts = -Xmx2048m
mapreduce.map.memory.mb = 2560
mapreduce.reduce.java.opts = -Xmx4096m
mapreduce.reduce.memory.mb = 5120
Are you getting the out-of-memory error in the executors? If yes, you can increase spark.executor.memory, or pass the --executor-memory argument when launching the PySpark shell. One thing I would like to point out: map-side aggregation, caching, and in-memory shuffling all take memory away from what is needed for computation, so tuning your job according to your computation will fix this issue. The link below has more details on Spark tuning.
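As a sketch of the suggestion above, executor memory can be raised at launch time; the 4g/2g values and the my_job.py filename here are illustrative, not recommendations. Note that the executor memory plus its overhead must still fit inside a single YARN container, i.e. under yarn.scheduler.maximum-allocation-mb (65536 in your config):

```shell
# Submitting a job with a larger executor heap
# (sizes are examples; tune them to your data and container limits)
spark-submit \
  --master yarn \
  --executor-memory 4g \
  --driver-memory 2g \
  my_job.py

# Equivalent when launching the interactive PySpark shell
pyspark --master yarn --executor-memory 4g
```

These settings must be supplied before the JVMs start (on the command line or in spark-defaults.conf); setting spark.executor.memory from inside an already-running session has no effect on executors that are already launched.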