Facing errors while running some PySpark code

New Contributor

Issue: Facing errors while running some PySpark code.

Errors:

GC overhead limit exceeded

java.lang.OutOfMemoryError: Java heap space

Please find below the settings I have collected from the current yarn-site.xml and mapred-site.xml:

yarn.scheduler.maximum-allocation-mb = 65536
yarn.scheduler.minimum-allocation-mb = 4096
mapreduce.map.java.opts = -Xmx2048m
mapreduce.map.memory.mb = 2560
mapreduce.reduce.java.opts = -Xmx4096m
mapreduce.reduce.memory.mb = 5120

1 Reply

Expert Contributor

@Koushik Dey

Are you getting the out-of-memory error in the executors? If so, you can increase spark.executor.memory, or pass --executor-memory when launching the PySpark shell. One thing I would like to point out: map-side aggregation, caching, and in-memory shuffling all take memory away from what is needed for the actual computation, so tuning the job to match your workload will fix this issue. The link below has more details on Spark tuning, and the sketch after it shows one way to apply these settings.

http://spark.apache.org/docs/latest/tuning.html#garbage-collection-tuning
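
For example, here is a minimal PySpark sketch of the idea above. The memory sizes, application name, input path, and column name are illustrative assumptions, not values tuned for your cluster, and the requested executor size must still fit inside yarn.scheduler.maximum-allocation-mb (65536 MB in your case).

from pyspark.sql import SparkSession
from pyspark import StorageLevel

# Minimal sketch: requesting bigger executors when building the session.
spark = (
    SparkSession.builder
    .appName("memory-tuning-example")                # hypothetical app name
    .config("spark.executor.memory", "8g")           # executor heap; same effect as --executor-memory 8g
    .config("spark.executor.memoryOverhead", "1g")   # off-heap headroom YARN adds to each container
                                                     # (older Spark versions use spark.yarn.executor.memoryOverhead)
    .getOrCreate()
)
# Driver heap normally has to be set on the command line (--driver-memory),
# because the driver JVM is already running by the time this code executes.

# Caching with MEMORY_AND_DISK lets blocks spill to disk instead of
# pressuring the heap during shuffles and aggregations.
df = spark.read.parquet("/data/example/input")       # hypothetical path
df.persist(StorageLevel.MEMORY_AND_DISK)
df.groupBy("some_column").count().show()             # hypothetical column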
