I have a 40-node CDH 5.1 cluster and am attempting to run a simple Spark app that processes about 10-15 GB of raw data, but I keep running into this error:
java.lang.OutOfMemoryError: GC overhead limit exceeded
Each node has 8 cores and 2 GB of memory. I noticed the heap size on the executors is set to 512 MB, with the total set to 2 GB. What should the heap size be set to for data of this size? For reference, my setup looks roughly like the sketch below.
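Here is a rough sketch of how I'm wiring things up. The app name, input path, and memory values are just placeholders I'm experimenting with, not settings I know to be right:

import org.apache.spark.{SparkConf, SparkContext}

// Rough sketch of the driver setup I'm experimenting with.
// Memory values below are placeholders, not known-good settings.
object SimpleApp {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("SimpleApp")
      // Executor JVM heap; currently defaults to 512m on my cluster.
      .set("spark.executor.memory", "1g")
      // Spark 1.x setting: fraction of the heap reserved for cached RDDs;
      // lowering it leaves more room for task execution.
      .set("spark.storage.memoryFraction", "0.4")
    val sc = new SparkContext(conf)

    // ~10-15 GB of raw text; the path is a placeholder.
    val lines = sc.textFile("hdfs:///data/raw/*")
    println(lines.count())

    sc.stop()
  }
}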
Thanks for the input!