Spark app throwing java.lang.OutOfMemoryError: GC overhead limit exceeded

Expert Contributor

I have a 40-node CDH 5.1 cluster and am trying to run a simple Spark app that processes about 10-15GB of raw data, but I keep running into this error:

  java.lang.OutOfMemoryError: GC overhead limit exceeded

Each node has 8 cores and 2GB of memory. I notice that the heap size on the executors is set to 512MB, with the total set to 2GB. What does the heap size need to be set to for data of this size?

Thanks for the input!


Re: Spark app throwing java.lang.OutOfMemoryError: GC overhead limit exceeded

Master Collaborator

Where does the out-of-memory error occur: in your driver or in an executor? I assume it is an executor. Yes, you are using the default of 512MB per executor. You can raise that with properties like spark.executor.memory, or with flags like --executor-memory if using spark-shell.
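As a rough illustration of the flag mentioned above, here is a minimal Python sketch that assembles a spark-submit command line with --executor-memory. The helper name build_submit_command, the application file visitor_counts.py, and the 1g value are hypothetical placeholders, not a recommendation for this cluster.

```python
import shlex


def build_submit_command(app_path, executor_memory="1g", master=None):
    """Assemble a spark-submit argument list that sets the executor heap size.

    app_path and master are placeholders supplied by the caller; nothing
    here is specific to the cluster discussed in this thread.
    """
    cmd = ["spark-submit"]
    if master is not None:
        cmd += ["--master", master]
    # --executor-memory is the command-line counterpart of the
    # spark.executor.memory property.
    cmd += ["--executor-memory", executor_memory]
    cmd.append(app_path)
    return cmd


# Hypothetical application file; print the command instead of running it.
cmd = build_submit_command("visitor_counts.py", executor_memory="1g")
print(" ".join(shlex.quote(part) for part in cmd))
```

The sketch only prints the command, so it can be inspected before anything is launched.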

 

It sounds like your workers are each allocating 2GB for executors, so you could use up to 2GB per executor, in which case your one executor per machine would consume all of the memory available to Spark on the cluster.

 

But more memory doesn't necessarily help if you're performing some operation that inherently allocates a great deal of memory. I'm not sure what your operations are. Keep in mind too that if you are caching RDDs in memory, this is taking memory away from what's available for computations.
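To put rough numbers on the caching point, the sketch below splits an executor heap into a cache share and whatever is left for computation. It assumes a storage fraction of 0.6, which I believe is the default for spark.storage.memoryFraction in Spark 1.x but is worth checking against your version. It also ignores safety margins and JVM overhead, so treat the figures as upper bounds. The function name heap_split_mb is made up for this example.

```python
def heap_split_mb(executor_heap_mb, storage_fraction=0.6):
    """Return (cache_mb, remaining_mb) for a given executor heap size.

    storage_fraction is an assumed value standing in for
    spark.storage.memoryFraction; adjust it to match your configuration.
    """
    cache_mb = executor_heap_mb * storage_fraction
    return cache_mb, executor_heap_mb - cache_mb


# Compare the 512MB default with larger heaps up to the 2GB worker limit.
for heap in (512, 1024, 2048):
    cache, rest = heap_split_mb(heap)
    print(f"{heap} MB heap: up to ~{cache:.0f} MB for cached RDDs, "
          f"~{rest:.0f} MB for everything else")
```

Under that assumption, a 512MB executor leaves only around 200MB for computation once the cache is full, which is one reason lowering the cache share or not caching at all can help as much as raising the heap.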

 

Re: Spark app throwing java.lang.OutOfMemoryError: GC overhead limit exceeded

Expert Contributor

Thanks, Sean. I'm currently computing unique visitors per page by running a count distinct in Spark SQL. We also run non-Spark jobs on the cluster, so if we allocate the full 2GB to Spark, I assume we can't run any other jobs at the same time. I'm also looking into how to set the storage levels in CM.
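For reference, the computation described here amounts to something like the following, written in plain Python rather than Spark SQL. The function name and sample events are hypothetical. The point of the sketch is that an exact distinct count has to remember every distinct visitor seen for each page, so its memory use grows with the number of unique visitors and not just with the heap setting.

```python
from collections import defaultdict


def unique_visitors_per_page(events):
    """Count distinct visitors per page.

    events is an iterable of (page, visitor_id) pairs. One set per page
    holds every distinct visitor id, which is what makes exact distinct
    counts memory-hungry on large inputs.
    """
    seen = defaultdict(set)
    for page, visitor in events:
        seen[page].add(visitor)
    return {page: len(visitors) for page, visitors in seen.items()}


# Made-up sample data: u1 visits /home twice but is counted once.
sample = [("/home", "u1"), ("/home", "u2"), ("/home", "u1"), ("/about", "u3")]
print(unique_visitors_per_page(sample))
```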

 

 

 
