Java heap space issues while running Spark jobs

Hello Team,


Can someone please help me? I am facing out-of-memory errors in my Spark jobs. Please find the executor log and the job configuration below.


21/10/11 17:22:53 INFO executor.Executor: Finished task 194.0 in stage 79.0 (TID 14855). 11767 bytes result sent to driver
21/10/11 17:23:34 ERROR executor.CoarseGrainedExecutorBackend: RECEIVED SIGNAL TERM
21/10/11 17:23:34 ERROR executor.Executor: Exception in task 167.0 in stage 79.0 (TID 14825)
java.lang.OutOfMemoryError: Java heap space


21/10/11 17:23:34 INFO storage.DiskBlockManager: Shutdown hook called
21/10/11 17:23:34 INFO util.ShutdownHookManager: Shutdown hook called
21/10/11 17:23:34 INFO executor.Executor: Not reporting error to driver during JVM shutdown.
21/10/11 17:23:34 ERROR util.SparkUncaughtExceptionHandler: [Container in shutdown] Uncaught exception in thread Thread[Executor task la
java.lang.OutOfMemoryError: Java heap space
at org.apache.spark.sql.catalyst.expressions.UnsafeRow.copy(
at org.apache.spark.sql.execution.ExternalAppendOnlyUnsafeRowArray.add(ExternalAppendOnlyUnsafeRowArray.scala:108)


Job configuration details:

conf = {
    "app_name": "CX360",
    "spark.yarn.queue": "CXMT",
    "spark.port.maxRetries": 500,
    "spark.driver.memoryOverhead": 4096,
    "spark.executor.memoryOverhead": "14g",
    "spark.driver.memory": "50g",
    "spark.driver.maxResultSize": 0,
    "spark.executor.memory": "50g",
    "spark.executor.instances": 2,
    "spark.executor.cores": 5,
    "spark.driver.cores": 5,
}
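
For anyone reading along, here is a rough sketch of what these settings add up to in terms of YARN container requests. It assumes that bare numbers are interpreted as MiB, that each executor container is sized as executor memory plus memoryOverhead, and that the driver also runs in a YARN container (cluster deploy mode). The helper names `to_mb`, `container_request_mb`, and `heap_per_task_mb` are made up for illustration and are not part of Spark.

```python
def to_mb(value):
    """Convert a Spark-style memory setting to MiB (assumption: bare numbers are MiB)."""
    if isinstance(value, (int, float)):
        return int(value)
    text = str(value).strip().lower()
    units = {"k": 1 / 1024, "m": 1, "g": 1024, "t": 1024 * 1024}
    if text[-1] in units:
        return int(float(text[:-1]) * units[text[-1]])
    return int(text)


# Memory-related settings copied from the job configuration above.
conf = {
    "spark.driver.memory": "50g",
    "spark.driver.memoryOverhead": 4096,
    "spark.executor.memory": "50g",
    "spark.executor.memoryOverhead": "14g",
    "spark.executor.instances": 2,
    "spark.executor.cores": 5,
}


def container_request_mb(conf):
    """Estimate per-container and total memory requested from YARN, in MiB."""
    executor = to_mb(conf["spark.executor.memory"]) + to_mb(
        conf["spark.executor.memoryOverhead"]
    )
    driver = to_mb(conf["spark.driver.memory"]) + to_mb(
        conf["spark.driver.memoryOverhead"]
    )
    total = executor * conf["spark.executor.instances"] + driver
    return {"executor_mb": executor, "driver_mb": driver, "total_mb": total}


def heap_per_task_mb(conf):
    """Crude upper bound on heap per concurrently running task (heap / cores)."""
    return to_mb(conf["spark.executor.memory"]) // conf["spark.executor.cores"]


if __name__ == "__main__":
    print(container_request_mb(conf))
    print(heap_per_task_mb(conf))
```

Under these assumptions each executor container is 65536 MiB (64 GiB), and the five concurrent tasks in an executor share the 50g heap, so each task has at most about 10 GiB available before Spark's own reserved and storage memory is taken into account. Whether the CXMT queue can actually grant containers of that size depends on the cluster's total capacity, which is not shown in this thread.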



I tried different values but still face the same issue. Please find the YARN queue capacity details below:


Queue Name : CXMT
Queue State : running
Scheduling Info : Capacity: 8.0, MaximumCapacity: 8.0, CurrentCapacity: 0.8620696


Please help as soon as possible.




Is this the only Spark job failing with the OOM error? What were the initial executor and driver memory values that you tried?

Can you also try increasing num-executors and executor-cores, then rerun the job and see if it works?
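
As a minimal sketch of such a rerun, the snippet below assembles a `spark-submit` command line with a higher executor count. The script name `cx360_job.py`, the helper `spark_submit_args`, and the value of 4 executors are hypothetical placeholders, not recommendations; the memory values and queue are the ones from the question.

```python
import shlex


def spark_submit_args(script, num_executors, executor_cores,
                      executor_memory, driver_memory, queue, extra_conf=None):
    """Build a spark-submit argument list for a YARN job (illustrative helper)."""
    args = [
        "spark-submit",
        "--master", "yarn",
        "--queue", queue,
        "--num-executors", str(num_executors),
        "--executor-cores", str(executor_cores),
        "--executor-memory", executor_memory,
        "--driver-memory", driver_memory,
    ]
    # Any remaining settings are passed through as --conf key=value pairs.
    for key, value in sorted((extra_conf or {}).items()):
        args += ["--conf", f"{key}={value}"]
    args.append(script)
    return args


if __name__ == "__main__":
    args = spark_submit_args(
        script="cx360_job.py",   # hypothetical script name
        num_executors=4,         # illustrative: doubled from the original 2
        executor_cores=5,
        executor_memory="50g",
        driver_memory="50g",
        queue="CXMT",
        extra_conf={"spark.executor.memoryOverhead": "14g"},
    )
    print(shlex.join(args))
```

Keep in mind that more executors only help if the queue can grant the additional containers, so it is worth checking the queue's current usage before rerunning.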



Chethan YM

