
Spark job aborted due to java.lang.OutOfMemoryError: Java heap space

Contributor

Hi,

I have created an HDP 2.6 cluster on AWS with a master node (m4.2xlarge) and 4 worker nodes (m4.xlarge). I want to process a 4 GB log file with a Spark job, but I am getting the error below while executing it:

Exception in thread "main" org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 0.0 failed 1 times, most recent failure: Lost task 0.0 in stage 0.0 (TID 0, localhost): java.lang.OutOfMemoryError: Java heap space at java.util.Arrays.copyOf(Arrays.java:3236)

I have configured the spark-env.sh file on the master node:

SPARK_EXECUTOR_MEMORY="5G"

SPARK_DRIVER_MEMORY="5G"

but it throws the same error. I also configured the worker nodes with those settings and increased the Java heap size for the Hadoop client, ResourceManager, NodeManager, and YARN, but the Spark job is still aborted.
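For reference, the same memory settings can also be passed per job on the spark-submit command line instead of globally in spark-env.sh. A minimal sketch on YARN, reusing the class and jar names that appear later in this thread:

./bin/spark-submit \
  --master yarn-cluster \
  --driver-memory 5g \
  --executor-memory 5g \
  --class org.apache.TransformationOper \
  /root/spark/TransformationOper.jar /Input/error.log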

Thanks,

7 REPLIES


Hey @priyal patel!
Could you share your spark-submit parameters?

Contributor

Hi, @Vinicius Higa Murakami

I want to process a 4 GB file, so I have configured the executor memory to 10 GB and the number of executors to 10 in the spark-env.sh file. Here are the spark-submit parameters:

./bin/spark-submit --class org.apache.TransformationOper --master local[2] /root/spark/TransformationOper.jar /Input/error.log

I also tried to set the configuration manually using the spark-submit parameters below:

./bin/spark-submit --driver-memory 5g --num-executors 10 --executor-memory 10g --class org.apache.TransformationOper --master local[2] /root/spark/TransformationOper.jar

I also set the master to yarn-cluster but still got the OutOfMemoryError.
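One thing worth noting: with --master local[2], Spark runs the whole job inside a single driver JVM, so --num-executors and --executor-memory are not applied and only the driver heap limits the job. A rough sketch of the same submission on YARN, where those flags do take effect (paths and class name as in the commands above):

./bin/spark-submit \
  --master yarn-cluster \
  --driver-memory 5g \
  --num-executors 10 \
  --executor-memory 10g \
  --class org.apache.TransformationOper \
  /root/spark/TransformationOper.jar /Input/error.log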


Hey @priyal patel!
Do you know how much is set for:

spark.driver.memoryOverhead
spark.executor.memoryOverhead

Also, do you mind sharing your OOM error?
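For reference, on Spark-on-YARN these overheads are usually passed per job with --conf (property names as they are used later in this thread; values are in MB, and the default is roughly max(384 MB, 10% of the corresponding memory)). A minimal sketch reusing the class and jar from the commands above:

./bin/spark-submit \
  --master yarn-cluster \
  --driver-memory 5g \
  --executor-memory 5g \
  --conf spark.yarn.driver.memoryOverhead=1024 \
  --conf spark.yarn.executor.memoryOverhead=1024 \
  --class org.apache.TransformationOper \
  /root/spark/TransformationOper.jar /Input/error.log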


@priyal patel First make sure you know whether the OOM is happening on the driver or in an executor. You can find this by looking at the logs. To test, I suggest you increase --driver-memory to 10g or even 20g and see what happens. Also try running in yarn-client mode instead of yarn-cluster. If the OOM error shows up on the stdout of spark-submit, you will know the driver is running out of memory. Otherwise you can check yarn logs -applicationId <appId> to see what happened on the executor side.

HTH

*** If you found this answer addressed your question, please take a moment to login and click the "accept" link on the answer.
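A rough sketch of those two checks as commands, reusing the class and jar from earlier in the thread (the final grep is just one way to narrow the output down to the OOM stack trace):

# 1. Run in client mode so a driver-side OOM shows up on the spark-submit stdout:
./bin/spark-submit \
  --master yarn-client \
  --driver-memory 10g \
  --class org.apache.TransformationOper \
  /root/spark/TransformationOper.jar /Input/error.log

# 2. Otherwise pull the executor logs for the failed application:
yarn logs -applicationId <appId> | grep -i -B 2 -A 10 "OutOfMemoryError"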


@priyal patel Increasing the driver memory seems to help then. If the OOM issue is no longer happening, then I recommend you open a separate thread for the performance issue. In any case, to see why it is taking long you can check the Spark UI and see which job/task is taking time and on which node. Then you can also review the logs for more information: yarn logs -applicationId <appId>

HTH

*** If you found this answer addressed your question, please take a moment to login and click the "accept" link on the answer.

Contributor

Hi, @Felix Albani

I set the driver memory to 20 GB. I tried the spark-submit parameters below:

./bin/spark-submit --driver-memory 20g --executor-cores 3 --num-executors 20 --executor-memory 2g --conf spark.yarn.executor.memoryOverhead=1024 --conf spark.yarn.driver.memoryOverhead=1024 --class org.apache.TransformationOper --master yarn-cluster /home/hdfs/priyal/spark/TransformationOper.jar

The cluster configuration is: 1 master node (r3.xlarge) and 1 worker node (r3.xlarge), each with 4 vCPUs, 30 GB memory, and 40 GB storage.

I am still getting the same issue: the Spark job stays in the RUNNING state and YARN memory is 95% used.
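Rough container math for that submit, using only the figures in this post (the exact yarn.nodemanager.resource.memory-mb value is not stated here, so treat the conclusion as an estimate):

# 20 executors x (2g heap + 1g overhead), plus the driver container
# (in yarn-cluster mode the driver also runs in a YARN container on a worker)
executors_gb=$(( 20 * (2 + 1) ))    # 60 GB
driver_gb=$(( 20 + 1 ))             # 21 GB
echo "total requested: $(( executors_gb + driver_gb )) GB"   # ~81 GB
# The single r3.xlarge worker has ~30 GB of RAM in total, so YARN can never grant
# all of these containers at once, which matches the 95% YARN memory usage while
# the job sits in the RUNNING state.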

Contributor

Hi, @Vinicius Higa Murakami , @Felix Albani

I have set spark.yarn.driver.memoryOverhead=1 GB, spark.yarn.executor.memoryOverhead=1 GB, and spark_driver_memory=12 GB. I have also set the storage level to MEMORY_AND_DISK_SER.

The Hadoop cluster configuration is: 1 master node (r3.xlarge) and 1 worker node (m4.xlarge).

Here are the spark-submit parameters:

./bin/spark-submit --driver-memory 12g --executor-cores 2 --num-executors 3 --executor-memory 3g --class org.apache.TransformationOper --master yarn-cluster /spark/TransformationOper.jar

The Spark job entered the RUNNING state, but it has been executing for the last hour and still has not completed.
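To see all of the settings from this post in one place, here is a sketch of the same submission with the overheads passed explicitly on the command line (overhead values in MB; otherwise unchanged from the command above). Note that 12 GB of driver heap plus 3 x (3 GB + 1 GB) of executors comes to roughly 25 GB, while an m4.xlarge worker has 16 GB of RAM in total, so it may be worth re-checking how much memory the NodeManager on that node is actually allowed to hand out:

./bin/spark-submit \
  --master yarn-cluster \
  --driver-memory 12g \
  --executor-cores 2 \
  --num-executors 3 \
  --executor-memory 3g \
  --conf spark.yarn.driver.memoryOverhead=1024 \
  --conf spark.yarn.executor.memoryOverhead=1024 \
  --class org.apache.TransformationOper \
  /spark/TransformationOper.jar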