
Spark Structured Streaming job fails with "GC overhead limit exceeded" and "max executor failures (16) reached" errors after 48 hrs of run

New Contributor

Hello Experts,

My Spark Structured Streaming job fails with java.lang.OutOfMemoryError: Java heap space in the executors, and the executors keep failing.

Job function: it reads records from Kafka (5 partitions), left joins them with 2 tables (50 records each), and inserts the results into MongoDB using a foreach writer.
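For context, the pipeline is structured roughly like the sketch below; the topic name, lookup-table names, join key, and message schema are simplified placeholders rather than the real ones:

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, from_json}
import org.apache.spark.sql.types.{StringType, StructType}

val spark = SparkSession.builder().appName("kafka-to-mongo-stream").getOrCreate()

// Placeholder schema for the Kafka message payload
val schema = new StructType()
  .add("join_key", StringType)
  .add("payload", StringType)

// Two small static lookup tables (~50 records each); shown as Hive tables for illustration
val lookup1 = spark.table("lookup_table_1")
val lookup2 = spark.table("lookup_table_2")

// Stream from the 5-partition Kafka topic and left join with the two lookup tables
val enriched = spark.readStream
  .format("kafka")
  .option("kafka.bootstrap.servers", "broker1:9092")
  .option("subscribe", "input_topic")
  .load()
  .select(from_json(col("value").cast("string"), schema).as("rec"))
  .select("rec.*")
  .join(lookup1, Seq("join_key"), "left_outer")
  .join(lookup2, Seq("join_key"), "left_outer")

// The real job writes each micro-batch to MongoDB through a custom foreach writer;
// a console sink is used here only to keep the sketch self-contained
val query = enriched.writeStream
  .format("console")
  .outputMode("append")
  .start()

query.awaitTermination()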

Input rate: on average 10-15k records/sec, with record sizes of 30-40 KB.

The YARN resources of the HDP cluster are as follows:

3 nodes: 15 GB of RAM and 5 cores per node

Spark version: 2.3.1

My spark-submit command is below. With these configs, the job runs for about 48 hrs (across 2 YARN application attempts) before failing.

./bin/spark-submit --class <class-NM> \
  --master yarn --deploy-mode cluster \
  --conf spark.yarn.maxAppAttempts=4 \
  --conf spark.yarn.am.attemptFailuresValidityInterval=1h \
  --conf spark.yarn.max.executor.failures=16 \
  --conf spark.yarn.executor.failuresValidityInterval=1h \
  --conf spark.executor.heartbeatInterval=1000000 \
  --conf spark.network.timeout=10000000 \
  --conf spark.task.maxFailures=8 \
  --conf spark.speculation=true \
  --num-executors 2 --driver-memory 5g --executor-memory 6g --executor-cores 2 <jar> <params>


I have followed the article below for submitting streaming jobs: http://mkuthan.github.io/blog/2016/09/30/spark-streaming-on-yarn/

Which configurations/parameters do I need to look at for streaming jobs? I also observed that executor lifetimes decrease over time: the first executor fails after 2 hrs, the second after 1.5 hrs, the third after 40 minutes, and so on. Why is that?


Please help me to fix this.

Thanks,

Sudhakar.



1 REPLY

Re: Spark Structured Streaming job fails with "GC overhead limit exceeded" and "max executor failures (16) reached" errors after 48 hrs of run

New Contributor

Sure Anna.


I have an HDP cluster of 3 nodes (1 master and 2 data nodes).

Node 1: 32 GB RAM, 8 cores (of which YARN uses 15 GB and 5 cores)

Node 2: 16 GB RAM, 8 cores (of which YARN uses 15 GB and 5 cores)

Node 3: 16 GB RAM, 8 cores (of which YARN uses 15 GB and 5 cores)


I am running a Structured Streaming job with Spark version 2.3.1. The job reads data from Kafka and inserts it into MongoDB.

The job's functionality is simple: it reads messages from Kafka, left joins them with two other tables, and inserts the results into Mongo.

Records are pushed to Kafka at a rate of 8 to 10 KB per second.
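The insert into Mongo goes through a custom ForeachWriter along the lines of the sketch below; the connection URI, database, collection, and field handling are placeholders, not the exact code:

import com.mongodb.client.{MongoClient, MongoClients, MongoCollection}
import org.apache.spark.sql.{ForeachWriter, Row}
import org.bson.Document

// Sketch of the per-partition Mongo writer used with writeStream.foreach(...)
class MongoRowWriter extends ForeachWriter[Row] {
  private var client: MongoClient = _
  private var collection: MongoCollection[Document] = _

  // Called once per partition per micro-batch: open a Mongo connection
  override def open(partitionId: Long, version: Long): Boolean = {
    client = MongoClients.create("mongodb://mongo-host:27017")
    collection = client.getDatabase("mydb").getCollection("events")
    true
  }

  // Convert each joined row into a BSON document and insert it
  override def process(row: Row): Unit = {
    val doc = new Document()
    row.schema.fieldNames.foreach(f => doc.append(f, row.getAs[AnyRef](f)))
    collection.insertOne(doc)
  }

  // Called when the partition finishes (or on error): close the connection
  override def close(errorOrNull: Throwable): Unit = {
    if (client != null) client.close()
  }
}

The writer is then wired into the streaming query with writeStream.foreach(new MongoRowWriter()).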


The job runs fine for about 48 hrs, but then fails with "GC overhead limit exceeded" and "max executor failures (16) reached".

I have attached screenshots of the Executors tab, the job DAG, and the error message seen in the Spark UI. Please refer to the spark-submit configuration provided in my previous post.


Please help me resolve the issue.

Attachments: sparkErr.JPG, sparkUI.jpg, Job_DAG.JPG