Sure, Anna.

I have an HDP cluster of 3 nodes (1 master and 2 data nodes):

Node 1: 32 GB RAM and 8 cores, of which YARN uses 15 GB and 5 cores
Node 2: 16 GB RAM and 8 cores, of which YARN uses 15 GB and 5 cores
Node 3: 16 GB RAM and 8 cores, of which YARN uses 15 GB and 5 cores

I am running a Structured Streaming job on Spark 2.3.1. The job is simple: it reads messages from Kafka, left-joins them with two other tables, and inserts the result into MongoDB. Data is pushed to Kafka at a rate of 8 to 10 KB per second.

The job runs fine for 48 hours, but then fails with "GC overhead limit exceeded" and hits the maximum number of executor failures (16). I have attached screenshots of the Executors tab, the DAG, and the error message as seen in the Spark UI. Please refer to the Spark configuration (spark-submit) provided in my previous post.

Please help me resolve the issue.

Attachments: sparkErr.JPG, sparkUI.jpg, Job_DAG.JPG
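For a sense of scale, the input rate quoted above can be turned into a total volume over the 48 hours the job survives. This is only a rough sketch: it assumes a constant rate and 1 KB = 1000 bytes, and it says nothing about what the executors actually keep in memory.

```python
# Rough scale of the input: 8 to 10 KB per second, sustained for 48 hours.
# Assumptions (not from the post): constant rate, 1 KB = 1000 bytes.

SECONDS_IN_48_HOURS = 48 * 60 * 60


def total_gb(rate_kb_per_s, seconds=SECONDS_IN_48_HOURS):
    """Total data volume in GB for a constant rate given in KB/s."""
    return rate_kb_per_s * 1000 * seconds / 1e9


low_gb = total_gb(8)
high_gb = total_gb(10)

print(f"Input over 48 hours: {low_gb:.2f} GB to {high_gb:.2f} GB")
```

The cumulative input is on the order of 1 to 2 GB, so it may be worth checking in the Spark UI whether memory use grows with elapsed time rather than with the input rate.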
Hello Experts,

My Spark Structured Streaming job fails with java.lang.OutOfMemoryError: Java heap space in the executors, and the executors then fail.

Job function: reads records from Kafka (5 partitions), left-joins them with 2 tables (50 records each), and inserts the result into MongoDB using a foreach writer.
Input records: 10-15k records/sec on average, 30-40 KB in size.
HDP cluster YARN resources: 3 nodes, 15 GB of RAM and 5 cores per node.
Spark version: 2.3.1

My spark-submit command is below. With these settings the job can run for 48 hours (2 YARN application attempts).

./bin/spark-submit --class <class-NM> \
  --master yarn --deploy-mode cluster \
  --conf spark.yarn.maxAppAttempts=4 \
  --conf spark.yarn.am.attemptFailuresValidityInterval=1h \
  --conf spark.yarn.max.executor.failures=16 \
  --conf spark.yarn.executor.failuresValidityInterval=1h \
  --conf spark.executor.heartbeatInterval=1000000 \
  --conf spark.network.timeout=10000000 \
  --conf spark.task.maxFailures=8 \
  --conf spark.speculation=true \
  --num-executors 2 --driver-memory 5g --executor-memory 6g --executor-cores 2 <jar> <params>

I followed this article when submitting the streaming job:
http://mkuthan.github.io/blog/2016/09/30/spark-streaming-on-yarn/

Which configurations/parameters should I look at for streaming jobs?

I also observed that the executor lifetime keeps shrinking: the 1st executor fails after 2 hours, the 2nd after 1.5 hours, the 3rd after 40 minutes, and so on. Why is that?

Please help me fix this.

Thanks,
Sudhakar.
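As a sanity check on the memory flags in the command above, the YARN container sizes they imply can be worked out by hand. The sketch below assumes the memory overhead settings are left at Spark's default of max(384 MiB, 10% of the requested heap), which the post does not state, and it ignores YARN rounding requests up to its minimum allocation increment.

```python
# Hedged arithmetic: YARN container sizes implied by the spark-submit flags.
# Assumption (not stated in the post): memory overhead is left at the Spark
# default of max(384 MiB, 10% of the requested heap). YARN's rounding to the
# scheduler's minimum allocation size is not modeled.

MIB_PER_GIB = 1024
MIN_OVERHEAD_MIB = 384
OVERHEAD_FRACTION = 0.10


def container_mib(heap_gib):
    """Heap plus default off-heap overhead, in MiB."""
    heap_mib = heap_gib * MIB_PER_GIB
    overhead_mib = max(MIN_OVERHEAD_MIB, int(heap_mib * OVERHEAD_FRACTION))
    return heap_mib + overhead_mib


executor_mib = container_mib(6)  # --executor-memory 6g
driver_mib = container_mib(5)    # --driver-memory 5g (driver runs on YARN in cluster mode)
num_executors = 2                # --num-executors 2

total_mib = num_executors * executor_mib + driver_mib
node_yarn_mib = 15 * MIB_PER_GIB     # 15 GB per node available to YARN
cluster_yarn_mib = 3 * node_yarn_mib  # 3 nodes

print(f"Executor container: {executor_mib} MiB")
print(f"Driver container:   {driver_mib} MiB")
print(f"Total requested:    {total_mib} MiB of {cluster_yarn_mib} MiB")
print(f"Executors that fit on one node: {node_yarn_mib // executor_mib}")
```

Under these assumptions the request fits comfortably within the cluster, so the failures are more likely to come from heap usage inside each executor than from YARN capacity.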