Member since
11-19-2018
1
Post
0
Kudos Received
0
Solutions
11-19-2018
09:40 AM
Hi @srowen I am using CDH 5.15.1 and running the spark-submit to train the model and save the prediction dataframe of the model to HDFS. I am facing this errors when I am trying to save the dataframe to HDFS, 2018-11-19 11:17:33 ERROR YarnClusterScheduler:70 - Lost executor 2 on gworker6.vcse.lab: Executor heartbeat timed out after 149836 ms
2018-11-19 11:17:33 ERROR YarnClusterScheduler:70 - Lost executor 2 on gworker6.vcse.lab: Executor heartbeat timed out after 149836 ms
2018-11-19 11:18:07 ERROR YarnClusterScheduler:70 - Lost executor 2 on gworker6.vcse.lab: Container container_1542123439491_0080_01_000004 exited from explicit termination request.
2018-11-19 11:18:07 ERROR YarnClusterScheduler:70 - Lost executor 2 on gworker6.vcse.lab: Container container_1542123439491_0080_01_000004 exited from explicit termination request. I have also tried using the spark.yarn.executor.memoryOverhead which I have set that to 10% of the executor-memory mentioned in my spark-submit and still I am seeing this errors. Do you have any suggestions for this issue? Spark-Submit Command: spark-submit-with-zoo.sh --master yarn --deploy-mode cluster --num-executors 8 --executor-cores 16 --driver-memory 300g --executor-memory 400g Main_Final_auc.py 256
... View more