I am running a Spark Streaming job on HDP 2.4.2 / Spark 1.6.1 that consumes messages from a Kafka topic and does further processing.
The application works as required, but after I kill the streaming job it is still listed as "incomplete" in the Spark History Server UI. In the YARN UI it is listed as finished, and $yarn application -status <appID> reports it as killed.
I followed a thread on the HCC forum suggesting there could be an issue with the YARN Timeline Server, but restarting all YARN and MapReduce components (and the Spark History Server) and resubmitting the application did not help; I still face the same problem.
Here are the steps.
1) kinit <userName>
2) Submit the Spark Streaming application.
3) Verify the expected calculations via the Spark History Server -> Executors -> driver logs (works fine).
4) Get the application ID and issue $yarn application -kill <appID> (the same user who submitted the app issues the kill).
5) Wait for some time (even waiting 120+ minutes did not help).
6) Check the YARN RM UI (shows Finished/Killed).
7) Open the Spark History Server UI.
8) Ideally this killed application should be listed among the completed applications, but clicking "incomplete" at the bottom shows it still listed in the "incomplete applications" table.
9) Checking further on HDFS, the event log under /spark-history is still named <AppID>.inprogress.
(This issue does not occur when we start the same streaming application in yarn-client mode, or with other, non-streaming applications.)
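For reference, the check in step 9 boils down to grepping the event-log directory for the `.inprogress` suffix; on the cluster that is `hdfs dfs -ls /spark-history | grep '\.inprogress$'`. A minimal local stand-in for the same check (a temporary directory and a made-up application ID instead of HDFS):

```shell
# Simulated check for stuck ".inprogress" event logs.
# The real command on the cluster would be:
#   hdfs dfs -ls /spark-history | grep '\.inprogress$'
demo_dir=$(mktemp -d)
touch "$demo_dir/application_1466000000000_0001_1.inprogress"  # stand-in log file
stuck=$(ls "$demo_dir" | grep -c '\.inprogress$')
echo "in-progress event logs: $stuck"
rm -rf "$demo_dir"
```

The History Server only moves an application to the completed list once the `.inprogress` suffix is dropped, so as long as this check finds the file, the UI will keep showing the app as incomplete.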
I would be grateful if anyone can help me understand what I am missing here.
@Jitendra Yadav, it is a NetworkWordCount. However, I did pass the config argument
--conf "spark.streaming.stopGracefullyOnShutdown=true", both with and without the quotes. It didn't help.
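For completeness, the submit command looked roughly like this (the class and jar names below are placeholders I am assuming, since the thread does not show the real ones). One thing worth checking: `--conf` options must come before the application jar, otherwise spark-submit passes them to the application as plain arguments:

```shell
# Hypothetical submit command -- class, jar, and master values are placeholders.
# Note: all spark-submit options, including --conf, must precede the app jar.
SUBMIT_CMD="spark-submit \
  --master yarn-cluster \
  --conf spark.streaming.stopGracefullyOnShutdown=true \
  --class com.example.NetworkWordCount \
  network-word-count.jar"
echo "$SUBMIT_CMD"
```

Whether the key=value pair is quoted should not matter, as long as the shell passes it to spark-submit as a single argument.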
What do you think we are missing here?
Hi @Smart Solutions, along with that property, let's try sending a SIGTERM signal to the Spark driver; that should ensure the application stops gracefully. You will see graceful-shutdown messages in the driver logs.
On the node where the driver is running:
ps -ef | grep spark | grep <Spark Driver Name>
kill -SIGTERM <PID>
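The reason SIGTERM helps is that the JVM runs its shutdown hooks on SIGTERM, and with spark.streaming.stopGracefullyOnShutdown=true Spark's hook stops the StreamingContext gracefully before the process exits. A minimal shell sketch of the same signal flow, with a stand-in background job instead of the real driver:

```shell
#!/bin/sh
# Stand-in for the Spark driver: trap SIGTERM and exit cleanly,
# analogous to the driver's graceful-shutdown hook.
graceful=0
worker() {
  trap 'echo "SIGTERM received, stopping gracefully"; exit 0' TERM
  while :; do sleep 1; done
}
worker &
pid=$!
sleep 1            # give the worker time to install its trap
kill -TERM "$pid"  # same signal the answer suggests sending to the driver
if wait "$pid"; then graceful=1; fi
echo "graceful shutdown: $graceful"
```

Note that SIGKILL (kill -9) bypasses shutdown hooks entirely. One plausible explanation for the stuck .inprogress file, consistent with the behavior described, is that yarn application -kill gives containers only a short grace period before escalating to SIGKILL, so the driver never gets to finalize its event log.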