
Killed Spark Streaming job in YARN cluster mode is listed as incomplete in the History Server


Expert Contributor

Hi Guys,

I am trying to run a Spark Streaming job that consumes messages from a Kafka topic and does further processing, on HDP 2.4.2 / Spark 1.6.1.

The application works as required, but when I kill the streaming job, it is still listed as "incomplete" in the Spark History Server UI. In the YARN UI it is listed as finished, and when I check with $ yarn application -status <appID> it is listed as killed.

I followed one of the threads on the HCC forum and saw that there could be an issue with the YARN Timeline Server, so I restarted all the YARN and MR components, restarted the Spark History Server, and resubmitted the application, but I still face the same problem.

Here are the steps:

1) kinit <userName>

2) Submit the Spark Streaming application.

3) Verify that the required calculations are happening via the Spark History Server -> Executors -> driver logs (works fine).

4) Get the application ID and issue $ yarn application -kill <appID> (the application is killed by the same user who submitted it).

5) Wait for some time (say 2 minutes; waiting 120+ minutes didn't help).

6) Check the YARN RM (shows Finished/Killed).

7) Open the Spark History Server.

8) Ideally this killed application should be listed among the completed applications, but when we click the "incomplete" link at the bottom, the application appears in the "incomplete applications" table.

9) On HDFS, in the /spark-history directory, the event log is still named <AppID>.inprogress.

(This issue does not occur when we start the same streaming application in yarn-client mode, nor with other, non-streaming applications.)
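As step 9 observes, the History Server decides completeness from the event-log file name: a log still suffixed ".inprogress" is listed under incomplete applications, and only a rename on a clean stop marks it complete. The toy sketch below (my own illustration, not Spark's actual code; file names are hypothetical) mimics that classification over a simulated /spark-history directory:

```python
import os
import tempfile

def classify_event_logs(log_dir):
    # Toy version of the History Server's rule: a log whose name still
    # ends in ".inprogress" belongs to an "incomplete" application.
    complete, incomplete = [], []
    for name in os.listdir(log_dir):
        (incomplete if name.endswith(".inprogress") else complete).append(name)
    return sorted(complete), sorted(incomplete)

# Simulate a /spark-history directory with one finished app and one
# app whose log was never renamed (e.g. a killed streaming driver).
log_dir = tempfile.mkdtemp()
open(os.path.join(log_dir, "application_0001"), "w").close()
open(os.path.join(log_dir, "application_0002.inprogress"), "w").close()

complete, incomplete = classify_event_logs(log_dir)
print("complete:", complete)
print("incomplete:", incomplete)
```

This is why restarting the History Server does not help: as long as the kill prevented the driver from finalizing (renaming) its event log, the file keeps its .inprogress suffix and the app stays in the incomplete list.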

I will be grateful if anyone can help me understand the missing bit here.

Thank you,

SS

5 REPLIES

Re: Killed Spark Streaming job in YARN cluster mode is listed as incomplete in the History Server

Expert Contributor


Hi @Smart Solutions

Is this behavior the same even if you set the parameter below in the application context?

sparkConf.set("spark.streaming.stopGracefullyOnShutdown", "true")


Re: Killed Spark Streaming job in YARN cluster mode is listed as incomplete in the History Server

Expert Contributor

@Jitendra Yadav, it's a NetworkWordCount. However, I did pass the config argument:

--conf "spark.streaming.stopGracefullyOnShutdown=true", both with and without quotes. It didn't help.

What do you think are we missing here?


Re: Killed Spark Streaming job in YARN cluster mode is listed as incomplete in the History Server

Hi @Smart Solutions. Along with that property, let's try sending a SIGTERM signal to the Spark driver; that should ensure the application stops gracefully. You will see graceful-shutdown messages in the driver logs.

On the node where the driver is running:

ps -ef | grep spark | grep <Spark Driver Name>

kill -SIGTERM PID
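The reasoning behind SIGTERM, as I understand it, is that it gives the driver's shutdown hooks a chance to run and finalize the event log, whereas a hard kill can terminate the process before any cleanup fires. The sketch below (plain Python, not Spark code; all names are my own) illustrates the same signal-handling idea: a long-running loop that stops cleanly when SIGTERM arrives instead of dying mid-batch.

```python
import os
import signal

# Flag flipped by the SIGTERM handler; the "streaming" loop polls it
# so it can finish in-flight work and exit cleanly.
shutdown_requested = False

def handle_sigterm(signum, frame):
    global shutdown_requested
    shutdown_requested = True

signal.signal(signal.SIGTERM, handle_sigterm)

def run_batches(max_batches=100):
    completed = 0
    for _ in range(max_batches):
        if shutdown_requested:
            break  # stop cleanly instead of being killed mid-batch
        completed += 1
        if completed == 3:
            # Simulate an operator sending SIGTERM to this process.
            os.kill(os.getpid(), signal.SIGTERM)
    return completed

batches_done = run_batches()
print("batches completed before graceful shutdown:", batches_done)
```

A hard kill (SIGKILL) cannot be caught at all, so no such handler or shutdown hook would run; that is the suspected difference between `kill -SIGTERM` on the driver and YARN tearing the containers down.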


Re: Killed Spark Streaming job in YARN cluster mode is listed as incomplete in the History Server

Expert Contributor

Hi @Jitendra Yadav,

I submitted the job and killed it as before, then tried to execute ps -ef | grep spark, got the PID, and issued kill -SIGTERM PID.

It didn't help.
