
YARN application kill does not end the process in the OS


First of all, sorry if this is not the right board to post this; it's the only one that refers to YARN.


When using yarn application -kill on Spark jobs in a CDH 5.7.0 cluster, the application disappears from YARN, but the process is still running in Linux, even a couple of hours later.


I have a few questions:

1. Is it OK to leave the Spark process running?

2. Considering it no longer appears in YARN, is it still consuming resources?

3. Will the process in Linux end at some point?

4. Is there a yarn command that will force the process to close in the OS as well?

5. This one is actually several questions in one:

I'm using a bash script, run by a cron job, to start a Spark job that I want to kill a day later. When I start the job (spark-submit --master yarn-cluster ... &) I can get the PID of the process and store it in a file.

  Can I trace the Yarn Application ID from this PID the next day? 

  Should I use the value from the --name parameter to look for the application ID?

  Or just use the PID to kill the process from the OS level?
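For context, the launch script looks roughly like the sketch below. It is only the shape of what I run: `sleep 60` stands in for the real `spark-submit --master yarn-cluster --name my-job ... &` call so the snippet works without a cluster, and the job name and /tmp path are placeholders.

```shell
#!/bin/sh
# Rough shape of the cron-launched script. `sleep 60` is a stand-in for
# `spark-submit --master yarn-cluster --name my-job ... &`; names and
# paths are placeholders, not the real job.
sleep 60 >/dev/null 2>&1 &
echo $! > /tmp/spark_launcher.pid    # store the launcher PID for tomorrow
echo "stored PID $(cat /tmp/spark_launcher.pid)"
```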


Thank you,


New Contributor
I guess one solution is to launch the Spark job in cluster mode.
spark.yarn.submit.waitAppCompletion (default: true) is a flag that controls whether to wait for the application to finish before exiting the launcher process in cluster mode:
spark-submit --class SparkBatchLogs --conf "spark.yarn.submit.waitAppCompletion=false" --master yarn-cluster SparkTest-0.0.1-SNAPSHOT.jar


I see. Can you please confirm whether I understand this correctly:

--deploy-mode client means that the driver is "outside" of YARN: if I kill the application in YARN, the executors are deallocated, but the driver process remains active (its PID in Linux is still alive). However, if I kill the driver from the OS, then YARN deallocates the executors.

--deploy-mode cluster means that the driver is "inside" YARN: if I kill the application in YARN, both the executors and the driver are deallocated by YARN. But if I kill the launcher process from the OS, neither the driver nor the executors are affected (I just checked this, and the Spark instance "survives" killing the launcher).
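So in client mode, the kill-by-stored-PID pattern should be enough on its own. A minimal demo of that pattern, with `sleep 300` standing in for the driver process so it runs without a cluster (the /tmp path is again just a placeholder):

```shell
# Kill-by-stored-PID pattern (client mode). `sleep 300` stands in for the
# Spark driver; in client mode, killing the driver makes YARN reclaim the
# executors.
sleep 300 >/dev/null 2>&1 &
echo $! > /tmp/spark_driver.pid
pid=$(cat /tmp/spark_driver.pid)
kill "$pid"
wait "$pid" 2>/dev/null || true       # reap the stand-in driver
kill -0 "$pid" 2>/dev/null && echo "still running" || echo "driver $pid stopped"
```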


Now the question that remains is: how do I find the application ID in YARN starting from the name? Or can I "export" the application ID that YARN assigns when I launch the job?
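One way is to filter `yarn application -list` on the name column. A sketch, where `$list` is a canned sample line in the tab-separated column format that command prints (the ID, user, and job name below are made up):

```shell
# Recover the YARN application ID from the job name by filtering the
# output of `yarn application -list` on its Application-Name column.
# $list is a canned sample row; values are invented for the demo.
list=$(printf 'application_1466000000000_0042\tnightly-batch\tSPARK\tetl\tdefault\tRUNNING')
app_id=$(printf '%s\n' "$list" | awk -F'\t' '$2 == "nightly-batch" {print $1}')
echo "$app_id"
# On the cluster, apply the same awk filter to the live command, then:
#   yarn application -kill "$app_id"
```

Alternatively, the launcher's own log can be captured: when submitting to YARN, spark-submit logs a "Submitted application application_..." line, so grepping the launcher's output for that pattern should also recover the ID.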


Thank you
