First of all, sorry if this is not the right board to post this; it's the only one that refers to YARN.
When using `yarn application -kill` on Spark jobs in a CDH 5.7.0 cluster, the application disappears from YARN, but the process is still running at the Linux OS level, even a couple of hours later.
I have a few questions:
1. Is it OK to leave the Spark process running?
2. Since it no longer appears in YARN, is it still consuming cluster resources?
3. Will the process in Linux end at some point?
4. Is there a yarn command that will force the process to close at the OS level as well?
5. This one is actually several questions in one:
I'm using a bash script, run by a cron job, to start a Spark job that I want to kill a day later. When I start the job (`spark-submit --master yarn-cluster ... &`) I can get the PID of the process and store it in a file.
Can I trace the YARN Application ID from this PID the next day?
Should I use the value from the --name parameter to look up the Application ID instead?
Or should I just use the PID to kill the process at the OS level?
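For context, here is roughly what the script does. This is only a sketch: the real backgrounded `spark-submit` line is shown in a comment and replaced by a `sleep` stand-in so the snippet runs anywhere, the job name `my-nightly-job` and the PID file path are invented, and the `yarn application -list` lookup is shown commented out since it needs a live cluster:

```shell
#!/usr/bin/env bash
# Stand-in for the real backgrounded submit, which would be something like:
#   spark-submit --master yarn-cluster --name my-nightly-job ... &
sleep 300 &                        # placeholder long-running job

echo $! > /tmp/spark_job.pid       # $! = PID of the last backgrounded command

# A day later (second cron entry): the stored PID is only the local
# submitter process. One idea is to match the --name value against
# `yarn application -list` to recover the application ID (not run here):
#   APP_ID=$(yarn application -list 2>/dev/null \
#            | awk -v n="my-nightly-job" '$0 ~ n {print $1}')
#   yarn application -kill "$APP_ID"

# The other idea: OS-level kill using the stored PID.
kill "$(cat /tmp/spark_job.pid)"
wait 2>/dev/null || true           # reap the killed child
```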