Created 06-13-2018 01:01 PM
I admin a small Hortonworks Hadoop cluster which consistently runs two custom Spark applications, each in a separate queue (dev and prod). The default queue is used for services like the Thrift server, Zeppelin, etc. For the last two weeks some process has been submitting several applications each hour to the root.default queue on behalf of user hadoop, getsss and some others. More than 12 thousand apps have been submitted to date. Node Managers die after several minutes if I restart YARN or the whole cluster.
I know nothing about these applications. They are all in either ACCEPTED or FAILED state. What bothers me is:
1. Who (which service or user, from which machine) keeps submitting these apps? The cluster is hosted in the cloud on an internal network and is accessible only via a few ports forwarded through an edge gateway (SSH and the web UIs of Hue, Ambari, YARN and Zeppelin). A log-grep sketch for tracking the submitter down follows this list.
2. How do I stop this from happening? I see the following solutions:
- block the default queue from any submissions after cluster startup and clear it
I didn't find a decent way to clean the queue of 12,000 apps at once, and killing them one by one takes ages.
I would still have to reopen the queue to restart e.g. Zeppelin, and blocking it each time seems like a bad idea.
- delete the default queue and reconfigure all services to use another queue as their default
This also seems like a painful and ugly solution.
- find out what submits the apps and kill it with fire
I would still have to clear the queue, so the last question is:
3. How do I clear the queue from this mess?
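For question 1, one way to look for the submitting host is the ResourceManager audit entries, which record the client IP and user for each application submission. A minimal sketch, assuming a typical HDP log path (the path and exact message wording can vary by release):

grep -h "OPERATION=Submit Application Request" /var/log/hadoop-yarn/yarn/yarn-yarn-resourcemanager-*.log* | grep -o "IP=[0-9.]*" | sort | uniq -c | sort -rn | head

Swapping "IP=[0-9.]*" for "USER=[^[:space:]]*" gives the same count per submitting user.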
Thanks for help in advance!
Created 06-13-2018 02:07 PM
You should be able to kill all the queued jobs with this script:
for app in `yarn application -list | awk '$6 == "ACCEPTED" { print $1 }'`; do yarn application -kill "$app"; done
Just put it in a .sh script and run it with a user that is allowed to kill apps.
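Since there are also legitimate dev/prod apps on the cluster, a variant restricted to the default queue might be safer. A sketch, assuming application names contain no spaces so that the queue and state land in columns 5 and 6 of the yarn application -list output:

for app in $(yarn application -list -appStates ACCEPTED | awk '$5 == "default" && $6 == "ACCEPTED" { print $1 }'); do yarn application -kill "$app"; done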
Best regards,
Michel
Created 06-13-2018 02:23 PM
Hi! Thank you for the script, it solves the part about the messy default queue, but even after I clear it, new applications get submitted in place of the killed ones. A dirty solution would be to put your script on cron, but I want to stop receiving them for good. I found out that they are all submitted on behalf of the user dr.who - the YARN UI and WebHDFS user. What could cause so many apps to be submitted by this particular user? Can I block this user from submitting apps, or will my YARN UI stop working? If I can, how do I block a user from submitting to the queue?
Created 06-13-2018 02:38 PM
The following link describes how to secure YARN queues so that only specific users can submit jobs to specific queues; it is done with Ranger:
https://community.hortonworks.com/articles/10797/apache-ranger-and-yarn-setup-security.html
Normally, if you are in a Kerberized environment, you should not have jobs running as dr.who.
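Besides Ranger, plain Capacity Scheduler ACLs can also limit who may submit to a queue. A minimal sketch, assuming the Ambari Capacity Scheduler text area (or the equivalent properties in capacity-scheduler.xml); the user list below is a placeholder, and the root queue must be restricted as well because queue ACLs are inherited from the parent:

yarn.scheduler.capacity.root.acl_submit_applications=yarn,zeppelin,spark
yarn.scheduler.capacity.root.default.acl_submit_applications=yarn,zeppelin,spark

The change can then be applied with yarn rmadmin -refreshQueues or a ResourceManager restart from Ambari.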
Michel
Created 06-13-2018 06:35 PM
Hello,
last month I had the same trouble in my cluster.
The temporary solution was to block the ResourceManager port 8088.
However, this is not a definitive solution.
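In case it helps someone, a sketch of what blocking port 8088 could look like with iptables on the ResourceManager host (the 10.0.0.0/16 subnet is a placeholder for your internal network):

# allow the ResourceManager web/REST port only from the internal network, drop everything else
iptables -A INPUT -p tcp --dport 8088 -s 10.0.0.0/16 -j ACCEPT
iptables -A INPUT -p tcp --dport 8088 -j DROP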
Regards
Created 06-13-2018 09:18 PM
@sidoine kakeuh fosso Thank you, I actually did that earlier today and it stopped those "spam" apps. I still don't know the source; my only guess is that somebody discovered our public IP address and practically DDoSed YARN for some reason.
Did you investigate your YARN/Hive/WebHCat logs for any alien IPs or queries? Did you manage to find anything?
I tried but gave up; there is no trace of who it might be.
Anyway, thanks for your answer, this is the closest thing to a definitive solution.
Best regards
Created 07-09-2018 10:34 AM
Hope you are doing great. I would just like to know: was your issue resolved, and how? I have the same issue too and am completely fed up with it.
Thanks in advance,
Sanjay