Created 05-02-2018 09:19 AM
We're running an HDP 2.5 cluster, and today we noticed a series of dr.who "MYYARN" applications running, failing, and then being resubmitted to YARN again and again, in what seems to be an "infinite loop". We can't figure out what the applications are doing or why they are failing. Any thoughts? Many thanks in advance!
Created 05-02-2018 09:48 AM
I'm hitting exactly the same issue here with HDP 2.6.
It looks like some kind of DoS attack, but I have no clue how to handle it.
Any help from Hortonworks would be appreciated.
Created 05-02-2018 10:05 AM
I am also facing an issue with the dr.who user submitting MYYARN applications in a loop, but mine are staying in "ACCEPTED" status. A total of 18 applications have been launched so far. No clue!
Created 05-02-2018 10:51 AM
Mine are also staying in "ACCEPTED" status, and a new one is launched every 3 seconds... It's becoming a problem on the ResourceManager. When I look at the logs, I can only see actions coming from within my cluster.
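For anyone who wants to watch this from a script rather than the ResourceManager UI, here is a minimal Python sketch. It assumes the ResourceManager REST API is reachable over plain HTTP on the default web port 8088 without authentication; the function names and the host in the usage note are hypothetical.

```python
import json
from collections import Counter
from urllib.parse import urlencode
from urllib.request import Request, urlopen


def apps_url(rm_base, user, state):
    """Build the ResourceManager REST URL listing one user's apps in one state."""
    query = urlencode({"user": user, "states": state})
    return f"{rm_base.rstrip('/')}/ws/v1/cluster/apps?{query}"


def summarize(payload):
    """Count applications per (user, name, state) in a /ws/v1/cluster/apps response."""
    # The API returns {"apps": null} when nothing matches, so guard against None.
    apps = (payload.get("apps") or {}).get("app") or []
    return Counter((a.get("user"), a.get("name"), a.get("state")) for a in apps)


def fetch_summary(rm_base, user="dr.who", state="ACCEPTED"):
    """Query the ResourceManager (assumed unauthenticated HTTP) and summarize the result."""
    request = Request(apps_url(rm_base, user, state),
                      headers={"Accept": "application/json"})
    with urlopen(request, timeout=10) as response:
        return summarize(json.load(response))
```

Calling fetch_summary("http://resourcemanager.example.com:8088") with your own ResourceManager address in place of that placeholder should return a count per application name. Running it a few times in a row shows how fast the "ACCEPTED" backlog is growing.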
Created 05-02-2018 11:06 AM
Did you check this:
https://community.hortonworks.com/questions/2349/tip-when-you-get-a-message-in-job-log-user-dr-who.h...
The solution there was:
RESOLUTION: The customer changed the following property in core-site.xml to resolve the issue. Other values such as hdfs or mapred also resolve the issue. If the cluster is managed by Ambari, this should be added in Ambari > HDFS > Configurations > Advanced core-site > Add Property
hadoop.http.staticuser.user=yarn
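To confirm which value a node is actually using, a small Python sketch like the one below can read the property out of core-site.xml. The function name is hypothetical, and the path /etc/hadoop/conf/core-site.xml mentioned afterwards is only an assumption about where the file lives on your nodes. When the property is absent, Hadoop falls back to its built-in default, dr.who.

```python
import xml.etree.ElementTree as ET

# Hadoop's built-in default when the property is not set in core-site.xml.
DEFAULT_STATIC_USER = "dr.who"


def static_user(core_site_xml):
    """Return the value of hadoop.http.staticuser.user from core-site.xml text."""
    root = ET.fromstring(core_site_xml)
    for prop in root.iter("property"):
        if prop.findtext("name", "").strip() == "hadoop.http.staticuser.user":
            return prop.findtext("value", "").strip()
    # Property not present: Hadoop uses its default.
    return DEFAULT_STATIC_USER
```

For example, static_user(open("/etc/hadoop/conf/core-site.xml").read()) should return yarn once the change above has been pushed out to that node.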
Created 05-02-2018 11:52 AM
Yeah, I was thinking of changing the static user, but this problem was not there until this afternoon. It suddenly started spawning applications as dr.who.
Created 05-02-2018 11:55 AM
I made the change, but it didn't fix anything. Instead of 'dr.who', it's now the 'yarn' user that is submitting applications every 3 seconds, and they get stuck in "ACCEPTED". I still can't find out how these applications are being triggered. Any other clues?
Created 05-02-2018 12:13 PM
Can you check whether a cron job is triggering this? Try crontab -l.
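If you want to run that check for several accounts at once, here is a rough Python sketch. It assumes you run it as root on each node and that your cron implementation supports crontab -l -u <user>; the function names and the account names in the usage note are hypothetical. It only covers per-user crontabs, not system locations such as /etc/crontab or /etc/cron.d.

```python
import subprocess


def crontab_command(user):
    """Command that lists one user's crontab (needs root for other users)."""
    return ["crontab", "-l", "-u", user]


def parse_crontab(output):
    """Return the non-blank, non-comment lines of `crontab -l` output."""
    lines = (line.strip() for line in output.splitlines())
    return [line for line in lines if line and not line.startswith("#")]


def user_cron_entries(user):
    """List the entries in one user's crontab, or [] if there is none."""
    result = subprocess.run(crontab_command(user), capture_output=True, text=True)
    # crontab exits non-zero when the user has no crontab at all.
    if result.returncode != 0:
        return []
    return parse_crontab(result.stdout)
```

Looping over accounts such as yarn, hdfs and mapred and printing user_cron_entries(user) for each gives a quick overview per node.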
Created 05-02-2018 12:26 PM
Thanks, but I had already checked, and there was no crontab whatsoever for any user on any machine in my cluster.
Created 05-02-2018 01:23 PM
And is no user connected to the system and starting jobs in your cluster?