Support Questions

Why are there dr.who "MYYARN" applications running and all failing in what seems to be a loop?

New Contributor

We're running an HDP 2.5 cluster, and today we noticed a series of dr.who "MYYARN" applications running, failing, and then being resubmitted to YARN again and again, in what seems to be an infinite loop. We can't figure out what the applications are doing or why they are failing. Any thoughts? Many thanks in advance!

34 Replies

Explorer

No. I'm the only user connected. And while my cluster is not kerberized, my Ambari connection is made through HTTPS.

New Contributor

Our jobs are indeed stuck in "ACCEPTED" status and eventually fail due to a time-out. I can't get any further useful log information. Having checked the RM UI logs for "FAILED" jobs, I noticed it started on April 30, ran for 3 hours straight, then stopped. It started again on May 1 and has continued up until today.

Super Collaborator

This is typically the case when resources are exhausted. That could be the memory of the node, but also the queue itself. Can you check whether the stuck jobs are all submitted to the same queue?

https://community.hortonworks.com/questions/96750/yarn-application-stuck-in-accepted-state.html
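A quick way to check both of those suggestions from the command line. This is a sketch: the commands are built and printed for review rather than executed here, and "default" is a placeholder queue name to substitute with your own:

```shell
# Sketch: check which applications are stuck in ACCEPTED and whether their
# queue is out of capacity. Both are standard `yarn` CLI subcommands; run
# them on a cluster node. "default" is a placeholder queue name.
LIST_STUCK="yarn application -list -appStates ACCEPTED"
QUEUE_STATUS="yarn queue -status default"

# Print the commands for review instead of running them directly.
printf '%s\n' "$LIST_STUCK" "$QUEUE_STATUS"
```

The first command lists every application waiting in ACCEPTED (the queue column shows whether they all share one queue); the second reports that queue's configured and used capacity.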

New Contributor

I'm having the exact same issue. All of a sudden yesterday, a cluster that has been up and running for weeks started spawning six of these at a time for no apparent reason. I kill them and they come back. I've pored over every single log and checked every nook and cranny, and I cannot figure it out. I have no idea where they are coming from. It is most definitely not a resource issue (these jobs shouldn't even be running), and it's not cron. They suck up major CPU while they run.

If anyone has any thoughts I'd be grateful to hear them!

The other odd thing is that in the past I would see one of these jobs - but only one - never like this.

I'm having the exact same problem. All of a sudden, a cluster that has not changed started spawning off these jobs, which sit in ACCEPTED status as user dr.who and are called MYYARN. I've pored over every single log and bounced my cluster several times; there are no cron jobs, and it is most definitely not a resource issue. Looking at old logs, it appears to have happened periodically before, but only once or twice, and then it stopped. Yesterday it started running wild, and as quickly as I kill them off it starts another six of the exact same job. If anyone has any insight at all, I'd be grateful.

And I'm not even using HDP; this is standard Apache Hadoop 2.7.5 with YARN and Spark.
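One way to see what these dr.who applications are is to query the ResourceManager's REST API, which exposes applications filtered by user. A sketch, with RM_HOST as a placeholder for your ResourceManager hostname; note that with the default "simple" HTTP authentication, this same REST API also accepts submissions from anyone who can reach port 8088, and such callers show up as the static user dr.who:

```shell
# Sketch: list applications attributed to dr.who via the RM REST API
# (/ws/v1/cluster/apps with the ?user= filter). RM_HOST is an assumed
# hostname; the command is printed for review rather than run here.
RM_HOST="resourcemanager.example.com"
URL="http://${RM_HOST}:8088/ws/v1/cluster/apps?user=dr.who"
echo "curl -s '${URL}'"
```

The JSON response includes each application's name, queue, state, and tracking URL, which can help show where the submissions are coming from.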

Contributor

I am wondering if this is a security loophole, since my cluster is not yet kerberized!

Explorer

I have the same question (for the same reason, i.e., not being kerberized yet).

Contributor

A temporary workaround could be to set hadoop.http.staticuser.user=testuser and assign testuser to a queue testqueue with 1% of the resources?
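A sketch of that workaround, assuming the Capacity Scheduler. The queue name testqueue, the user testuser, and the 1%/99% split are illustrative; the static-user property goes in core-site.xml and the queue definitions in capacity-scheduler.xml:

```xml
<!-- core-site.xml: map unauthenticated web/REST callers to a throwaway user
     instead of the default dr.who -->
<property>
  <name>hadoop.http.staticuser.user</name>
  <value>testuser</value>
</property>

<!-- capacity-scheduler.xml: define a starved queue and route testuser to it.
     Queue names and the 1%/99% split are illustrative, not prescriptive. -->
<property>
  <name>yarn.scheduler.capacity.root.queues</name>
  <value>default,testqueue</value>
</property>
<property>
  <name>yarn.scheduler.capacity.root.default.capacity</name>
  <value>99</value>
</property>
<property>
  <name>yarn.scheduler.capacity.root.testqueue.capacity</name>
  <value>1</value>
</property>
<property>
  <name>yarn.scheduler.capacity.queue-mappings</name>
  <value>u:testuser:testqueue</value>
</property>
```

This only limits the damage; the rogue jobs would still be submitted, just confined to a queue with almost no capacity.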

Explorer

(this might be the real answer)

It looks like some kind of attack. I have seen it on two clusters, one running HDP and one running Hadoop 2.7.4.

Explorer

Using an iptables firewall, I blocked port 8088 and the situation improved. Too soon to tell if this is a real fix.
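Rather than blocking 8088 outright, a narrower variant would allow only a trusted admin subnet to reach the ResourceManager UI/REST port. A sketch: ADMIN_CIDR is an assumed admin network, and the rules are printed for review here rather than applied (they would need to be run as root on the RM host):

```shell
# Sketch: restrict the RM web UI/REST port (8088) to a trusted subnet and
# drop everything else. ADMIN_CIDR is an assumption; review the printed
# rules before applying them as root on the ResourceManager host.
ADMIN_CIDR="10.0.0.0/24"
RULES="iptables -A INPUT -p tcp --dport 8088 -s ${ADMIN_CIDR} -j ACCEPT
iptables -A INPUT -p tcp --dport 8088 -j DROP"

# Print the rules for review instead of applying them.
printf '%s\n' "$RULES"
```

This keeps the UI usable for administrators while cutting off the unauthenticated REST submissions from outside; Kerberizing the cluster remains the proper fix.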