Unable to start nodemanager

avatar

I am trying to start the NodeManager, but it keeps stopping. I looked for its log (/var/log/hadoop-yarn/yarn) on the machine where the NodeManager is installed, but no logs are available. In Ambari I checked stderr: /var/lib/ambari-agent/data/errors-2756.txt

and found an error like this:

File "/usr/lib/python2.6/site-packages/resource_management/core/providers/mount.py", line 41, in get_mounted
    raise Fail("Getting list of mounts (calling mount) failed")
resource_management.core.exceptions.Fail: Getting list of mounts (calling mount) failed

I also checked yarn.nodemanager.log-dirs:

/usr/hadoop/yarn/log,/var/run/hadoop/yarn/log,/opt/hadoop/yarn/log, /var/log/hadoop/yarn/log,/data/hadoop/yarn/log

but I get "No such file or directory" for these paths on the machine where the NodeManager is installed.

Of the four NodeManagers, only one is running, and its logs are available in /var/log/hadoop-yarn/yarn; for the others there are no logs.
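
To make the directory check above repeatable on each host, here is a minimal Python sketch that reports which of the configured log directories exist and are writable for the current user. The names `LOG_DIRS`, `check_log_dirs` and `report` are made up for illustration, and stripping the whitespace around each entry is an assumption of this sketch, not a statement about how YARN parses the property.

```python
import os

# Value of yarn.nodemanager.log-dirs as quoted above, copied verbatim
# (including the space after the third comma).
LOG_DIRS = ("/usr/hadoop/yarn/log,/var/run/hadoop/yarn/log,/opt/hadoop/yarn/log,"
            " /var/log/hadoop/yarn/log,/data/hadoop/yarn/log")


def check_log_dirs(value):
    """Return a list of (path, exists, writable) tuples, one per configured entry."""
    results = []
    for raw in value.split(","):
        path = raw.strip()  # assumption: surrounding whitespace is not part of the path
        if not path:
            continue
        exists = os.path.isdir(path)
        writable = exists and os.access(path, os.W_OK)
        results.append((path, exists, writable))
    return results


def report(value):
    """Print one line per configured directory."""
    for path, exists, writable in check_log_dirs(value):
        print("%-30s exists=%s writable=%s" % (path, exists, writable))
```

Calling `report(LOG_DIRS)` as the yarn user on a failing host and on the working host would show whether the two differ.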

1 ACCEPTED SOLUTION

avatar
Master Mentor

@Anurag Mishra

Based on the error, it looks like the Ambari agent is not able to determine the mounted directories on the failing host.

https://github.com/apache/ambari/blob/release-2.6.0/ambari-common/src/main/python/resource_managemen...

For a quick test, please write a simple Python script such as "/tmp/mountTest.py" to list the mount directories (this is exactly what the ambari-agent does).

NOTE: Please make sure the indentation inside the file matches the listing below, as Python is sensitive to indentation.

import os
import re
from subprocess import Popen, PIPE, STDOUT

p = Popen("mount", stdout=PIPE, stderr=STDOUT, shell=True)
out = p.communicate()[0]
if p.wait() != 0:
    raise Fail("Getting list of mounts (calling mount) failed")
print 'All Good' + out

Then run the script as follows:

NOTE: Please make sure that Python runs under the same user account that runs the ambari-agent.

# python /tmp/mountTest.py

All Good/dev/vda1 on / type ext4 (rw)
proc on /proc type proc (rw)
sysfs on /sys type sysfs (rw)
devpts on /dev/pts type devpts (rw,gid=5,mode=620)
tmpfs on /dev/shm type tmpfs (rw,rootcontext="system_u:object_r:tmpfs_t:s0")
none on /proc/sys/fs/binfmt_misc type binfmt_misc (rw)

7 REPLIES

avatar

I tried starting the NodeManager manually, logged in as root:

cd /usr/hdp/current/hadoop-yarn-nodemanager/sbin/

sh yarn-daemon.sh start nodemanager

and the NodeManager started, but when I do the same through Ambari it does not start.

avatar
Master Mentor

@Anurag Mishra

What error do you get when you try restarting Ambari? Check

/var/log/ambari-server/ambari-server.log

and please attach the log.

avatar

@Jay Kumar SenSharma can you please look into this issue?


avatar

@Jay Kumar SenSharma

I ran the same mount test script.

Here is the result:

Traceback (most recent call last):
  File "/tmp/MountTest.py", line 8, in <module>
    raise Fail("Getting list of mounts (calling mount) failed")
NameError: name 'Fail' is not defined
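
A note on reading this result: the NameError comes from the test script itself. `Fail` lives in Ambari's `resource_management.core.exceptions` (as the original traceback shows) and is never imported in the standalone file. Python only looks the name up when the `raise` line actually runs, so reaching it suggests that `mount` exited with a non-zero status on this host. Below is a minimal self-contained sketch of the same test; the local `Fail` class and the `get_mount_output` name are stand-ins for illustration, not Ambari's code.

```python
from subprocess import Popen, PIPE, STDOUT


class Fail(Exception):
    """Local stand-in for resource_management.core.exceptions.Fail."""


def get_mount_output(command="mount"):
    """Run `command` through the shell and return its combined stdout/stderr as text."""
    p = Popen(command, stdout=PIPE, stderr=STDOUT, shell=True)
    out = p.communicate()[0].decode("utf-8", "replace")
    if p.returncode != 0:
        # Put the exit code and the command's output in the message so the
        # actual cause is visible instead of a bare failure.
        raise Fail("Getting list of mounts (calling %s) failed with code %d: %s"
                   % (command, p.returncode, out))
    return out
```

Running `print(get_mount_output())` as the ambari-agent user should then either list the mounts or show the exit code together with whatever `mount` printed.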

avatar

I have restarted the ambari-agent, but when I start the NodeManager I now get this error:

Execution of 'ambari-sudo.sh su yarn -l -s /bin/bash -c 'ls /var/run/hadoop-yarn/yarn/yarn-yarn-nodemanager.pid && ps -p `cat /var/run/hadoop-yarn/yarn/yarn-yarn-nodemanager.pid`'' returned 1. /var/run/hadoop-yarn/yarn/yarn-yarn-nodemanager.pid
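
The quoted command lists the pid file and then runs `ps -p` on the pid stored in it. Since the pid file path appears in the output, one plausible reading is that the file exists but the process it names is no longer running. Here is a rough Python sketch of the same check; `pid_file_status` is a hypothetical helper, not part of Ambari.

```python
import errno
import os


def pid_file_status(pid_file):
    """Classify a pid file as 'missing', 'invalid', 'stale' or 'running'."""
    if not os.path.isfile(pid_file):
        return "missing"
    with open(pid_file) as f:
        content = f.read().strip()
    if not content.isdigit() or int(content) <= 0:
        return "invalid"
    try:
        os.kill(int(content), 0)  # signal 0 only tests whether the pid exists
    except OSError as e:
        if e.errno == errno.ESRCH:
            return "stale"    # pid file present, but no such process
        if e.errno == errno.EPERM:
            return "running"  # process exists but belongs to another user
        raise
    return "running"
```

For example, `print(pid_file_status("/var/run/hadoop-yarn/yarn/yarn-yarn-nodemanager.pid"))` on the failing host would distinguish a missing pid file from a stale one.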

avatar
Super Guru

@Anurag Mishra,

Can you make a small change to @Jay Kumar SenSharma's script, as below, then run it and paste the output?

from subprocess import Popen, PIPE, STDOUT
p = Popen("mount", stdout=PIPE, stderr=STDOUT, shell=True)
out,err = p.communicate()
print "return code is :: " + str(p.returncode)
if out:
    print "stdout is :: " + out
if err:
    print "stderr is :: " + err
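
One detail to keep in mind when reading the output of the script above: because `stderr=STDOUT` merges the two streams, `communicate()` returns `None` for `err`, so the "stderr is ::" line never prints and any error text shows up under stdout instead. Here is a sketch that keeps the streams separate and runs on both Python 2 and Python 3; `run_command` and `describe` are hypothetical helper names.

```python
from subprocess import Popen, PIPE


def run_command(command="mount"):
    """Run `command` through the shell; return (returncode, stdout, stderr) as text."""
    p = Popen(command, stdout=PIPE, stderr=PIPE, shell=True)
    out, err = p.communicate()
    return (p.returncode,
            out.decode("utf-8", "replace"),
            err.decode("utf-8", "replace"))


def describe(command="mount"):
    """Print the exit code and whichever output streams are non-empty."""
    code, out, err = run_command(command)
    print("return code is :: " + str(code))
    if out:
        print("stdout is :: " + out)
    if err:
        print("stderr is :: " + err)
```

Calling `describe()` as the ambari-agent user prints the same three pieces of information, with the error text under its own label.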