Archives of Support Questions (Read Only)

This is an archived board for historical reference. Information and links may no longer be available or relevant

Unable to start nodemanager

avatar

I am trying to start the NodeManager, but it stops right away. I looked for its log (/var/log/hadoop-yarn/yarn) on the machine where the NodeManager is installed, but no logs are there. In Ambari I checked stderr: /var/lib/ambari-agent/data/errors-2756.txt

and found this error:

File "/usr/lib/python2.6/site-packages/resource_management/core/providers/mount.py", line 41, in get_mounted
    raise Fail("Getting list of mounts (calling mount) failed")
resource_management.core.exceptions.Fail: Getting list of mounts (calling mount) failed

I also checked yarn.nodemanager.log-dirs:

/usr/hadoop/yarn/log,/var/run/hadoop/yarn/log,/opt/hadoop/yarn/log, /var/log/hadoop/yarn/log,/data/hadoop/yarn/log

but I get "No such file or directory" for them on the machine where the NodeManager is installed.

Out of four NodeManagers only one is running, and its logs are available in /var/log/hadoop-yarn/yarn. For the other three there are no logs.
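
A quick way to see which of those configured directories actually exist on a failing host is a short Python check. This is only a sketch: the function name check_log_dirs is made up for illustration, the property value is pasted from above, and it should be run as the yarn user so the write check reflects what the NodeManager would see.

```python
from __future__ import print_function

import os


def check_log_dirs(property_value):
    """Return (path, status) pairs for a comma-separated directory list."""
    results = []
    for raw in property_value.split(","):
        path = raw.strip()  # trim stray spaces around the commas
        if not path:
            continue
        if not os.path.isdir(path):
            status = "missing"
        elif not os.access(path, os.W_OK):
            status = "not writable"  # checked for the user running this script
        else:
            status = "ok"
        results.append((path, status))
    return results


if __name__ == "__main__":
    # Value of yarn.nodemanager.log-dirs as posted above.
    value = ("/usr/hadoop/yarn/log,/var/run/hadoop/yarn/log,/opt/hadoop/yarn/log,"
             " /var/log/hadoop/yarn/log,/data/hadoop/yarn/log")
    for path, status in check_log_dirs(value):
        print("{0:<30} {1}".format(path, status))
```

Note the strip() call: the value above has a space after one of the commas, and I am not certain how YARN treats that, so the sketch trims it before testing the path.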

1 ACCEPTED SOLUTION

avatar
Master Mentor

@Anurag Mishra

Based on the error, it looks like the Ambari agent is not able to determine the mount points on the failing host.

https://github.com/apache/ambari/blob/release-2.6.0/ambari-common/src/main/python/resource_managemen...

For a quick test, write a simple Python script such as "/tmp/mountTest.py" to list the mounts (it makes the same "mount" call that the ambari-agent does).

NOTE: Make sure the indentation in the file matches what is shown below, since Python is sensitive to indentation.

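import os
import re
from subprocess import Popen, PIPE, STDOUT

p = Popen("mount", stdout=PIPE, stderr=STDOUT, shell=True)  # run "mount", merging stderr into stdout
out = p.communicate()[0]  # wait for the command and collect its combined output
if p.wait() != 0:
    raise Fail("Getting list of mounts (calling mount) failed")
    # Fail is Ambari's resource_management.core.exceptions.Fail and is not imported
    # in this standalone script, so a non-zero exit from mount shows up as NameError.
print 'All Good' + out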

Then run the script as follows:

NOTE: Make sure Python runs under the same user account that runs the ambari-agent.

# python /tmp/mountTest.py

All Good/dev/vda1 on / type ext4 (rw)
proc on /proc type proc (rw)
sysfs on /sys type sysfs (rw)
devpts on /dev/pts type devpts (rw,gid=5,mode=620)
tmpfs on /dev/shm type tmpfs (rw,rootcontext="system_u:object_r:tmpfs_t:s0")
none on /proc/sys/fs/binfmt_misc type binfmt_misc (rw)

7 REPLIES

avatar

I tried starting the NodeManager manually, logged in as root:

cd /usr/hdp/current/hadoop-yarn-nodemanager/sbin/

sh yarn-daemon.sh start nodemanager

and the NodeManager started, but starting it through Ambari still fails.

avatar
Master Mentor

@Anurag Mishra

What error do you get when you try restarting Ambari? Check

/var/log/ambari-server/ambari-server.log

and please attach the log.

avatar

@Jay Kumar SenSharma, can you please look into this issue?

avatar

@Jay Kumar SenSharma

I ran the same mount test script. Here is the result:

Traceback (most recent call last):
  File "/tmp/MountTest.py", line 8, in <module>
    raise Fail("Getting list of mounts (calling mount) failed")
NameError: name 'Fail' is not defined

avatar

I have restarted the ambari-agent, but when I start the NodeManager I now get this error:

Execution of 'ambari-sudo.sh su yarn -l -s /bin/bash -c 'ls /var/run/hadoop-yarn/yarn/yarn-yarn-nodemanager.pid && ps -p `cat /var/run/hadoop-yarn/yarn/yarn-yarn-nodemanager.pid`'' returned 1. /var/run/hadoop-yarn/yarn/yarn-yarn-nodemanager.pid
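
The command in that error first lists the pid file and then runs ps -p on the pid stored in it, so it returns 1 if either the file is missing or the recorded process is gone. Since the path was printed, the file probably exists and the pid inside it is stale, though that is a reading of the output rather than something confirmed in this thread. A minimal sketch to tell the cases apart (the function name pid_file_status is hypothetical, and Linux is assumed):

```python
from __future__ import print_function

import errno
import os


def pid_file_status(pid_file):
    """Classify a pid file as 'missing', 'unreadable', 'stale' or 'running'."""
    if not os.path.isfile(pid_file):
        return "missing"
    try:
        with open(pid_file) as handle:
            pid = int(handle.read().strip())
    except (IOError, OSError, ValueError):
        return "unreadable"
    if pid <= 0:
        return "unreadable"  # not a usable process id
    try:
        os.kill(pid, 0)  # signal 0 sends nothing, it only checks the process exists
    except OSError as exc:
        if exc.errno == errno.ESRCH:
            return "stale"  # pid file is left over from a process that is gone
        # Any other error (such as EPERM) means the process exists.
    return "running"


if __name__ == "__main__":
    # Path taken from the Ambari error message above.
    path = "/var/run/hadoop-yarn/yarn/yarn-yarn-nodemanager.pid"
    print("{0}: {1}".format(path, pid_file_status(path)))
```

If this reports "stale", the NodeManager exited after writing its pid, and the NodeManager log or .out file would be the next place to look.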

avatar
Super Guru

@Anurag Mishra,

Can you make a small change to @Jay Kumar SenSharma's script, as below, then run it and paste the output?

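from subprocess import Popen, PIPE, STDOUT

# Run "mount" through the shell; stderr=STDOUT merges error text into stdout.
p = Popen("mount", stdout=PIPE, stderr=STDOUT, shell=True)
out, err = p.communicate()  # err is always None here because stderr was merged
print "return code is :: " + str(p.returncode)
if out:
    print "stdout is :: " + out
if err:
    print "stderr is :: " + err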
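
A variant of the same diagnostic that does not use the Python 2 print statement and avoids the undefined Fail name from the earlier script. This is a sketch, not what the ambari-agent runs: run_command and check_mount are names chosen here, and exit code 127 for "could not start the command" is a convention borrowed from the shell. Note that the NameError reported above is only raised on the branch where mount returned non-zero, so the underlying problem is still that the mount call fails for that user.

```python
from __future__ import print_function

import subprocess


def run_command(argv):
    """Run argv and return (return_code, combined stdout and stderr text)."""
    try:
        proc = subprocess.Popen(argv, stdout=subprocess.PIPE,
                                stderr=subprocess.STDOUT)
    except OSError as exc:
        # The executable could not be started at all (missing or not executable).
        return 127, "could not start {0}: {1}".format(argv[0], exc)
    out, _ = proc.communicate()  # second item is None because stderr was merged
    return proc.returncode, out.decode("utf-8", "replace")


def check_mount():
    """Call mount once and print its return code and output."""
    code, output = run_command(["mount"])
    print("return code is :: {0}".format(code))
    if output:
        print("output is :: " + output)
    return code


if __name__ == "__main__":
    check_mount()
```

As with the original script, run it as the same user that runs the ambari-agent, since a mount call that works for root may still fail or hang for another account.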