Created 11-28-2017 06:28 AM
I am trying to start the NodeManager, but it keeps stopping. I tried to look at its logs (/var/log/hadoop-yarn/yarn) on the same machine where the NodeManager is installed, but there are no logs there. In Ambari I looked at stderr: /var/lib/ambari-agent/data/errors-2756.txt
and I am getting an error like this:
File "/usr/lib/python2.6/site-packages/resource_management/core/providers/mount.py", line 41, in get_mounted
    raise Fail("Getting list of mounts (calling mount) failed")
resource_management.core.exceptions.Fail: Getting list of mounts (calling mount) failed
I also checked yarn.nodemanager.log-dirs:
/usr/hadoop/yarn/log,/var/run/hadoop/yarn/log,/opt/hadoop/yarn/log, /var/log/hadoop/yarn/log,/data/hadoop/yarn/log
but I get "No such file or directory" for these paths on the machine where the NodeManager is installed.
Out of four NodeManagers, only one is running, and its logs are in /var/log/hadoop-yarn/yarn; the other three have no logs at all.
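A quick way to see which of the configured yarn.nodemanager.log-dirs entries actually exist on a given host is a short check like the one below. This is a minimal sketch, not part of Ambari or YARN: `check_log_dirs` is a hypothetical helper name, and the value in the `__main__` block is simply the one pasted above. It strips whitespace around each entry, since the pasted value contains a space after one of the commas.

```python
from __future__ import print_function
import os


def check_log_dirs(value):
    """Split a comma-separated yarn.nodemanager.log-dirs value and report
    whether each entry exists as a directory on this host."""
    result = []
    for entry in value.split(","):
        path = entry.strip()  # tolerate stray spaces such as ", /var/log/..."
        if not path:
            continue  # skip empty entries, e.g. from a trailing comma
        result.append((path, os.path.isdir(path)))
    return result


if __name__ == "__main__":
    # Value copied from the question; replace it with your own setting.
    LOG_DIRS = ("/usr/hadoop/yarn/log,/var/run/hadoop/yarn/log,/opt/hadoop/yarn/log,"
                " /var/log/hadoop/yarn/log,/data/hadoop/yarn/log")
    for path, exists in check_log_dirs(LOG_DIRS):
        print("%-30s %s" % (path, "exists" if exists else "MISSING"))
```

Run it on each NodeManager host; entries reported as MISSING are the ones behind the "No such file or directory" message.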
Created 11-28-2017 10:59 AM
Based on the error, it looks like the ambari-agent is not able to determine the mount directories on the failing host.
For a quick test, please write a simple Python script like the following, "/tmp/mountTest.py", to list the mount directories (it does exactly what the ambari-agent does).
NOTE: Please make sure the indentation inside the file is the same as below, because Python is very sensitive to indentation.
import os
import re
from subprocess import Popen, PIPE, STDOUT

p = Popen("mount", stdout=PIPE, stderr=STDOUT, shell=True)
out = p.communicate()[0]
if p.wait() != 0:
    raise Fail("Getting list of mounts (calling mount) failed")  # Fail is Ambari's class, undefined here: a failing mount shows up as NameError
print 'All Good' + out
Then run the script as follows:
NOTE: Please make sure you run python as the same user account that runs the ambari-agent.
# python /tmp/mountTest.py
All Good/dev/vda1 on / type ext4 (rw)
proc on /proc type proc (rw)
sysfs on /sys type sysfs (rw)
devpts on /dev/pts type devpts (rw,gid=5,mode=620)
tmpfs on /dev/shm type tmpfs (rw,rootcontext="system_u:object_r:tmpfs_t:s0")
none on /proc/sys/fs/binfmt_misc type binfmt_misc (rw)
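When `mount` does succeed, each output line follows the pattern "DEVICE on MOUNT_POINT type FS_TYPE (OPTIONS)", as in the sample output above. The sketch below parses that text into tuples so the mounts can be inspected programmatically. It is an illustration only, not Ambari's own parser: `parse_mount_output` is a hypothetical name, lines that do not match the pattern are skipped, and the option list is split naively on commas.

```python
import re

# Matches lines such as: /dev/vda1 on / type ext4 (rw)
MOUNT_LINE = re.compile(r"^(\S+) on (\S+) type (\S+) \((.*)\)$")


def parse_mount_output(text):
    """Turn `mount` output into (device, mount_point, fs_type, options) tuples."""
    mounts = []
    for line in text.splitlines():
        m = MOUNT_LINE.match(line.strip())
        if m:
            device, mount_point, fs_type, opts = m.groups()
            # Naive split: breaks if an option value itself contains a comma.
            mounts.append((device, mount_point, fs_type, opts.split(",")))
    return mounts
```

For example, `parse_mount_output(out)` on the `out` captured by the test script returns one tuple per mounted filesystem.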
Created 11-28-2017 07:37 AM
I tried starting the NodeManager manually, logged in as root:
cd /usr/hdp/current/hadoop-yarn-nodemanager/sbin/
sh yarn-daemon.sh start nodemanager
and the NodeManager started, but doing the same through Ambari fails to start it.
Created 11-28-2017 08:16 AM
What error do you get when you try restarting from Ambari? Check:
/var/log/ambari-server/ambari-server.log
Please attach the log.
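If the log is large, pulling out just the most recent lines that contain ERROR can make it easier to spot the relevant failure before attaching it. This is a minimal sketch with a hypothetical helper name, `last_error_lines`; it only does a plain substring match and assumes nothing about the ambari-server.log line format.

```python
from __future__ import print_function
import collections


def last_error_lines(path, keyword="ERROR", limit=20):
    """Return up to `limit` of the most recent lines in `path` containing `keyword`."""
    recent = collections.deque(maxlen=limit)  # keeps only the newest `limit` matches
    with open(path) as f:
        for line in f:
            if keyword in line:
                recent.append(line.rstrip("\n"))
    return list(recent)
```

Usage: `print("\n".join(last_error_lines("/var/log/ambari-server/ambari-server.log")))`.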
Created 11-28-2017 10:47 AM
@Jay Kumar SenSharma, can you please look into this issue?
Created 11-28-2017 11:27 AM
I ran the same mount test script; here is the result:
Traceback (most recent call last):
  File "/tmp/MountTest.py", line 8, in <module>
    raise Fail("Getting list of mounts (calling mount) failed")
NameError: name 'Fail' is not defined
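The traceback shows the script reached the `raise` line, which only runs when `mount` exits with a non-zero code; the NameError itself appears only because `Fail` is Ambari's exception class and is not defined in the standalone script. A variant that defines a local `Fail` and puts the exit code and the command's output into the message makes the real problem visible. This is a sketch: the local `Fail` class and `get_mount_output` are hypothetical stand-ins, not Ambari's code, and the syntax avoids Python 3-only features so it should also run under the hosts' Python 2.

```python
from __future__ import print_function
import subprocess


class Fail(Exception):
    """Local stand-in for Ambari's Fail, so a failure is reported instead of NameError."""


def get_mount_output(cmd="mount"):
    # Same approach as the test script: run through the shell, merge stderr into stdout.
    p = subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=subprocess.STDOUT,
                         shell=True, universal_newlines=True)
    out = p.communicate()[0]
    if p.returncode != 0:
        # Include the exit code and whatever the command printed.
        raise Fail("Getting list of mounts (calling %s) failed, exit code %d: %s"
                   % (cmd, p.returncode, out.strip()))
    return out
```

Calling `print("All Good\n" + get_mount_output())` as the ambari-agent user either prints the mounts or raises `Fail` with the text `mount` wrote.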
Created 11-28-2017 01:25 PM
I have restarted the ambari-agent, but when I start the NodeManager I now get this error:
Execution of 'ambari-sudo.sh su yarn -l -s /bin/bash -c 'ls /var/run/hadoop-yarn/yarn/yarn-yarn-nodemanager.pid && ps -p `cat /var/run/hadoop-yarn/yarn/yarn-yarn-nodemanager.pid`'' returned 1.
/var/run/hadoop-yarn/yarn/yarn-yarn-nodemanager.pid
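The failing command is a status check: `ls` on the PID file, then `ps -p` on the PID stored in it. Since the output shows the PID file path, `ls` probably found the file, which would point at `ps -p` failing, i.e. the PID recorded in the file is likely no longer running. The sketch below is a rough Python equivalent of that check for testing by hand; `pid_file_process_alive` is a hypothetical name and this is not how Ambari itself implements it.

```python
import errno
import os


def pid_file_process_alive(pid_file):
    """Rough equivalent of `ls PIDFILE && ps -p $(cat PIDFILE)`:
    False if the file is missing or unreadable, or the PID is not running."""
    try:
        with open(pid_file) as f:
            pid = int(f.read().strip())
    except (IOError, OSError, ValueError):
        return False  # the `ls` / `cat` step would fail here
    if pid <= 0:
        return False  # 0 or negative would address process groups, not one process
    try:
        os.kill(pid, 0)  # signal 0: existence check only, nothing is delivered
    except OSError as e:
        # EPERM means the process exists but belongs to another user (e.g. yarn)
        return e.errno == errno.EPERM
    return True
```

Usage: `pid_file_process_alive("/var/run/hadoop-yarn/yarn/yarn-yarn-nodemanager.pid")`; a False result with the file present suggests a stale PID file.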
Created 11-28-2017 01:33 PM
Can you make a small change to @Jay Kumar SenSharma's script, as below, then run it and paste the output?
from subprocess import Popen, PIPE

# stderr=PIPE keeps stderr separate; with stderr=STDOUT, err would always be None
p = Popen("mount", stdout=PIPE, stderr=PIPE, shell=True)
out, err = p.communicate()
print "return code is :: " + str(p.returncode)
if out:
    print "stdout is :: " + out
if err:
    print "stderr is :: " + err