Unable to start the node manager

Traceback (most recent call last):
File "/var/lib/ambari-agent/cache/stacks/HDP/3.0/services/YARN/package/scripts/nodemanager.py", line 102, in <module>
Nodemanager().execute()
File "/usr/lib/ambari-agent/lib/resource_management/libraries/script/script.py", line 351, in execute
method(env)
File "/var/lib/ambari-agent/cache/stacks/HDP/3.0/services/YARN/package/scripts/nodemanager.py", line 53, in start
service('nodemanager',action='start')
File "/usr/lib/ambari-agent/lib/ambari_commons/os_family_impl.py", line 89, in thunk
return fn(*args, **kwargs)
File "/var/lib/ambari-agent/cache/stacks/HDP/3.0/services/YARN/package/scripts/service.py", line 93, in service
Execute(daemon_cmd, user = usr, not_if = check_process)
File "/usr/lib/ambari-agent/lib/resource_management/core/base.py", line 166, in __init__
self.env.run()
File "/usr/lib/ambari-agent/lib/resource_management/core/environment.py", line 160, in run
self.run_action(resource, action)
File "/usr/lib/ambari-agent/lib/resource_management/core/environment.py", line 124, in run_action
provider_action()
File "/usr/lib/ambari-agent/lib/resource_management/core/providers/system.py", line 263, in action_run
returns=self.resource.returns)
File "/usr/lib/ambari-agent/lib/resource_management/core/shell.py", line 72, in inner
result = function(command, **kwargs)
File "/usr/lib/ambari-agent/lib/resource_management/core/shell.py", line 102, in checked_call
tries=tries, try_sleep=try_sleep, timeout_kill_strategy=timeout_kill_strategy, returns=returns)
File "/usr/lib/ambari-agent/lib/resource_management/core/shell.py", line 150, in _call_wrapper
result = _call(command, **kwargs_copy)
File "/usr/lib/ambari-agent/lib/resource_management/core/shell.py", line 314, in _call
raise ExecutionFailed(err_msg, code, out, err)
resource_management.core.exceptions.ExecutionFailed: Execution of 'ulimit -c unlimited; export HADOOP_LIBEXEC_DIR=/usr/hdp/3.0.1.0-187/hadoop/libexec && /usr/hdp/3.0.1.0-187/hadoop-yarn/bin/yarn --config /usr/hdp/3.0.1.0-187/hadoop/conf --daemon start nodemanager' returned 1. -bash: line 0: ulimit: core file size: cannot modify limit: Operation not permitted
/usr/hdp/3.0.1.0-187/hadoop/libexec/hadoop-functions.sh: line 1847: /var/run/hadoop-yarn/yarn/hadoop-yarn-nodemanager.pid: Permission denied
ERROR: Cannot write nodemanager pid /var/run/hadoop-yarn/yarn/hadoop-yarn-nodemanager.pid.
/usr/hdp/3.0.1.0-187/hadoop/libexec/hadoop-functions.sh: line 1866: /var/log/hadoop-yarn/yarn/hadoop-yarn-nodemanager

@Shelton, any update on this?

Master Mentor

@saivenkatg55 

Sorry for the delay, it is the festive period. Can you do the following?

Delete the rotated copies of /var/log/messages (all files named /var/log/messages.x), which should leave you with only /var/log/messages. Then truncate that file so it contains only new entries:

# truncate --size 0 /var/log/messages

Do the same for the rotated NodeManager logs /var/log/hadoop-yarn/yarn/hadoop-yarn-nodemanager-<node_name>.log.x, and also truncate /var/log/hadoop-yarn/yarn/hadoop-yarn-nodemanager-<node_name>.log:

# truncate --size 0 /var/log/hadoop-yarn/yarn/hadoop-yarn-nodemanager-<node_name>.log
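
The two clean-up steps above could be wrapped in a small helper. This is only a sketch: the function name reset_log is made up for illustration, and it assumes the rotated copies are named "<log>.<suffix>" as described above. If your logrotate setup uses a different naming scheme (for example a date suffix), the pattern would need adjusting.

```shell
#!/usr/bin/env bash
# Hypothetical helper: delete rotated copies of a log and empty the live file.
# Assumes rotated copies are named "<log>.<suffix>" (e.g. messages.1).
reset_log() {
    local log="$1"
    [ -f "$log" ] || { echo "no such log: $log" >&2; return 1; }
    # The live file itself does not match "$log".* so only rotated copies go.
    rm -f -- "$log".*
    # Truncate in place so a process that has the file open keeps writing to it.
    truncate --size 0 -- "$log"
}
```

Run as root it would be called as reset_log /var/log/messages, and again with the NodeManager log path.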


Start the NodeManager manually:

# su -l yarn -c "/usr/hdp/current/hadoop-yarn-nodemanager/sbin/yarn-daemon.sh start nodemanager"


Then share the latest files created below

  • /var/log/messages
  • /var/log/hadoop-yarn/yarn/hadoop-yarn-nodemanager-<node_name>.log
  • /var/lib/ambari-agent/data/errors-xxx.txt

Please report back.

@Shelton Please check your mail and update me.

Master Mentor

@saivenkatg55 

In hadoop-yarn-nodemanager-w0lxdhdp05.ifc.org.log I see errors pointing to "Unable to start NodeManager: Could not load library. Reasons: [no leveldbjni64-1.8 in java.library.path, no leveldbjni-1.8 in java.library.path, no leveldbjni in java.library.path, /var/lib/ambari-agent/tmp/hadoop_java_io_tmpdir/libleveldbjni-64-1-6279667856305652637.8 (Permission denied)]".

My suspicion:

Please verify that /tmp on the host does not have the noexec option set. You can verify this by running /bin/mount and checking the mount options. If you are able to, remount /tmp without noexec and try starting the NodeManager again. I am sure the issue is noexec on /tmp.

See my sample output

[root@tokyo ~]# /bin/mount
sysfs on /sys type sysfs (rw,nosuid,nodev,noexec,relatime)
proc on /proc type proc (rw,nosuid,nodev,noexec,relatime)
devtmpfs on /dev type devtmpfs (rw,nosuid,size=7167976k,nr_inodes=1791994,mode=755)
.......
...
systemd-1 on /proc/sys/fs/binfmt_misc type autofs (rw,relatime,fd=30,pgrp=1,timeout=0,minproto=5,maxproto=5,direct,pipe_ino=15609)
hugetlbfs on /dev/hugepages type hugetlbfs (rw,relatime)
debugfs on /sys/kernel/debug type debugfs (rw,relatime)
mqueue on /dev/mqueue type mqueue (rw,relatime)
/dev/sda1 on /boot type ext4 (rw,relatime,data=ordered)
/dev/sda5 on /opt type ext4 (rw,relatime,data=ordered)
/dev/sda8 on /home type ext4 (rw,relatime,data=ordered)
/dev/sda11 on /u02 type ext4 (rw,relatime,data=ordered)
/dev/sda6 on /var type ext4 (rw,relatime,data=ordered)
/dev/sda10 on /u01 type ext4 (rw,relatime,data=ordered)
/dev/sda9 on /tmp type ext4 (rw,relatime,data=ordered)
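
To avoid reading the mount list by eye, the check could be scripted. The following is a minimal sketch, assuming the Linux `mount` output format shown above ("device on /path type fs (options)"); the function name mount_has_noexec is hypothetical and it does not handle mount points containing spaces.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read `mount` output on stdin and report whether the
# given mount point carries the noexec option.
# Exit status: 0 = noexec is set, 1 = not set, 2 = mount point not listed.
mount_has_noexec() {
    local target="$1"
    awk -v t="$target" '
        $2 == "on" && $3 == t {
            found = 1
            opts = $NF                  # e.g. "(rw,relatime,data=ordered)"
            gsub(/[()]/, "", opts)
            n = split(opts, o, ",")
            for (i = 1; i <= n; i++) if (o[i] == "noexec") hit = 1
        }
        END { if (!found) exit 2; exit (hit ? 0 : 1) }
    '
}
```

Example: /bin/mount | mount_has_noexec /tmp && echo "noexec is set on /tmp". Since the path in the error lies under /var/lib/ambari-agent/tmp, it may be worth running the same check against the mount point that holds that directory.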


This issue occurs when the user running the NodeManager start process does not have the necessary rights and cannot create temporary files under the /tmp directory.

Solution

- Allow the user running the NodeManager startup process read/write/execute access on /tmp
- Remove the noexec option when mounting /tmp
- Restore the standard permissions on /tmp, i.e.: sudo chmod 1777 /tmp (world-writable with the sticky bit set)
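
Before changing anything, it may help to confirm which of these conditions actually fails. Below is a rough sketch of a probe that tries to create and execute a small file in a directory. The function name can_write_and_exec is made up for illustration, and it has to be run as the user that starts the NodeManager (yarn in this thread) to be meaningful.

```shell
#!/usr/bin/env bash
# Hypothetical helper: check that the current user can write to a directory
# and execute a file created inside it (execution fails on a noexec mount).
can_write_and_exec() {
    local dir="$1" probe
    [ -d "$dir" ] || { echo "not a directory: $dir" >&2; return 1; }
    probe=$(mktemp "$dir/probe.XXXXXX" 2>/dev/null) \
        || { echo "cannot write to $dir" >&2; return 1; }
    printf '#!/bin/sh\nexit 0\n' > "$probe"
    chmod 700 "$probe"
    if "$probe" 2>/dev/null; then
        rm -f -- "$probe"
        return 0
    fi
    echo "cannot execute files in $dir" >&2
    rm -f -- "$probe"
    return 1
}
```

It can be pointed at /tmp or at the directory named in the error, /var/lib/ambari-agent/tmp/hadoop_java_io_tmpdir. A "cannot write" result suggests ownership or permissions, while "cannot execute" suggests a noexec mount.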

In /var/log/messages I can also see:
Jan 2 05:14:23 w0lxdhdp05 abrt-server: Package 'ambari-agent' isn't signed with proper key
Jan 2 05:14:23 w0lxdhdp05 abrt-server: 'post-create' on '/var/spool/abrt/Python-2020-01-02-05:14:22-11897' exited with 1
Jan 2 05:14:23 w0lxdhdp05 abrt-server: Deleting problem directory '/var/spool/abrt/Python-2020-01-02-05:14:22-11897'

Please edit /etc/abrt/abrt-action-save-package-data.conf and change the value of OpenGPGCheck from yes to no:

OpenGPGCheck = no

It might also be necessary to change the value of limit coredumpsize:

limit coredumpsize unlimited

After editing the file, restart abrtd with the following command:

# service abrtd restart
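
If you prefer not to edit the file by hand, the OpenGPGCheck change could be scripted along these lines. This is only a sketch: set_conf_value is a hypothetical helper, and it assumes the file uses plain "Key = value" lines. Take a backup of the file first.

```shell
#!/usr/bin/env bash
# Hypothetical helper: set "Key = value" in a simple key/value config file,
# replacing the existing line for that key or appending one if none exists.
# Commented-out lines (starting with #) are left untouched.
set_conf_value() {
    local file="$1" key="$2" value="$3" tmp rc=0
    tmp=$(mktemp) || return 1
    awk -v k="$key" -v v="$value" '
        $0 ~ ("^[ \t]*" k "[ \t]*=") { print k " = " v; done = 1; next }
        { print }
        END { if (!done) print k " = " v }
    ' "$file" > "$tmp" || rc=$?
    # Write back through the original file so its owner and mode are kept.
    if [ "$rc" -eq 0 ]; then
        cat "$tmp" > "$file" || rc=$?
    fi
    rm -f -- "$tmp"
    return "$rc"
}
```

As root this would be called as set_conf_value /etc/abrt/abrt-action-save-package-data.conf OpenGPGCheck no, followed by the abrtd restart above.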

Restart the NodeManager and share your joy!

@Shelton I checked, and /tmp does not have noexec enabled. Please suggest an alternative solution.

/dev/mapper/rootvg-tmp on /tmp type xfs (rw,relatime,attr2,inode64,noquota)

@Shelton Any update on this? It looks like it is looking for some Java packages:

java.lang.UnsatisfiedLinkError: Could not load library. Reasons: [no leveldbjni64-1.8 in java.library.path, no leveldbjni-1.8 in java.library.path, no leveldbjni in java.library.path, /var/lib/ambari-agent/tmp/hadoop_java_io_tmpdir/libleveldbjni-64-1-4657625312215122883.8 (Permission denied)]

Can we install it externally?