
Unable to start the node manager


Traceback (most recent call last):
  File "/var/lib/ambari-agent/cache/stacks/HDP/3.0/services/YARN/package/scripts/nodemanager.py", line 102, in <module>
    Nodemanager().execute()
  File "/usr/lib/ambari-agent/lib/resource_management/libraries/script/script.py", line 351, in execute
    method(env)
  File "/var/lib/ambari-agent/cache/stacks/HDP/3.0/services/YARN/package/scripts/nodemanager.py", line 53, in start
    service('nodemanager',action='start')
  File "/usr/lib/ambari-agent/lib/ambari_commons/os_family_impl.py", line 89, in thunk
    return fn(*args, **kwargs)
  File "/var/lib/ambari-agent/cache/stacks/HDP/3.0/services/YARN/package/scripts/service.py", line 93, in service
    Execute(daemon_cmd, user = usr, not_if = check_process)
  File "/usr/lib/ambari-agent/lib/resource_management/core/base.py", line 166, in __init__
    self.env.run()
  File "/usr/lib/ambari-agent/lib/resource_management/core/environment.py", line 160, in run
    self.run_action(resource, action)
  File "/usr/lib/ambari-agent/lib/resource_management/core/environment.py", line 124, in run_action
    provider_action()
  File "/usr/lib/ambari-agent/lib/resource_management/core/providers/system.py", line 263, in action_run
    returns=self.resource.returns)
  File "/usr/lib/ambari-agent/lib/resource_management/core/shell.py", line 72, in inner
    result = function(command, **kwargs)
  File "/usr/lib/ambari-agent/lib/resource_management/core/shell.py", line 102, in checked_call
    tries=tries, try_sleep=try_sleep, timeout_kill_strategy=timeout_kill_strategy, returns=returns)
  File "/usr/lib/ambari-agent/lib/resource_management/core/shell.py", line 150, in _call_wrapper
    result = _call(command, **kwargs_copy)
  File "/usr/lib/ambari-agent/lib/resource_management/core/shell.py", line 314, in _call
    raise ExecutionFailed(err_msg, code, out, err)
resource_management.core.exceptions.ExecutionFailed: Execution of 'ulimit -c unlimited; export HADOOP_LIBEXEC_DIR=/usr/hdp/3.0.1.0-187/hadoop/libexec && /usr/hdp/3.0.1.0-187/hadoop-yarn/bin/yarn --config /usr/hdp/3.0.1.0-187/hadoop/conf --daemon start nodemanager' returned 1. -bash: line 0: ulimit: core file size: cannot modify limit: Operation not permitted
/usr/hdp/3.0.1.0-187/hadoop/libexec/hadoop-functions.sh: line 1847: /var/run/hadoop-yarn/yarn/hadoop-yarn-nodemanager.pid: Permission denied
ERROR: Cannot write nodemanager pid /var/run/hadoop-yarn/yarn/hadoop-yarn-nodemanager.pid.
/usr/hdp/3.0.1.0-187/hadoop/libexec/hadoop-functions.sh: line 1866: /var/log/hadoop-yarn/yarn/hadoop-yarn-nodemanager


@Shelton Any update on this?

Master Mentor

@saivenkatg55 

Sorry for the delay, it's the festive period. Can you do the following:

Delete the old rotated messages logs, i.e. all files of the form /var/log/messages.x. That should leave you with only /var/log/messages; then truncate that file so it contains only new entries:

# truncate --size 0 /var/log/messages

Do the same for the rotated NodeManager logs /var/log/hadoop-yarn/yarn/hadoop-yarn-nodemanager-<node_name>.log.x, and also truncate the current /var/log/hadoop-yarn/yarn/hadoop-yarn-nodemanager-<node_name>.log:

# truncate --size 0 /var/log/hadoop-yarn/yarn/hadoop-yarn-nodemanager-<node_name>.log
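The two clean-up steps above can be wrapped in one small shell function. This is a minimal sketch, not part of the original instructions: the function name reset_log is hypothetical, and it assumes the rotated copies use a numeric suffix (log.1, log.2, ...) as the ".x" in the post suggests. If logrotate uses date suffixes on your host, adjust the pattern. It deletes files, so double-check the paths before running it as root.

```shell
#!/usr/bin/env bash
# Hypothetical helper: delete the rotated copies of a log and empty the live log.
# Assumption: rotated copies are named <log>.1, <log>.2, ... (the ".x" in the post).

reset_log() {
  local log=$1
  local rotated
  # Remove rotated copies such as messages.1 or ...nodemanager-<node_name>.log.3
  for rotated in "$log".[0-9]*; do
    # If nothing matches, the glob stays literal, so check the file exists first
    if [ -f "$rotated" ]; then
      rm -f -- "$rotated"
    fi
  done
  # Empty the live log in place; truncating keeps the same file, so a process
  # that already has it open keeps logging to it
  truncate --size 0 -- "$log"
}

# Example usage as root (replace <node_name> with the host name in the file name):
# reset_log /var/log/messages
# reset_log /var/log/hadoop-yarn/yarn/hadoop-yarn-nodemanager-<node_name>.log
```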


Start the NodeManager manually:

# su -l yarn -c "/usr/hdp/current/hadoop-yarn-nodemanager/sbin/yarn-daemon.sh start nodemanager"
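To confirm that the manual start actually left a running daemon, a sketch like the following can check the pid file. The helper pid_file_alive is hypothetical, and the default path is the pid file named in the original error (/var/run/hadoop-yarn/yarn/hadoop-yarn-nodemanager.pid); treat that location as an assumption if yarn-env.sh points the pid directory elsewhere. Run it as root or as yarn so the liveness check is permitted.

```shell
#!/usr/bin/env bash
# Hypothetical check that the NodeManager recorded in a pid file is still alive.
# Assumption: the pid file path below is the one shown in the original error.
PID_FILE=${PID_FILE:-/var/run/hadoop-yarn/yarn/hadoop-yarn-nodemanager.pid}

# Succeed only if the pid file is readable, holds a number, and that process exists.
pid_file_alive() {
  local file=$1 pid
  [ -r "$file" ] || return 1
  pid=$(tr -d '[:space:]' < "$file")
  case $pid in
    ''|*[!0-9]*) return 1 ;;   # empty or non-numeric content
  esac
  # kill -0 sends no signal; it only tests that the process exists and is reachable
  kill -0 "$pid" 2>/dev/null
}

# Example:
# if pid_file_alive "$PID_FILE"; then echo "NodeManager is running"; else echo "NodeManager is not running"; fi
```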


Then share the newly created files listed below:

  • /var/log/messages
  • /var/log/hadoop-yarn/yarn/hadoop-yarn-nodemanager-<node_name>.log
  • /var/lib/ambari-agent/data/errors-xxx.txt
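Collecting the requested files into one archive makes them easier to attach. A minimal sketch; the helper names are hypothetical, and it assumes errors-xxx.txt stands for Ambari's per-command error files, taking the newest errors-*.txt as the one from the failed start.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for bundling the requested logs into one tarball.
# Assumption: the newest errors-*.txt is the error file of the failed start.

# Print the most recently modified file matching a glob pattern (empty if none).
latest_match() {
  # The pattern is left unquoted on purpose so the shell expands the glob
  ls -1t $1 2>/dev/null | head -n 1
}

# Write a gzipped tarball OUT containing whichever of the given files exist.
bundle_logs() {
  local out=$1 f
  shift
  local existing=()
  for f in "$@"; do
    if [ -e "$f" ]; then existing+=("$f"); fi
  done
  if [ "${#existing[@]}" -eq 0 ]; then
    echo "no files to bundle" >&2
    return 1
  fi
  tar -czf "$out" "${existing[@]}"
}

# Example (replace <node_name>):
# bundle_logs /tmp/nm-logs.tar.gz /var/log/messages \
#   /var/log/hadoop-yarn/yarn/hadoop-yarn-nodemanager-<node_name>.log \
#   "$(latest_match '/var/lib/ambari-agent/data/errors-*.txt')"
```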

Please get back to me.

@Shelton Please check your mail and update me.

Master Mentor

@saivenkatg55 

 

In hadoop-yarn-nodemanager-w0lxdhdp05.ifc.org.log I see errors pointing to "Unable to start NodeManager: Could not load library. Reasons: [no leveldbjni64-1.8 in java.library.path, no leveldbjni-1.8 in java.library.path, no leveldbjni in java.library.path, /var/lib/ambari-agent/tmp/hadoop_java_io_tmpdir/libleveldbjni-64-1-6279667856305652637.8 (Permission denied)]".

 

My suspicion:

 

Please verify that /tmp on the host does not have the noexec option set. You can check this by running /bin/mount and looking at the mount options. If you can, remount /tmp without noexec and try starting the NodeManager again. I am sure it's an issue with noexec on /tmp.

See my sample output:

[root@tokyo ~]# /bin/mount
sysfs on /sys type sysfs (rw,nosuid,nodev,noexec,relatime)
proc on /proc type proc (rw,nosuid,nodev,noexec,relatime)
devtmpfs on /dev type devtmpfs (rw,nosuid,size=7167976k,nr_inodes=1791994,mode=755)
.......
...
systemd-1 on /proc/sys/fs/binfmt_misc type autofs (rw,relatime,fd=30,pgrp=1,timeout=0,minproto=5,maxproto=5,direct,pipe_ino=15609)
hugetlbfs on /dev/hugepages type hugetlbfs (rw,relatime)
debugfs on /sys/kernel/debug type debugfs (rw,relatime)
mqueue on /dev/mqueue type mqueue (rw,relatime)
/dev/sda1 on /boot type ext4 (rw,relatime,data=ordered)
/dev/sda5 on /opt type ext4 (rw,relatime,data=ordered)
/dev/sda8 on /home type ext4 (rw,relatime,data=ordered)
/dev/sda11 on /u02 type ext4 (rw,relatime,data=ordered)
/dev/sda6 on /var type ext4 (rw,relatime,data=ordered)
/dev/sda10 on /u01 type ext4 (rw,relatime,data=ordered)
/dev/sda9 on /tmp type ext4 (rw,relatime,data=ordered)
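The noexec check can be scripted in two ways: read the mount options of the filesystem that holds a directory, or simply try to execute a small file there, which also catches plain permission problems. A sketch under these assumptions: findmnt from util-linux is installed, and the function names are hypothetical. The option strings used to exercise has_noexec come from the mount output in this thread.

```shell
#!/usr/bin/env bash
# Hypothetical helpers to detect whether files can be executed in a directory.
# Assumption: util-linux findmnt is available for mount_options.

# Succeed if a comma-separated mount option string contains exactly "noexec".
has_noexec() {
  case ",$1," in
    *,noexec,*) return 0 ;;
    *) return 1 ;;
  esac
}

# Print the mount options of the filesystem holding DIR.
mount_options() {
  findmnt -n -o OPTIONS --target "$1"
}

# Succeed if a freshly created script inside DIR can actually be executed.
can_exec_in() {
  local dir=$1 probe rc
  probe=$(mktemp "$dir/exec-probe.XXXXXX") || return 1
  printf '#!/bin/sh\nexit 0\n' > "$probe"
  chmod u+x "$probe"
  "$probe" 2>/dev/null
  rc=$?
  rm -f "$probe"
  return "$rc"
}

# Example, run as the user that starts the NodeManager:
# has_noexec "$(mount_options /tmp)" && echo "/tmp is mounted noexec"
# can_exec_in /tmp || echo "cannot execute files in /tmp"
# As root, drop noexec until the next reboot (also fix /etc/fstab to make it permanent):
# mount -o remount,exec /tmp
```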


This issue occurs when the user running the NodeManager start process does not have the necessary rights and cannot create temporary files under the /tmp directory.

 

Solution

- Give the user running the NodeManager startup process read/write/execute access on /tmp
- Remove the noexec option from the /tmp mount
- Change the permissions on /tmp, i.e. sudo chmod 1777 /tmp (1777 keeps the sticky bit that /tmp normally has; plain 777 would drop it)
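For the permission fixes above, a sketch that checks whether a directory has the conventional /tmp mode 1777: read/write/execute for everyone plus the sticky bit, so users cannot delete each other's files. GNU stat is assumed and the helper names are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helpers to inspect a shared temp directory's permissions.
# Assumption: GNU coreutils stat (for the -c '%a' format).

# Print the octal mode of a path, including special bits (e.g. 1777).
dir_mode() {
  stat -c '%a' -- "$1"
}

# Succeed if DIR has mode 1777: rwx for everyone plus the sticky bit.
is_shared_tmp() {
  [ "$(dir_mode "$1")" = "1777" ]
}

# Example, as root:
# is_shared_tmp /tmp || chmod 1777 /tmp
# Check that the yarn user can create and remove a file there:
# sudo -u yarn touch /tmp/yarn-write-test && sudo -u yarn rm /tmp/yarn-write-test
```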

 

In /var/log/messages I can also see:
Jan 2 05:14:23 w0lxdhdp05 abrt-server: Package 'ambari-agent' isn't signed with proper key
Jan 2 05:14:23 w0lxdhdp05 abrt-server: 'post-create' on '/var/spool/abrt/Python-2020-01-02-05:14:22-11897' exited with 1
Jan 2 05:14:23 w0lxdhdp05 abrt-server: Deleting problem directory '/var/spool/abrt/Python-2020-01-02-05:14:22-11897'

 

Please edit /etc/abrt/abrt-action-save-package-data.conf and change the value of OpenGPGCheck from yes to no:

OpenGPGCheck = no

It might also be necessary to raise the core dump size limit (limit coredumpsize is the csh/tcsh syntax; in bash the equivalent is ulimit -c unlimited):

limit coredumpsize unlimited

After editing the file, restart the abrtd service with the following command:

# service abrtd restart
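The OpenGPGCheck edit can also be made non-interactively with GNU sed. A sketch: the function name is hypothetical, it only rewrites an existing uncommented OpenGPGCheck line (it adds nothing if the key is missing), and it keeps a .bak copy of the original file.

```shell
#!/usr/bin/env bash
# Hypothetical helper: set OpenGPGCheck in an abrt config file, keeping a .bak copy.
# Assumption: GNU sed (-i with a suffix, -E for extended regex).

set_open_gpg_check() {
  local conf=$1 value=$2
  # Match the key at the start of a line (not commented out), spaces allowed around "="
  sed -i.bak -E "s/^[[:space:]]*OpenGPGCheck[[:space:]]*=.*/OpenGPGCheck = ${value}/" "$conf"
}

# Example, as root:
# set_open_gpg_check /etc/abrt/abrt-action-save-package-data.conf no
# grep '^OpenGPGCheck' /etc/abrt/abrt-action-save-package-data.conf
# service abrtd restart
```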

Then restart the NodeManager and share your joy!


@Shelton I checked, and /tmp does not have noexec enabled. Please suggest an alternative solution.

/dev/mapper/rootvg-tmp on /tmp type xfs (rw,relatime,attr2,inode64,noquota)


@Shelton Any update on this? It looks like it is looking for some Java packages:

java.lang.UnsatisfiedLinkError: Could not load library. Reasons: [no leveldbjni64-1.8 in java.library.path, no leveldbjni-1.8 in java.library.path, no leveldbjni in java.library.path, /var/lib/ambari-agent/tmp/hadoop_java_io_tmpdir/libleveldbjni-64-1-4657625312215122883.8 (Permission denied)]

Can we install it externally?
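The file reported as Permission denied in this error sits under /var/lib/ambari-agent/tmp/hadoop_java_io_tmpdir, not under /tmp, so that directory's permissions for the yarn user may be worth checking as well. A sketch that extracts the directory from such an error line; the helper name denied_dir is hypothetical and the parsing assumes the "<path> (Permission denied)" layout shown in the error above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the directory of the file that hit "Permission denied"
# in an UnsatisfiedLinkError line. Assumes the "<path> (Permission denied)" layout.

denied_dir() {
  local line=$1 path
  # Keep the last absolute path that is immediately followed by " (Permission denied)"
  path=$(grep -oE '/[^ ,]+ \(Permission denied\)' <<< "$line" | tail -n 1)
  path=${path% (Permission denied)}
  [ -n "$path" ] || return 1
  dirname -- "$path"
}

# Example (replace <node_name>):
# dir=$(denied_dir "$(grep -m1 UnsatisfiedLinkError /var/log/hadoop-yarn/yarn/hadoop-yarn-nodemanager-<node_name>.log)")
# ls -ld "$dir"
# sudo -u yarn test -w "$dir" && echo "yarn can write to $dir"
```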