
Unable to start the node manager

Traceback (most recent call last):
File "/var/lib/ambari-agent/cache/stacks/HDP/3.0/services/YARN/package/scripts/nodemanager.py", line 102, in <module>
Nodemanager().execute()
File "/usr/lib/ambari-agent/lib/resource_management/libraries/script/script.py", line 351, in execute
method(env)
File "/var/lib/ambari-agent/cache/stacks/HDP/3.0/services/YARN/package/scripts/nodemanager.py", line 53, in start
service('nodemanager',action='start')
File "/usr/lib/ambari-agent/lib/ambari_commons/os_family_impl.py", line 89, in thunk
return fn(*args, **kwargs)
File "/var/lib/ambari-agent/cache/stacks/HDP/3.0/services/YARN/package/scripts/service.py", line 93, in service
Execute(daemon_cmd, user = usr, not_if = check_process)
File "/usr/lib/ambari-agent/lib/resource_management/core/base.py", line 166, in __init__
self.env.run()
File "/usr/lib/ambari-agent/lib/resource_management/core/environment.py", line 160, in run
self.run_action(resource, action)
File "/usr/lib/ambari-agent/lib/resource_management/core/environment.py", line 124, in run_action
provider_action()
File "/usr/lib/ambari-agent/lib/resource_management/core/providers/system.py", line 263, in action_run
returns=self.resource.returns)
File "/usr/lib/ambari-agent/lib/resource_management/core/shell.py", line 72, in inner
result = function(command, **kwargs)
File "/usr/lib/ambari-agent/lib/resource_management/core/shell.py", line 102, in checked_call
tries=tries, try_sleep=try_sleep, timeout_kill_strategy=timeout_kill_strategy, returns=returns)
File "/usr/lib/ambari-agent/lib/resource_management/core/shell.py", line 150, in _call_wrapper
result = _call(command, **kwargs_copy)
File "/usr/lib/ambari-agent/lib/resource_management/core/shell.py", line 314, in _call
raise ExecutionFailed(err_msg, code, out, err)
resource_management.core.exceptions.ExecutionFailed: Execution of 'ulimit -c unlimited; export HADOOP_LIBEXEC_DIR=/usr/hdp/3.0.1.0-187/hadoop/libexec && /usr/hdp/3.0.1.0-187/hadoop-yarn/bin/yarn --config /usr/hdp/3.0.1.0-187/hadoop/conf --daemon start nodemanager' returned 1. -bash: line 0: ulimit: core file size: cannot modify limit: Operation not permitted
/usr/hdp/3.0.1.0-187/hadoop/libexec/hadoop-functions.sh: line 1847: /var/run/hadoop-yarn/yarn/hadoop-yarn-nodemanager.pid: Permission denied
ERROR: Cannot write nodemanager pid /var/run/hadoop-yarn/yarn/hadoop-yarn-nodemanager.pid.
/usr/hdp/3.0.1.0-187/hadoop/libexec/hadoop-functions.sh: line 1866: /var/log/hadoop-yarn/yarn/hadoop-yarn-nodemanager


@jsensharma  Please look into this issue 

Expert Contributor

Please check whether this file exists: /var/run/hadoop-yarn/yarn/hadoop-yarn-nodemanager.pid. If it does not, create the directory and the file as root:

mkdir -p /var/run/hadoop-yarn/yarn/

chown -R yarn:hadoop /var/run/hadoop-yarn/yarn/

touch /var/run/hadoop-yarn/yarn/hadoop-yarn-nodemanager.pid

chown yarn:hadoop /var/run/hadoop-yarn/yarn/hadoop-yarn-nodemanager.pid

This will work.

 

 

Mentor

@saivenkatg55 

 

I think there is a permission issue with the pid file.

[Screenshot: nodemanager.PNG]

Can you check the permissions? If for any reason they are not as shown in the screenshot, please run chown as root to correct them:
# chown yarn:hadoop /var/run/hadoop-yarn/yarn/hadoop-yarn-nodemanager.pid

Do that for all files in the directory whose ownership is not correct.
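
To see at a glance which entries in the directory have an unexpected owner or mode, a small helper along these lines may be useful. It is only a sketch: list_owner_mode is a hypothetical name, and it assumes GNU find and stat as shipped on RHEL/CentOS.

```shell
# list_owner_mode DIR
# Print "owner:group octal-mode path" for DIR itself and every entry directly inside it.
# Assumes GNU find and GNU stat (coreutils).
list_owner_mode() {
  find "$1" -maxdepth 1 -exec stat -c '%U:%G %a %n' {} +
}
```

For example, list_owner_mode /var/run/hadoop-yarn/yarn lists the pid directory and its files, so anything not owned by yarn:hadoop stands out.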

HTH

 

 

@Shelton I tried the solution above, but the pid file is still created with 444 permissions, even after multiple restarts.

 

-r--r--r-- 1 yarn hadoop 6 Dec 20 05:00 hadoop-yarn-nodemanager.pid

 

The issue above still persists:

 

resource_management.core.exceptions.ExecutionFailed: Execution of 'ulimit -c unlimited; export HADOOP_LIBEXEC_DIR=/usr/hdp/3.0.1.0-187/hadoop/libexec && /usr/hdp/3.0.1.0-187/hadoop-yarn/bin/yarn --config /usr/hdp/3.0.1.0-187/hadoop/conf --daemon start nodemanager' returned 1. -bash: line 0: ulimit: core file size: cannot modify limit: Operation not permitted
/usr/hdp/3.0.1.0-187/hadoop/libexec/hadoop-functions.sh: line 1847: /var/run/hadoop-yarn/yarn/hadoop-yarn-nodemanager.pid: Permission denied
ERROR: Cannot write nodemanager pid /var/run/hadoop-yarn/yarn/hadoop-yarn-nodemanager.pid.
/usr/hdp/3.0.1.0-187/hadoop/libexec/hadoop-functions.sh: line 1866: /var/log/hadoop-yarn/yarn/hadoop-yarn-nodemanager-Hostname.org.out: Permission denied

Mentor

@saivenkatg55 

 

The file permission should be 644, not 444.

 

# chmod 644 /var/run/hadoop-yarn/yarn/hadoop-yarn-nodemanager.pid

 

Please do that and report back.
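
If the file goes back to 444 after a restart, one possible cause is a restrictive umask for the account that starts the daemon. This is an assumption, not something confirmed in this thread. A file created through shell redirection under umask 0222 ends up with mode 444. The sketch below shows the effect on a throwaway file; mode_under_umask is a hypothetical helper, and GNU stat is assumed.

```shell
# mode_under_umask MASK
# Create an empty file under the given umask and print its octal mode.
# Illustration only: it works in a temporary directory and touches no Hadoop files.
mode_under_umask() {
  probe_dir=$(mktemp -d) || return 1   # directory is created before the umask changes
  (
    umask "$1"
    : > "$probe_dir/probe"             # create the file via redirection
    stat -c '%a' "$probe_dir/probe"    # GNU stat: print the octal permission bits
  )
  rm -rf "$probe_dir"
}
```

Running mode_under_umask 0022 and mode_under_umask 0222 should print 644 and 444 respectively. To see the umask the yarn account actually uses, su -l yarn -c umask can be run as root.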

@Shelton I changed it to 644, but after starting the NodeManager it goes back to 444.

 

Before:

-rw-r--r-- 1 yarn hadoop 6 Dec 20 05:00 hadoop-yarn-nodemanager.pid

After:

-r--r--r-- 1 yarn hadoop 6 Dec 20 05:00 hadoop-yarn-nodemanager.pid

 

I cannot find the exact reason why it changes back to 444 even though I set the permissions manually.

@Shelton Any update on the above?

Mentor

@saivenkatg55 

I have tried to analyze your situation, but without access to the Linux box it is rather difficult. I think there is a workaround, though.

 

The Linux chattr command can make important files immutable (unchangeable).

 

The immutable bit [ +i ] can only be set by the superuser (i.e. root) or by a user with sudo privileges. It prevents the file from being forcibly deleted, renamed, or having its permissions changed; any such attempt fails with 'Operation not permitted'.

# ls -al /var/run/hadoop-yarn/yarn/
total 8
.
..
-rw-r--r-- 1 yarn hadoop 0 Dec 24 09:34 hadoop-yarn-nodemanager.pid

 

Set immutable bit

# chattr +i hadoop-yarn-nodemanager.pid

 

Verify the attribute with the command below

# lsattr
----i--------e-- ./hadoop-yarn-nodemanager.pid

 

The normal ls command shows no difference

# ls -al /var/run/hadoop-yarn/yarn/
total 8
drwxr-xr-x 2 root root 4096 Dec 24 09:34 .
drwxr-xr-x 3 root root 4096 Dec 24 09:34 ..
-rw-r--r-- 1 yarn hadoop 0 Dec 24 09:34 hadoop-yarn-nodemanager.pid

 

Deletion protection

# rm -rf /var/run/hadoop-yarn/yarn/hadoop-yarn-nodemanager.pid
rm: cannot remove ‘/var/run/hadoop-yarn/yarn/hadoop-yarn-nodemanager.pid’: Operation not permitted

 

Permission change protection
# chmod 755  /var/run/hadoop-yarn/yarn/hadoop-yarn-nodemanager.pid
chmod: changing permissions of ‘/var/run/hadoop-yarn/yarn/hadoop-yarn-nodemanager.pid’: Operation not permitted



How to unset the attribute on files

# chattr -i /var/run/hadoop-yarn/yarn/hadoop-yarn-nodemanager.pid

After unsetting the attribute, verify the immutable status of the files using the lsattr command

# lsattr

---------------- ./var/run/hadoop-yarn/yarn/hadoop-yarn-nodemanager.pid
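
chattr and lsattr only work where the underlying filesystem supports these attributes, so it can be worth checking which filesystem actually backs the pid directory before relying on them. A minimal sketch, assuming GNU readlink and stat; describe_fs is a hypothetical helper name.

```shell
# describe_fs PATH
# Resolve symlinks in PATH, then print the resolved path and its filesystem type.
# Assumes GNU readlink and GNU stat (coreutils).
describe_fs() {
  real_path=$(readlink -f "$1") || return 1
  printf '%s %s\n' "$real_path" "$(stat -f -c '%T' "$real_path")"
}
```

For example, describe_fs /var/run/hadoop-yarn/yarn shows where the directory really lives. On many recent distributions /var/run is a symlink to /run, which is typically tmpfs; if that is the case here, chattr may not be supported on it.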

 

Please do that and report back.

 

 

@Shelton 

I tried to set the attribute on the file hadoop-yarn-nodemanager.pid.

However, the file system under /var/run seems to be XFS, and according to Red Hat the chattr command does not work with XFS. Please provide an alternative solution for this issue.

 

[root@w0lxdhdp05 yarn]# lsattr
lsattr: Inappropriate ioctl for device While reading flags on ./hadoop-yarn-nodemanager.pid

chattr: Inappropriate ioctl for device while reading flags on hadoop-yarn-nodemanager.pid

Please refer to this: https://access.redhat.com/solutions/184693

@Shelton Any update on this?

Mentor

@saivenkatg55 

Sorry for the delay, it is the festive period. Can you do the following?

Delete the old rotated system logs, i.e. all files named /var/log/messages.x, which should leave you with only /var/log/messages. Then truncate that file so that it contains only new entries:

# truncate --size 0 /var/log/messages

Do the same for /var/log/hadoop-yarn/yarn/hadoop-yarn-nodemanager-<node_name>.log.x, and also truncate /var/log/hadoop-yarn/yarn/hadoop-yarn-nodemanager-<node_name>.log:

# truncate --size 0 /var/log/hadoop-yarn/yarn/hadoop-yarn-nodemanager-<node_name>.log


Start the NodeManager manually:

# su -l yarn -c "/usr/hdp/current/hadoop-yarn-nodemanager/sbin/yarn-daemon.sh start nodemanager"


Then share the latest versions of the files below:

  • /var/log/messages
  • /var/log/hadoop-yarn/yarn/hadoop-yarn-nodemanager-<node_name>.log
  • /var/lib/ambari-agent/data/errors-xxx.txt

Please report back.

@Shelton Please check your mail and update me.

Mentor

@saivenkatg55 

 

In hadoop-yarn-nodemanager-w0lxdhdp05.ifc.org.log I see errors pointing to "Unable to start NodeManager: Could not load library. Reasons: [no leveldbjni64-1.8 in java.library.path, no leveldbjni-1.8 in java.library.path, no leveldbjni in java.library.path, /var/lib/ambari-agent/tmp/hadoop_java_io_tmpdir/libleveldbjni-64-1-6279667856305652637.8 (Permission denied)]"

 

My suspicion:

 

Please verify that /tmp on the host does not have the noexec option set. You can verify this by running /bin/mount and checking the mount options. If you are able to, remount /tmp without noexec and try starting the NodeManager again. I am sure it is an issue with noexec on /tmp.

See my sample output:

[root@tokyo ~]# /bin/mount
sysfs on /sys type sysfs (rw,nosuid,nodev,noexec,relatime)
proc on /proc type proc (rw,nosuid,nodev,noexec,relatime)
devtmpfs on /dev type devtmpfs (rw,nosuid,size=7167976k,nr_inodes=1791994,mode=755)
.......
...
systemd-1 on /proc/sys/fs/binfmt_misc type autofs (rw,relatime,fd=30,pgrp=1,timeout=0,minproto=5,maxproto=5,direct,pipe_ino=15609)
hugetlbfs on /dev/hugepages type hugetlbfs (rw,relatime)
debugfs on /sys/kernel/debug type debugfs (rw,relatime)
mqueue on /dev/mqueue type mqueue (rw,relatime)
/dev/sda1 on /boot type ext4 (rw,relatime,data=ordered)
/dev/sda5 on /opt type ext4 (rw,relatime,data=ordered)
/dev/sda8 on /home type ext4 (rw,relatime,data=ordered)
/dev/sda11 on /u02 type ext4 (rw,relatime,data=ordered)
/dev/sda6 on /var type ext4 (rw,relatime,data=ordered)
/dev/sda10 on /u01 type ext4 (rw,relatime,data=ordered)
/dev/sda9 on /tmp type ext4 (rw,relatime,data=ordered)


This issue occurs when the user running the Hadoop [Nodemanager start] process does not have the necessary rights and cannot generate temporary files under the /tmp directory.

 

Solution

- Allow the user running the NodeManager startup process read/write/execute access on /tmp
- Remove the noexec parameter when mounting /tmp
- Fix the permissions on /tmp, i.e. sudo chmod 1777 /tmp (1777 keeps the sticky bit that /tmp normally has; plain 777 would drop it)
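
Since the error message names /var/lib/ambari-agent/tmp/hadoop_java_io_tmpdir and not /tmp itself, it may be more telling to run the same noexec check against the mount that holds that directory. The following is a sketch only: has_noexec, mount_options and report_noexec are hypothetical helper names, and findmnt from util-linux is assumed to be installed.

```shell
# has_noexec OPTIONS
# Succeed if the comma-separated mount option list contains "noexec".
has_noexec() {
  case ",$1," in
    *,noexec,*) return 0 ;;
    *) return 1 ;;
  esac
}

# mount_options PATH
# Print the options of the mount that holds PATH (requires findmnt from util-linux).
mount_options() {
  findmnt -n -o OPTIONS --target "$1"
}

# report_noexec PATH
# Say whether the mount holding PATH is flagged noexec.
report_noexec() {
  if has_noexec "$(mount_options "$1")"; then
    echo "$1: mounted noexec"
  else
    echo "$1: exec allowed"
  fi
}
```

For example, report_noexec /var/lib/ambari-agent/tmp/hadoop_java_io_tmpdir checks the mount options for that directory. As root, su -l yarn -c 'test -w /var/lib/ambari-agent/tmp/hadoop_java_io_tmpdir && echo writable' shows whether the yarn user can write there.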

 

In /var/log/messages I can also see:
Jan 2 05:14:23 w0lxdhdp05 abrt-server: Package 'ambari-agent' isn't signed with proper key
Jan 2 05:14:23 w0lxdhdp05 abrt-server: 'post-create' on '/var/spool/abrt/Python-2020-01-02-05:14:22-11897' exited with 1
Jan 2 05:14:23 w0lxdhdp05 abrt-server: Deleting problem directory '/var/spool/abrt/Python-2020-01-02-05:14:22-11897'

 

Please edit /etc/abrt/abrt-action-save-package-data.conf and change the value of OpenGPGCheck from yes to no:

OpenGPGCheck = no

It might also be necessary to change the value of limit coredumpsize:

limit coredumpsize unlimited

After editing the file, restart the abrtd service with the following command:

# service abrtd restart

Then restart the NodeManager and share your joy!

@Shelton I checked, and /tmp does not have noexec set. Please provide an alternative solution for this.

/dev/mapper/rootvg-tmp on /tmp type xfs (rw,relatime,attr2,inode64,noquota)

@Shelton Any update on this? It looks like it is looking for some Java libraries:

java.lang.UnsatisfiedLinkError: Could not load library. Reasons: [no leveldbjni64-1.8 in java.library.path, no leveldbjni-1.8 in java.library.path, no leveldbjni in java.library.path, /var/lib/ambari-agent/tmp/hadoop_java_io_tmpdir/libleveldbjni-64-1-4657625312215122883.8 (Permission denied)]

Can we install it externally?
