Created on 12-19-2019 05:16 AM - last edited on 12-19-2019 05:23 AM by VidyaSargur
Traceback (most recent call last):
File "/var/lib/ambari-agent/cache/stacks/HDP/3.0/services/YARN/package/scripts/nodemanager.py", line 102, in <module>
Nodemanager().execute()
File "/usr/lib/ambari-agent/lib/resource_management/libraries/script/script.py", line 351, in execute
method(env)
File "/var/lib/ambari-agent/cache/stacks/HDP/3.0/services/YARN/package/scripts/nodemanager.py", line 53, in start
service('nodemanager',action='start')
File "/usr/lib/ambari-agent/lib/ambari_commons/os_family_impl.py", line 89, in thunk
return fn(*args, **kwargs)
File "/var/lib/ambari-agent/cache/stacks/HDP/3.0/services/YARN/package/scripts/service.py", line 93, in service
Execute(daemon_cmd, user = usr, not_if = check_process)
File "/usr/lib/ambari-agent/lib/resource_management/core/base.py", line 166, in __init__
self.env.run()
File "/usr/lib/ambari-agent/lib/resource_management/core/environment.py", line 160, in run
self.run_action(resource, action)
File "/usr/lib/ambari-agent/lib/resource_management/core/environment.py", line 124, in run_action
provider_action()
File "/usr/lib/ambari-agent/lib/resource_management/core/providers/system.py", line 263, in action_run
returns=self.resource.returns)
File "/usr/lib/ambari-agent/lib/resource_management/core/shell.py", line 72, in inner
result = function(command, **kwargs)
File "/usr/lib/ambari-agent/lib/resource_management/core/shell.py", line 102, in checked_call
tries=tries, try_sleep=try_sleep, timeout_kill_strategy=timeout_kill_strategy, returns=returns)
File "/usr/lib/ambari-agent/lib/resource_management/core/shell.py", line 150, in _call_wrapper
result = _call(command, **kwargs_copy)
File "/usr/lib/ambari-agent/lib/resource_management/core/shell.py", line 314, in _call
raise ExecutionFailed(err_msg, code, out, err)
resource_management.core.exceptions.ExecutionFailed: Execution of 'ulimit -c unlimited; export HADOOP_LIBEXEC_DIR=/usr/hdp/3.0.1.0-187/hadoop/libexec && /usr/hdp/3.0.1.0-187/hadoop-yarn/bin/yarn --config /usr/hdp/3.0.1.0-187/hadoop/conf --daemon start nodemanager' returned 1. -bash: line 0: ulimit: core file size: cannot modify limit: Operation not permitted
/usr/hdp/3.0.1.0-187/hadoop/libexec/hadoop-functions.sh: line 1847: /var/run/hadoop-yarn/yarn/hadoop-yarn-nodemanager.pid: Permission denied
ERROR: Cannot write nodemanager pid /var/run/hadoop-yarn/yarn/hadoop-yarn-nodemanager.pid.
/usr/hdp/3.0.1.0-187/hadoop/libexec/hadoop-functions.sh: line 1866: /var/log/hadoop-yarn/yarn/hadoop-yarn-nodemanager
Created 12-19-2019 05:17 AM
@jsensharma Please look into this issue
Created 12-19-2019 06:26 AM
Please check whether this file exists: /var/run/hadoop-yarn/yarn/hadoop-yarn-nodemanager.pid. If it does not, create the directory and the file, and make both owned by yarn:hadoop:
mkdir -p /var/run/hadoop-yarn/yarn/
chown -R yarn:hadoop /var/run/hadoop-yarn/yarn/
touch /var/run/hadoop-yarn/yarn/hadoop-yarn-nodemanager.pid
chown yarn:hadoop /var/run/hadoop-yarn/yarn/hadoop-yarn-nodemanager.pid
This will work.
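Before changing anything, it can help to see who owns each directory on the way to the pid file, since a parent directory that the yarn user cannot write to is enough to stop the file from being created. A minimal sketch, assuming GNU coreutils `stat`; the function name `show_path_chain` is made up for illustration:

```shell
# Print mode, owner:group and name for a path and each of its parents.
# Components that do not exist are reported as "missing".
show_path_chain() {
  p=$1
  while :; do
    if [ -e "$p" ]; then
      stat -c '%A %U:%G %n' "$p"
    else
      printf 'missing %s\n' "$p"
    fi
    # Stop at the filesystem root (or at "." for a relative path).
    case "$p" in
      /|.) break ;;
    esac
    p=$(dirname "$p")
  done
}
```

For example, `show_path_chain /var/run/hadoop-yarn/yarn/hadoop-yarn-nodemanager.pid` lists the pid file first and / last, so a wrong owner or mode anywhere along the path stands out.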
Created 12-19-2019 01:55 PM
I think there is a permission issue with the pid file.
Can you check the permissions? If for any reason they are not as shown in the screenshot, please run chown as root to rectify that:
# chown yarn:hadoop /var/run/hadoop-yarn/yarn/hadoop-yarn-nodemanager.pid
Do that for all files in the directory whose ownership is not correct.
HTH
Created 12-20-2019 02:07 AM
@Shelton I tried the suggested solution, but the pid file is still created with 444 permissions, even after multiple restarts.
-r--r--r-- 1 yarn hadoop 6 Dec 20 05:00 hadoop-yarn-nodemanager.pid
The above issue still persists:
resource_management.core.exceptions.ExecutionFailed: Execution of 'ulimit -c unlimited; export HADOOP_LIBEXEC_DIR=/usr/hdp/3.0.1.0-187/hadoop/libexec && /usr/hdp/3.0.1.0-187/hadoop-yarn/bin/yarn --config /usr/hdp/3.0.1.0-187/hadoop/conf --daemon start nodemanager' returned 1. -bash: line 0: ulimit: core file size: cannot modify limit: Operation not permitted
/usr/hdp/3.0.1.0-187/hadoop/libexec/hadoop-functions.sh: line 1847: /var/run/hadoop-yarn/yarn/hadoop-yarn-nodemanager.pid: Permission denied
ERROR: Cannot write nodemanager pid /var/run/hadoop-yarn/yarn/hadoop-yarn-nodemanager.pid.
/usr/hdp/3.0.1.0-187/hadoop/libexec/hadoop-functions.sh: line 1866: /var/log/hadoop-yarn/yarn/hadoop-yarn-nodemanager-Hostname.org.out: Permission denied
Created 12-20-2019 03:13 AM
The file permission should be 644 not 444
# chmod 644 /var/run/hadoop-yarn/yarn/hadoop-yarn-nodemanager.pid
Do that and revert please
Created on 12-20-2019 03:53 AM - edited 12-20-2019 03:54 AM
@Shelton I changed it to 644, but after starting the node manager it goes back to 444.
Before:
-rw-r--r-- 1 yarn hadoop 6 Dec 20 05:00 hadoop-yarn-nodemanager.pid
After
-r--r--r-- 1 yarn hadoop 6 Dec 20 05:00 hadoop-yarn-nodemanager.pid
I am not able to find out why it changes back to 444 even though I set the permission manually.
Created 12-23-2019 02:12 AM
@Shelton Any update on the above
Created 12-24-2019 01:06 AM
I have tried to analyze your situation, but without access to the Linux box it is rather difficult. I think there is a workaround, though.
The chattr Linux command can make important files immutable (unchangeable).
The immutable bit [ +i ] can only be set by the superuser (i.e. root) or by a user with sudo privileges. It prevents the file from being modified, deleted, renamed, or having its permissions changed; any such attempt fails with 'Operation not permitted'.
# ls -al /var/run/hadoop-yarn/yarn/
total 8
.
..
-rw-r--r-- 1 yarn hadoop 0 Dec 24 09:34 hadoop-yarn-nodemanager.pid
Set immutable bit
# chattr +i hadoop-yarn-nodemanager.pid
Verify the attribute with the command below
# lsattr
----i--------e-- ./hadoop-yarn-nodemanager.pid
The normal ls command shows no difference
# ls -al /var/run/hadoop-yarn/yarn/
total 8
drwxr-xr-x 2 root root 4096 Dec 24 09:34 .
drwxr-xr-x 3 root root 4096 Dec 24 09:34 ..
-rw-r--r-- 1 yarn hadoop 0 Dec 24 09:34 hadoop-yarn-nodemanager.pid
Deletion protection
# rm -rf /var/run/hadoop-yarn/yarn/hadoop-yarn-nodemanager.pid
rm: cannot remove ‘/var/run/hadoop-yarn/yarn/hadoop-yarn-nodemanager.pid’: Operation not permitted
Permission change protected
# chmod 755 /var/run/hadoop-yarn/yarn/hadoop-yarn-nodemanager.pid
chmod: changing permissions of ‘/var/run/hadoop-yarn/yarn/hadoop-yarn-nodemanager.pid’: Operation not permitted
How to unset attribute on Files
# chattr -i /var/run/hadoop-yarn/yarn/hadoop-yarn-nodemanager.pid
After unsetting the attribute, verify the immutable status of the file using the lsattr command
# lsattr
---------------- ./var/run/hadoop-yarn/yarn/hadoop-yarn-nodemanager.pid
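File attributes of this kind are not supported on every filesystem, so it may be worth checking what the pid directory actually lives on before relying on chattr. On many systemd-based distributions /var/run is a symlink to /run, which is a tmpfs; whether that applies to this host is an assumption. A minimal sketch, assuming GNU coreutils `stat`; the function name `fs_type_of` is made up for illustration:

```shell
# Print the type of the filesystem that holds the given path,
# e.g. "xfs", "tmpfs" or "ext2/ext3".
fs_type_of() {
  stat -f -c '%T' "$1"
}
```

For example, `fs_type_of /var/run/hadoop-yarn/yarn` shows the filesystem type of the pid directory.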
Please do that and revert
Created 12-24-2019 03:15 AM
I tried to set the attribute on the file hadoop-yarn-nodemanager.pid.
However, the /var/run file system seems to be XFS, and the chattr command does not work with XFS as per Red Hat. Please provide an alternate solution for this issue.
[root@w0lxdhdp05 yarn]# lsattr
lsattr: Inappropriate ioctl for device While reading flags on ./hadoop-yarn-nodemanager.pid
chattr: Inappropriate ioctl for device while reading flags on hadoop-yarn-nodemanager.pid
Please refer to this -> https://access.redhat.com/solutions/184693
Created 12-27-2019 03:06 AM
@Shelton any update on this
Created 12-28-2019 02:13 AM
Sorry, festive period. Can you do the following?
Delete the old rotated copies of /var/log/messages (all files named /var/log/messages.x). That should leave you with only /var/log/messages; then truncate that file so it contains only new entries
# truncate --size 0 /var/log/messages
Do the same for /var/log/hadoop-yarn/yarn/hadoop-yarn-nodemanager-<node_name>.log.x and also truncate the /var/log/hadoop-yarn/yarn/hadoop-yarn-nodemanager-<node_name>.log
# truncate --size 0 /var/log/hadoop-yarn/yarn/hadoop-yarn-nodemanager-<node_name>.log
Start the node manager manually
# su -l yarn -c "/usr/hdp/current/hadoop-yarn-nodemanager/sbin/yarn-daemon.sh start nodemanager"
Then share the newly created log files
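If the full files are too large to post, the most recent error lines are usually the useful part. A minimal sketch, assuming the log uses the common log4j layout where the level appears as a separate word (for example ` ERROR `); the function name `recent_errors` is made up for illustration:

```shell
# Print the last N (default 20) ERROR or FATAL lines from a log file.
recent_errors() {
  log_file=$1
  count=${2:-20}
  grep -E ' (ERROR|FATAL) ' "$log_file" | tail -n "$count"
}
```

For example, `recent_errors /var/log/hadoop-yarn/yarn/hadoop-yarn-nodemanager-<node_name>.log 50` prints up to 50 of the latest error lines.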
Please revert
Created 01-02-2020 02:24 AM
Created 01-02-2020 03:13 AM
I see errors in the hadoop-yarn-nodemanager-w0lxdhdp05.ifc.org.log pointing to "Unable to start NodeManager: Could not load library. Reasons: [no leveldbjni64-1.8 in java.library.path, no leveldbjni-1.8 in java.library.path, no leveldbjni in java.library.path, /var/lib/ambari-agent/tmp/hadoop_java_io_tmpdir/libleveldbjni-64-1-6279667856305652637.8 (Permission denied)]"
My suspicion:
Please verify that /tmp on the host does not have the noexec option set. You can verify this by running /bin/mount and checking the mount options. If you are able to, remount /tmp without noexec and try starting the NodeManager again. I am sure it is an issue with noexec on /tmp.
See my sample output
[root@tokyo ~]# /bin/mount
sysfs on /sys type sysfs (rw,nosuid,nodev,noexec,relatime)
proc on /proc type proc (rw,nosuid,nodev,noexec,relatime)
devtmpfs on /dev type devtmpfs (rw,nosuid,size=7167976k,nr_inodes=1791994,mode=755)
.......
...
systemd-1 on /proc/sys/fs/binfmt_misc type autofs (rw,relatime,fd=30,pgrp=1,timeout=0,minproto=5,maxproto=5,direct,pipe_ino=15609)
hugetlbfs on /dev/hugepages type hugetlbfs (rw,relatime)
debugfs on /sys/kernel/debug type debugfs (rw,relatime)
mqueue on /dev/mqueue type mqueue (rw,relatime)
/dev/sda1 on /boot type ext4 (rw,relatime,data=ordered)
/dev/sda5 on /opt type ext4 (rw,relatime,data=ordered)
/dev/sda8 on /home type ext4 (rw,relatime,data=ordered)
/dev/sda11 on /u02 type ext4 (rw,relatime,data=ordered)
/dev/sda6 on /var type ext4 (rw,relatime,data=ordered)
/dev/sda10 on /u01 type ext4 (rw,relatime,data=ordered)
/dev/sda9 on /tmp type ext4 (rw,relatime,data=ordered)
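Reading the mount list by eye is error-prone, so the check can be scripted. The sketch below parses /proc/mounts rather than the /bin/mount output shown above; the function names `has_noexec` and `mount_point_of` are made up for illustration. Note that the path in the error is under /var/lib/ambari-agent/tmp, so it may be worth pointing the same check at whichever mount holds that directory, not only at /tmp.

```shell
# Print the mount point of the filesystem that holds a path
# (sixth column of POSIX "df -P" output).
mount_point_of() {
  df -P "$1" | awk 'NR == 2 { print $6 }'
}

# Succeed (exit 0) if the given mount point is mounted with noexec.
# The second argument is optional and defaults to /proc/mounts.
has_noexec() {
  mount_point=$1
  mounts_file=${2:-/proc/mounts}
  awk -v mp="$mount_point" '
    $2 == mp {
      seen = 1
      found = 0
      n = split($4, opts, ",")
      for (i = 1; i <= n; i++) if (opts[i] == "noexec") found = 1
    }
    END { if (seen && found) exit 0; exit 1 }
  ' "$mounts_file"
}
```

For example: `has_noexec "$(mount_point_of /var/lib/ambari-agent/tmp/hadoop_java_io_tmpdir)" && echo "noexec is set"`.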
This issue occurs when the user running the Hadoop [Nodemanager start] process does not have the necessary rights and cannot generate temporary files under the /tmp directory.
Solution
- Allow the user running node manager startup process read/write/execute access on /tmp
- Remove the noexec parameter when mounting /tmp
- Change the access rights on /tmp, i.e.: sudo chmod 1777 /tmp (the leading 1 keeps the sticky bit that /tmp normally has)
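To see whether the account that starts the node manager really has the access described above, the directory can be probed directly from a shell running as that user. A minimal sketch; the function name `check_dir_access` is made up for illustration, and it only tests permission bits as seen by the current user, not mount options such as noexec:

```shell
# Report whether the current user can read, write and search a directory.
check_dir_access() {
  dir=$1
  [ -d "$dir" ] || { echo "not a directory: $dir"; return 1; }
  r=no; w=no; x=no
  [ -r "$dir" ] && r=yes
  [ -w "$dir" ] && w=yes
  [ -x "$dir" ] && x=yes
  echo "read=$r write=$w search=$x"
}
```

For example, after `su -l yarn`, define the function and run `check_dir_access /var/lib/ambari-agent/tmp/hadoop_java_io_tmpdir` (the directory named in the error) as well as `check_dir_access /tmp`.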
In the /var/log/messages I can also see
Jan 2 05:14:23 w0lxdhdp05 abrt-server: Package 'ambari-agent' isn't signed with proper key
Jan 2 05:14:23 w0lxdhdp05 abrt-server: 'post-create' on '/var/spool/abrt/Python-2020-01-02-05:14:22-11897' exited with 1
Jan 2 05:14:23 w0lxdhdp05 abrt-server: Deleting problem directory '/var/spool/abrt/Python-2020-01-02-05:14:22-11897'
Please edit /etc/abrt/abrt-action-save-package-data.conf and change the value of OpenGPGCheck from yes to no:
OpenGPGCheck = no
It might also be necessary to change the value of limit coredumpsize:
limit coredumpsize unlimited
After editing the file restart the process with the following command:
# service abrtd restart
Restart the node manager and share your joy !
Created 01-02-2020 06:06 AM
@Shelton I checked, and /tmp does not have noexec enabled. Please provide an alternate solution for this.
/dev/mapper/rootvg-tmp on /tmp type xfs (rw,relatime,attr2,inode64,noquota)
Created 01-06-2020 02:41 AM
@Shelton Any update on this? It looks like it is looking for some Java packages:
java.lang.UnsatisfiedLinkError: Could not load library. Reasons: [no leveldbjni64-1.8 in java.library.path, no leveldbjni-1.8 in java.library.path, no leveldbjni in java.library.path, /var/lib/ambari-agent/tmp/hadoop_java_io_tmpdir/libleveldbjni-64-1-4657625312215122883.8 (Permission denied)]
Can we install it externally?