We are using HDP 184.108.40.206 on RHEL 7. The cluster had been up and running for about six months. We had to reboot the cluster a couple of days back and noticed that the service status was not reflected correctly in Ambari. If I do ps -ef on the box, I see that the process is running. Upon digging further, I found that the $USER variable in the scripts is not getting resolved. As a result, the pid files are created in the wrong directory with an incorrect name, e.g. /var/run/hadoop-mapreduce/mapred--historyserver.pid instead of /var/run/hadoop-mapreduce/mapred/mapred-mapred-historyserver.pid. Any pointers on troubleshooting this? TIA
resource_management.core.exceptions.Fail: Execution of 'ambari-sudo.sh su mapred -l -s /bin/bash -c 'ls /var/run/hadoop-mapreduce/mapred/mapred-mapred-historyserver.pid && ps -p `cat /var/run/hadoop-mapreduce/mapred/mapred-mapred-historyserver.pid`'' returned 2. ls: cannot access /var/run/hadoop-mapreduce/mapred/mapred-mapred-historyserver.pid: No such file or directory
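For context on where the broken name comes from: the stock daemon scripts build the pid path from $USER (via HADOOP_MAPRED_IDENT_STRING), so an empty $USER collapses both the per-user directory and the file name. Roughly like this (paraphrased from mr-jobhistory-daemon.sh and mapred-env.sh, not a verbatim copy; variable names may differ slightly in your HDP version):

export HADOOP_MAPRED_PID_DIR=/var/run/hadoop-mapreduce/$USER
export HADOOP_MAPRED_IDENT_STRING=${HADOOP_MAPRED_IDENT_STRING:-$USER}   # empty if USER is unset
pid=$HADOOP_MAPRED_PID_DIR/mapred-$HADOOP_MAPRED_IDENT_STRING-historyserver.pid
# With USER unset this evaluates to /var/run/hadoop-mapreduce//mapred--historyserver.pid,
# which matches the misnamed file described above.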
To fix this issue, stop the MapReduce History Server and make sure that both /var/run/hadoop-mapreduce/mapred--historyserver.pid and /var/run/hadoop-mapreduce/mapred/mapred-mapred-historyserver.pid are deleted. Then start the History Server through Ambari.
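In shell terms, the cleanup looks roughly like this (paths taken from the error above; run on the affected host after stopping the History Server in Ambari):

# Remove both the misnamed and the expected pid files:
rm -f /var/run/hadoop-mapreduce/mapred--historyserver.pid
rm -f /var/run/hadoop-mapreduce/mapred/mapred-mapred-historyserver.pid
# Then start the History Server through Ambari again; a fresh pid file
# should appear as /var/run/hadoop-mapreduce/mapred/mapred-mapred-historyserver.pid.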
@Vijay Lakshman, it looks like the USER environment variable has gone missing on the machines. Can you please check on your hosts whether $USER is set for all the service users (such as hdfs, yarn, mapred, etc.)? You can also use "printenv" to print all environment variables.
[root@xxx]# sudo su hdfs
bash-4.2$ echo $USER
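To check all the service accounts in one go, here is a quick sketch (adjust the user list to your cluster):

# Print the USER variable as seen by each service account:
for u in hdfs yarn mapred; do
  printf '%s: ' "$u"
  sudo su "$u" -c 'echo "USER=[$USER]"'
done
# If the value between the brackets is empty (or still the invoking user)
# rather than the service account, you are hitting this problem.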
This helped us troubleshoot the issue, but we could not figure out why it is happening. We went back and reinstalled a new cluster on RHEL 7.2, and that is working fine. We still need to figure out why RHEL 7.3 is causing this issue. Will post an update if we ever figure it out.