Created 02-13-2017 06:06 PM
Hi,
We are using HDP 2.5.0.0 on RHEL7. The cluster has been up and running for about 6 months. We had to reboot the cluster a couple of days ago and noticed that the service status is not reflected correctly in Ambari. If I run ps -ef on the box, I can see that the process is running. On digging further, I found that the $USER variable in the scripts is not being resolved. As a result, the pid files are created in the wrong directory with an incorrect name, e.g. /var/run/hadoop-mapreduce/mapred--historyserver.pid instead of /var/run/hadoop-mapreduce/mapred/mapred-mapred-historyserver.pid. Any pointers for troubleshooting this? TIA
resource_management.core.exceptions.Fail: Execution of 'ambari-sudo.sh su mapred -l -s /bin/bash -c 'ls /var/run/hadoop-mapreduce/mapred/mapred-mapred-historyserver.pid && ps -p `cat /var/run/hadoop-mapreduce/mapred/mapred-mapred-historyserver.pid`'' returned 2. ls: cannot access /var/run/hadoop-mapreduce/mapred/mapred-mapred-historyserver.pid: No such file or directory
Created 02-14-2017 12:14 AM
@Vijay Lakshman, how did you restart the mapred history server after the machine reboot? Did you start it through Ambari?
Can you also check the value of the HADOOP_MAPRED_PID_DIR variable in hadoop-env.sh? Ideally it should be as below.
export HADOOP_MAPRED_PID_DIR=/var/run/hadoop-mapreduce/$USER
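To see why an empty $USER matters for this setting, here is a minimal sketch. The pid_path function is hypothetical and not part of Hadoop; it assumes the pid file is named mapred-<user>-historyserver.pid inside the directory above, which is what the paths reported in this thread suggest.

```shell
#!/bin/bash
# Hypothetical helper (not a Hadoop script): builds the history server pid
# path under the assumption <pid dir>/mapred-<user>-historyserver.pid.
pid_path() {
  local user="$1"
  local pid_dir="/var/run/hadoop-mapreduce/${user}"
  # With an empty user the directory ends in "/"; drop that trailing slash.
  pid_dir="${pid_dir%/}"
  echo "${pid_dir}/mapred-${user}-historyserver.pid"
}

pid_path mapred   # USER resolved: file lands in the per-user subdirectory
pid_path ""       # USER empty: file lands one level up, with "--" in the name
```

With USER empty, the sketch produces the same mapred--historyserver.pid path that was reported in the question.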
To fix this issue, stop the mapred history server and make sure that both /var/run/hadoop-mapreduce/mapred--historyserver.pid and /var/run/hadoop-mapreduce/mapred/mapred-mapred-historyserver.pid are deleted. Then start the mapred history server through Ambari.
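If several services are affected, the misnamed pid files can be located before cleaning up. The following is only a sketch: find_stray_pids is a hypothetical helper, and it assumes that a "--" in the file name always means the user part was empty.

```shell
#!/bin/bash
# Hypothetical helper: list pid files whose name contains "--", i.e. files
# that were written while the user part of the name was empty.
find_stray_pids() {
  local run_dir="$1"
  # Look in the run directory itself and one level of subdirectories.
  find "$run_dir" -maxdepth 2 -type f -name '*--*.pid' | sort
}

# Example (stop the service first and review the list before deleting):
# find_stray_pids /var/run/hadoop-mapreduce
```

The function only prints the matches; deleting them is left as a manual step so that nothing is removed while a daemon is still running.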
Created 02-14-2017 12:25 AM
Thanks. I am starting all the services by clicking on Start All in Ambari. I am facing the same issue with all the services.
Yes, the environment variable is using the default and is not overridden.
e.g. for the timeline server:
drwxr-xr-x 2 yarn hadoop 40 Feb 13 18:45 yarn
-rw-r--r-- 1 yarn hadoop 6 Feb 13 18:48 yarn--resourcemanager.pid
-rw-r--r-- 1 yarn hadoop 5 Feb 13 18:47 yarn--timelineserver.pid
$ ls yarn
$
Created 02-15-2017 12:41 AM
@Vijay Lakshman, it looks like the USER environment variable has gone missing on the machines. Can you please check on your hosts whether $USER is set for all the service users (such as hdfs, yarn, mapred, etc.)? You can also use "printenv" to print all environment variables.
[root@xxx]# sudo su hdfs
bash-4.2$ echo $USER
hdfs
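The same check can be repeated for each service account in one go. This is a sketch under assumptions: both function names are hypothetical, the account list is just the accounts mentioned in this thread, and the su invocation mirrors the form shown in the Ambari error message above.

```shell
#!/bin/bash
# Hypothetical helper: compare the USER value reported by an account's shell
# with the account name and print a one-line verdict.
report_user_env() {
  local account="$1" reported="$2"
  if [ "$reported" = "$account" ]; then
    echo "$account: OK"
  else
    echo "$account: USER is '$reported'"
  fi
}

# Run as root. Adjust the account list to match your cluster.
check_service_accounts() {
  local account
  for account in hdfs yarn mapred; do
    report_user_env "$account" \
      "$(su "$account" -l -s /bin/bash -c 'echo "$USER"')"
  done
}

# Usage: check_service_accounts
```

Any account reported with an empty or different value is a candidate for the pid file problem described in the question.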
Created 02-15-2017 04:34 PM
Thanks, that's helpful. We upgraded to RHEL 7.3. It looks like this is not supported by HDP 2.5.3 yet.
Created 02-15-2017 09:00 PM
Glad to know I was able to help.
Created 02-16-2017 06:31 PM
It helped us troubleshoot the issue, but we could not figure out why this is happening. We went back to installing a new cluster on RHEL 7.2, and that is working fine. We still need to figure out why RHEL 7.3 is causing this issue. Will post an update if we ever figure it out.