I keep failing to start the MapReduce2 service from the Ambari UI, and I can't find any log under /var/log/hadoop-mapreduce/mapred. The only error I found is in /var/log/ambari-agent/ambari-agent.log:
ERROR 2018-10-29 12:36:39,444 HostInfo.py:248 - Checking java processes failed
Traceback (most recent call last):
  File "/usr/lib/ambari-agent/lib/ambari_agent/HostInfo.py", line 245, in javaProcs
    metrics['user'] = pwd.getpwuid(uid).pw_name
KeyError: 'getpwuid(): uid not found: 1005'
I don't think it is relevant, though. I can't find any other clue to the problem, and I haven't found a solution online either. Any help?
It looks like the mapred user was deleted somehow. Can you run the commands below on the node where the HistoryServer runs, and on the other nodes as well, to confirm whether the user is missing on all of them?
# id mapred
# id 1005
If the user is not found, try adding the mapred user. Then kill the HistoryServer process from the command line if it is still running, and try starting it again.
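The same check can be scripted with Python's standard pwd module, which is the call that fails in the agent traceback. This is only a minimal sketch: the helper names lookup_by_name and lookup_by_uid are made up for illustration, and the name "mapred" and uid 1005 are simply taken from this thread.

```python
import pwd


def lookup_by_name(name):
    """Return (user name, uid) for a user name, or None if it does not exist."""
    try:
        entry = pwd.getpwnam(name)
    except KeyError:
        # No matching account in the user database
        return None
    return entry.pw_name, entry.pw_uid


def lookup_by_uid(uid):
    """Return (user name, uid) for a numeric uid, or None if it does not exist."""
    try:
        entry = pwd.getpwuid(uid)
    except KeyError:
        # Same KeyError that the Ambari agent traceback shows for uid 1005
        return None
    return entry.pw_name, entry.pw_uid


if __name__ == "__main__":
    # Values from this thread; adjust for your own cluster.
    print("name mapred -> {}".format(lookup_by_name("mapred")))
    print("uid 1005    -> {}".format(lookup_by_uid(1005)))
```

Running it on each node before and after starting the service shows whether the name and the uid resolve to the same account, or to two different ones.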
Thanks for the help. I found something strange that I don't understand. When I run the commands you suggested on the node where the HistoryServer is supposed to run, before starting the service, both "id mapred" and "id 1005" report "no such user". However, after I start the MapReduce2 service through the Ambari Server UI, the situation changes.
It seems that the start process created the user "mapred", while uid 1005 is now taken by a user named "yarn". The MapReduce2 start then gets stuck, just as I saw before.