
Services are reported as down from Ambari and will not stay online, services still running on host


New Contributor

The HDFS service has started on the host, but the Ambari UI shows it as stopped.

In ambari-agent.log I found this relevant message:

WARNING 2018-04-25 21:59:52,863 CommandStatusDict.py:128 - [Errno 2] No such file or directory: '/var/lib/ambari-agent/data/output-2612.txt'

I have also tried the steps in this thread: "Services are reported as down from Ambari and will not stay online, services still running on host"

but I still can't solve it.

Any help would be appreciated!

5 REPLIES

Re: Services are reported as down from Ambari and will not stay online, services still running on host

Super Mentor

@shi yu

- Are you running the Ambari agents as a non-root user? If so, can you check whether that non-root user is able to create a file inside the mentioned directory on that host?

Example:

# su - $NON_ROOT_AGENT_USER
# echo "hello" > /var/lib/ambari-agent/data/output-2612.txt

- Can you please share the complete "/var/log/ambari-agent/ambari-agent.log" file after agent restart? (on the host where the service component is failing to start)

# ambari-agent restart

Then share the agent log:
/var/log/ambari-agent/ambari-agent.log

Also please share the agent data directory permission:

# ls -ld /var/lib/ambari-agent/data

If that directory does not exist then please create one.
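
The existence and write checks above can be combined into one small helper. This is only a sketch: the function name `check_agent_dir` and its return codes are made up for illustration, and it should be run as the same user the agent runs as, because root can write almost anywhere and would hide a permission problem.

```shell
# Hypothetical helper: report whether a directory exists and whether the
# current user can create a file in it.
# Returns 0 if writable, 1 if the directory is missing, 2 if not writable.
check_agent_dir() {
    dir="$1"
    if [ ! -d "$dir" ]; then
        echo "missing: $dir"
        return 1
    fi
    # Try to create a throwaway file; the subshell keeps a failed
    # redirection from aborting the calling shell.
    probe="$dir/.write-test-$$"
    if ( : > "$probe" ) 2>/dev/null; then
        rm -f "$probe"
        echo "writable: $dir"
        return 0
    fi
    echo "not writable by $(id -un): $dir"
    return 2
}
```

For example, after `su - $NON_ROOT_AGENT_USER`, call `check_agent_dir /var/lib/ambari-agent/data`. If it reports the directory as missing, create it and make sure the agent user owns it or can write to it.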


Re: Services are reported as down from Ambari and will not stay online, services still running on host

New Contributor

1. Yes, I start the Ambari agent as a non-root user with this command: "sudo service ambari-agent restart".

2. Just now I restarted ambari-agent as the root user, and the log messages are in the attached file.

3. For the permissions of '/var/lib/ambari-agent/data', I got this:

drwxr-xr-x. 3 root root 36864 April 26 16:15 /var/lib/ambari-agent/data

Re: Services are reported as down from Ambari and will not stay online, services still running on host

Super Mentor

@shi yu

In your ambari-agent log file I see no errors or warnings. In order to run the Ambari agent as a non-root user, we need to follow the steps mentioned in the docs below:

- https://docs.hortonworks.com/HDPDocuments/Ambari-2.6.1.5/bk_ambari-security/content/commands_agent.h...

- https://docs.hortonworks.com/HDPDocuments/Ambari-2.6.1.5/bk_ambari-security/content/how_to_configure...

So can you please check and share the complete operation log that you see in the Ambari UI?

Also, can you please share the ambari-server.log?


Re: Services are reported as down from Ambari and will not stay online, services still running on host

New Contributor

@Jay Kumar SenSharma

My situation is like the one described in this link: "Ambari show namenode is stop but actually namenode is still working"

I start the NameNode from the Ambari web UI, and it shows as stopped only a short while after being started.

The relevant part of the ambari-agent log is below:

INFO 2018-04-26 18:52:14,726 Heartbeat.py:78 - Building Heartbeat: {responseId = 764, timestamp = 1524739934726, commandsInProgress = False, componentsMapped = True}
INFO 2018-04-26 18:52:14,809 Controller.py:254 - Heartbeat response received (id = 765)
WARNING 2018-04-26 18:52:19,230 base_alert.py:395 - [Alert][datanode_health_summary] HA nameservice value is present but there are no aliases for {{hdfs-site/dfs.ha.namenodes.{{ha-nameservice}}}}
WARNING 2018-04-26 18:52:19,234 base_alert.py:395 - [Alert][namenode_directory_status] HA nameservice value is present but there are no aliases for {{hdfs-site/dfs.ha.namenodes.{{ha-nameservice}}}}
WARNING 2018-04-26 18:52:19,240 base_alert.py:395 - [Alert][namenode_webui] HA nameservice value is present but there are no aliases for {{hdfs-site/dfs.ha.namenodes.{{ha-nameservice}}}}
WARNING 2018-04-26 18:52:19,256 base_alert.py:395 - [Alert][yarn_resourcemanager_webui] HA nameservice value is present but there are no aliases for {{yarn-site/yarn.resourcemanager.ha.rm-ids}}
INFO 2018-04-26 18:52:24,810 Heartbeat.py:78 - Building Heartbeat: {responseId = 765, timestamp = 1524739944810, commandsInProgress = False, componentsMapped = True}
INFO 2018-04-26 18:52:24,817 Controller.py:254 - Heartbeat response received (id = 766)
INFO 2018-04-26 18:52:34,818 Heartbeat.py:78 - Building Heartbeat: {responseId = 766, timestamp = 1524739954818, commandsInProgress = False, componentsMapped = True}
INFO 2018-04-26 18:52:34,823 Controller.py:254 - Heartbeat response received (id = 767)
INFO 2018-04-26 18:52:44,824 Heartbeat.py:78 - Building Heartbeat: {responseId = 767, timestamp = 1524739964824, commandsInProgress = False, componentsMapped = True}
INFO 2018-04-26 18:52:44,902 Controller.py:254 - Heartbeat response received (id = 768)
INFO 2018-04-26 18:52:44,903 ClusterConfiguration.py:123 - Updating cached configurations for cluster dml_vmc
INFO 2018-04-26 18:52:44,953 ActionQueue.py:112 - Adding EXECUTION_COMMAND for role NAMENODE for service HDFS of cluster dml_vmc to the queue.
INFO 2018-04-26 18:52:44,977 ActionQueue.py:232 - Executing command with id = 382-0 for role = NAMENODE of cluster dml_vmc.
WARNING 2018-04-26 18:52:45,001 CommandStatusDict.py:128 - [Errno 2] No such file or directory: '/var/lib/ambari-agent/data/output-2652.txt'
INFO 2018-04-26 18:52:45,001 Heartbeat.py:78 - Building Heartbeat: {responseId = 768, timestamp = 1524739964999, commandsInProgress = True, componentsMapped = True}
INFO 2018-04-26 18:52:45,341 Controller.py:254 - Heartbeat response received (id = 769)
INFO 2018-04-26 18:52:50,317 Heartbeat.py:78 - Building Heartbeat: {responseId = 769, timestamp = 1524739970317, commandsInProgress = True, componentsMapped = True}
INFO 2018-04-26 18:52:50,422 Controller.py:254 - Heartbeat response received (id = 770)
INFO 2018-04-26 18:53:00,422 Heartbeat.py:78 - Building Heartbeat: {responseId = 770, timestamp = 1524739980422, commandsInProgress = False, componentsMapped = True}
INFO 2018-04-26 18:53:00,426 Controller.py:254 - Heartbeat response received (id = 771)
INFO 2018-04-26 18:53:10,427 Heartbeat.py:78 - Building Heartbeat: {responseId = 771, timestamp = 1524739990427, commandsInProgress = False, componentsMapped = True}
INFO 2018-04-26 18:53:10,641 Controller.py:254 - Heartbeat response received (id = 772)

The part of the ambari-server log that may be useful is below:

26 Apr 2018 18:52:44,642  INFO [qtp-client-606] AbstractResourceProvider:875 - Received a updateHostComponent request, clusterName=dml_vmc, serviceName=HDFS, componentName=NAMENODE, hostname=host68.dml.com, request={ clusterName=dml_vmc, serviceName=HDFS, componentName=NAMENODE, hostname=host68.dml.com, desiredState=STARTED, state=null, desiredStackId=null, staleConfig=null, adminState=null}
26 Apr 2018 18:52:44,802  INFO [ambari-action-scheduler] ServiceComponentHostImpl:923 - Host role transitioned to a new state, serviceComponentName=NAMENODE, hostName=host68.dml.com, oldState=INSTALLED, currentState=STARTING
26 Apr 2018 18:52:44,806  INFO [qtp-client-592] PersistKeyValueService:82 - Looking for keyName admin-settings-show-bg-admin
26 Apr 2018 18:52:50,383  INFO [qtp-ambari-agent-636] HeartBeatHandler:567 - Updating applied config on service HDFS, component NAMENODE, host host68.dml.com
26 Apr 2018 18:52:50,392  INFO [qtp-ambari-agent-636] ServiceComponentHostImpl:923 - Host role transitioned to a new state, serviceComponentName=NAMENODE, hostName=host68.dml.com, oldState=STARTING, currentState=STARTED
26 Apr 2018 18:53:20,746  INFO [qtp-ambari-agent-636] HeartBeatHandler:657 - State of service component NAMENODE of service HDFS of cluster dml_vmc has changed from STARTED to INSTALLED at host host68.dml.com
26 Apr 2018 18:58:50,119  INFO [qtp-client-676] PersistKeyValueService:82 - Looking for keyName time-range-service-HDFS

The complete log files are attached below:

ambari-agentlog.txt

ambari-serverlog.txt
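
The server log excerpt above shows NAMENODE going INSTALLED -> STARTING -> STARTED and then back to INSTALLED about 30 seconds later. A rough way to pull only those state-change lines out of a long log is sketched below. The function name `show_transitions` is made up for illustration, and the patterns simply mirror the wording of the two message types quoted above, so they may need adjusting for other Ambari versions.

```shell
# Hypothetical helper: print only the state-transition lines for one
# component from one or more ambari-server log files.
# Usage: show_transitions COMPONENT LOGFILE...
show_transitions() {
    component="$1"
    shift
    # First pattern: "Host role transitioned to a new state" lines.
    # Second pattern: "State of service component ... has changed from" lines.
    # "|| true" keeps the function from failing when nothing matches.
    grep -hE "serviceComponentName=${component},.*oldState=|service component ${component} .*has changed from" "$@" || true
}
```

For example: `show_transitions NAMENODE /var/log/ambari-server/ambari-server.log` (the path is the usual default location and is an assumption here, it may differ on your install).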

Re: Services are reported as down from Ambari and will not stay online, services still running on host

Super Mentor

@shi yu

If you see that the NameNode shows as started in the Ambari UI but then stops almost immediately, you should check the NameNode log to find the cause.

The NameNode log usually contains information that helps explain why the NameNode went down. Sometimes it may be a memory or resource issue.

NameNode logs can be found here:

# ls -l /var/log/hadoop/hdfs/hadoop-hdfs-namenode-*.log
# ls -l /var/log/hadoop/hdfs/hadoop-hdfs-namenode-*.out
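
Once the files are located, something like the following can narrow them down to the likely failure messages. It is only a sketch: the function name `scan_logs`, the `MAX_LINES` variable, and the choice of keywords (ERROR, FATAL, Exception) are assumptions for illustration, not an Ambari or Hadoop tool.

```shell
# Hypothetical helper: for each existing log file given, print the last
# few lines that look like errors. MAX_LINES (default 20) caps the
# number of lines shown per file.
scan_logs() {
    max="${MAX_LINES:-20}"
    for f in "$@"; do
        # Skip names that do not exist (e.g. an unmatched glob).
        [ -f "$f" ] || continue
        echo "== $f"
        # "|| true" keeps a no-match result from being treated as failure.
        grep -E 'ERROR|FATAL|Exception' "$f" | tail -n "$max" || true
    done
}
```

For example: `scan_logs /var/log/hadoop/hdfs/hadoop-hdfs-namenode-*.log /var/log/hadoop/hdfs/hadoop-hdfs-namenode-*.out`. The lines around the timestamp when Ambari flipped the component back to INSTALLED are the most interesting ones.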

