Support Questions

Find answers, ask questions, and share your expertise

Ambari error: All bootstrapped hosts registered but unable to retrieve cpu and memory related information

avatar
Contributor

Hello, using Ambari Server 2.5.0.3-7 and Agent 2.5.0.3 and I get this error when manually adding a new host to an Ambari Cluster.

The agent was preinstalled and preconfigured and the new host is running CentOS 7.4

No useful info (warning, error) in either agent or server logs.

1 ACCEPTED SOLUTION

avatar
Contributor

The only "strange" thing I see is in the ambari-audit.log, about my user (vide):

2017-10-25T12:23:50.729+0200, User(vide), RemoteIp(192.168.150.13), Operation(Request from server), RequestType(POST), url(http://ambari-data.billy.preprod/api/v1/requests), ResultStatus(403 Forbidden), Reason(The authenticated user is not authorized to execute the action check_host.), Command(null), Cluster name(null)

I should be in the cluster admin group..I'll try again with the local admin user...

View solution in original post

6 REPLIES 6

avatar
Master Mentor

@Davide Ferrari

Are you looking at the Ambari UI --> Hosts --> <specific Hostname> Page?

If you ahve Ambari Metrics Monitor Installed on the mentioned host then please make sure that the metrics monitor service is started and ambari metrics monitor is able to communicate with the Ambari Metrics Collector.

# ps -ef | grep resource_monitoring
ams      26400     1  0 Oct12 ?        02:02:45 /usr/bin/python2.7 /usr/lib/python2.6/site-packages/resource_monitoring/main.py start

.

Also please check the following log for any error:

# less /var/log/ambari-metrics-monitor/ambari-metrics-monitor.out 

.

avatar
Master Mentor

@Davide Ferrari

Can you please also share the following files:

1. /var/log/ambari-agent/ambari-agent.log (and ambari-agent.out)

2. The screenshot where you are not seeing the Memory/CPU details.

.

I am assuming that you are taking about the following Page Ambari UI -> Hosts --> Selected Host page.

39946-host-metrics.png

.

avatar
Contributor

@Jay SenSharma No, I'm trying to ADD the hosts to the cluster, they are currently not present. Screenshot:

39952-screenshot-from-2017-10-25-12-20-13.png

Ambari-metrics is installed but it hasn't run yet because the host is still not in any ambari cluster. ambari-agent.log (DEBUG enabled):

INFO 2017-10-25 15:25:42,947 DataCleaner.py:39 - Data cleanup thread started
INFO 2017-10-25 15:25:42,949 DataCleaner.py:120 - Data cleanup started
INFO 2017-10-25 15:25:42,949 DataCleaner.py:122 - Data cleanup finished
INFO 2017-10-25 15:25:42,971 PingPortListener.py:50 - Ping port listener started on port: 8670
INFO 2017-10-25 15:25:42,972 main.py:181 - Newloglevel=logging.DEBUG
INFO 2017-10-25 15:25:42,972 main.py:436 - Connecting to Ambari server at https://ambari-data.billy.preprod:8440 (192.168.40.120)
DEBUG 2017-10-25 15:25:42,973 NetUtil.py:110 - Trying to connect to https://ambari-data.billy.preprod:8440
INFO 2017-10-25 15:25:42,973 NetUtil.py:67 - Connecting to https://ambari-data.billy.preprod:8440/ca
DEBUG 2017-10-25 15:25:43,064 NetUtil.py:87 - GET https://ambari-data.billy.preprod:8440/ca -> 200, body: 
INFO 2017-10-25 15:25:43,064 main.py:446 - Connected to Ambari server ambari-data.billy.preprod
DEBUG 2017-10-25 15:25:43,064 Controller.py:66 - Initializing Controller RPC thread.
INFO 2017-10-25 15:25:43,065 threadpool.py:58 - Started thread pool with 3 core threads and 20 maximum threads
WARNING 2017-10-25 15:25:43,065 AlertSchedulerHandler.py:280 - [AlertScheduler] /var/lib/ambari-agent/cache/alerts/definitions.json not found or invalid. No alerts will be scheduled until registration occurs.
INFO 2017-10-25 15:25:43,065 AlertSchedulerHandler.py:175 - [AlertScheduler] Starting <ambari_agent.apscheduler.scheduler.Scheduler object at 0x1779cd0>; currently running: False
DEBUG 2017-10-25 15:25:43,066 scheduler.py:574 - Scheduler started
DEBUG 2017-10-25 15:25:43,066 scheduler.py:579 - Looking for jobs to run
DEBUG 2017-10-25 15:25:43,066 scheduler.py:599 - No jobs; waiting until a job is added
INFO 2017-10-25 15:25:45,085 hostname.py:98 - Read public hostname 'druid-co01.billy.preprod' using socket.getfqdn()
INFO 2017-10-25 15:25:45,127 Hardware.py:174 - Some mount points were ignored: /dev/shm, /run, /sys/fs/cgroup, /run/user/0
DEBUG 2017-10-25 15:25:45,296 HostCheckReportFileHandler.py:126 - Host check report at /var/lib/ambari-agent/data/hostcheck.result
DEBUG 2017-10-25 15:25:45,297 HostCheckReportFileHandler.py:177 - Removing old host check file at /var/lib/ambari-agent/data/hostcheck.result
DEBUG 2017-10-25 15:25:45,297 HostCheckReportFileHandler.py:182 - Creating host check file at /var/lib/ambari-agent/data/hostcheck.result
INFO 2017-10-25 15:25:45,299 Controller.py:170 - Registering with druid-co01.billy.preprod (192.168.40.52) (agent='{"hardwareProfile": {"kernel": "Linux", "domain": "billy.preprod", "physicalprocessorcount": 4, "kernelrelease": "3.10.0-693.5.2.el7.x86_64", "uptime_days": "0", "memorytotal": 8002256, "swapfree": "0.00 GB", "memorysize": 8002256, "osfamily": "redhat", "swapsize": "0.00 GB", "processorcount": 4, "netmask": "255.255.255.0", "timezone": "CET", "hardwareisa": "x86_64", "memoryfree": 6255416, "operatingsystem": "centos", "kernelmajversion": "3.10", "kernelversion": "3.10.0", "macaddress": "00:1A:4A:16:01:AD", "operatingsystemrelease": "7.4.1708", "ipaddress": "192.168.40.52", "hostname": "druid-co01", "uptime_hours": "5", "fqdn": "druid-co01.billy.preprod", "id": "root", "architecture": "x86_64", "selinux": false, "mounts": [{"available": "5313004", "used": "3064340", "percent": "37%", "device": "/dev/vda1", "mountpoint": "/", "type": "xfs", "size": "8377344"}, {"available": "3978004", "used": "0", "percent": "0%", "device": "devtmpfs", "mountpoint": "/dev", "type": "devtmpfs", "size": "3978004"}], "hardwaremodel": "x86_64", "uptime_seconds": "18331", "interfaces": "eth0,lo"}, "currentPingPort": 8670, "prefix": "/var/lib/ambari-agent/data", "agentVersion": "2.5.0.3", "agentEnv": {"transparentHugePage": "", "hostHealth": {"agentTimeStampAtReporting": 1508937945297, "activeJavaProcs": [{"command": "/usr/bin/java -server -Duser.timezone=UTC -Dfile.encoding=UTF-8 -Djava.io.tmpdir=/tmp -Djava.util.logging.manager=org.apache.logging.log4j.jul.LogManager -classpath .:/etc/druid/:/etc/hadoop/conf/:/opt/druid/lib/* io.druid.cli.Main server coordinator", "pid": 11784, "hadoop": true, "user": "root"}, {"command": "/usr/bin/java -server -Duser.timezone=UTC -Dfile.encoding=UTF-8 -Djava.io.tmpdir=/tmp -Djava.util.logging.manager=org.apache.logging.log4j.jul.LogManager -classpath .:/etc/druid/:/etc/hadoop/conf/:/opt/druid/lib/* io.druid.cli.Main server overlord", "pid": 11886, "hadoop": true, "user": "root"}], "liveServices": [{"status": "Healthy", "name": "ntpd or chronyd", "desc": ""}]}, "reverseLookup": true, "alternatives": [], "umask": "18", "firewallName": "iptables", "stackFoldersAndFiles": [], "existingUsers": [], "firewallRunning": true}, "timestamp": 1508937945235, "hostname": "druid-co01.billy.preprod", "responseId": -1, "publicHostname": "druid-co01.billy.preprod"}')
INFO 2017-10-25 15:25:45,300 NetUtil.py:67 - Connecting to https://ambari-data.billy.preprod:8440/connection_info
DEBUG 2017-10-25 15:25:45,385 NetUtil.py:87 - GET https://ambari-data.billy.preprod:8440/connection_info -> 200, body: {"security.server.two_way_ssl":"false"}
DEBUG 2017-10-25 15:25:45,385 security.py:52 - Server two-way SSL authentication required: False
INFO 2017-10-25 15:25:45,385 security.py:93 - SSL Connect being called.. connecting to the server
INFO 2017-10-25 15:25:45,516 security.py:60 - SSL connection established. Two-way SSL authentication is turned off on the server.
DEBUG 2017-10-25 15:25:45,606 Controller.py:177 - Registration response is {u'agentConfig': {u'agent.auto.cache.update': u'true',
                  u'agent.check.mounts.timeout': u'0',
                  u'agent.check.remote.mounts': u'false'},
 u'exitstatus': 0,
 u'response': u'OK',
 u'responseId': 0,
 u'statusCommands': []}
INFO 2017-10-25 15:25:45,606 Controller.py:196 - Registration Successful (response id = 0)
INFO 2017-10-25 15:25:45,606 AmbariConfig.py:316 - Updating config property (agent.check.remote.mounts) with value (false)
INFO 2017-10-25 15:25:45,607 AmbariConfig.py:316 - Updating config property (agent.auto.cache.update) with value (true)
INFO 2017-10-25 15:25:45,607 AmbariConfig.py:316 - Updating config property (agent.check.mounts.timeout) with value (0)
DEBUG 2017-10-25 15:25:45,607 Controller.py:205 - Updated config:<AmbariConfig.AmbariConfig instance at 0x174bef0>
DEBUG 2017-10-25 15:25:45,607 Controller.py:212 - Got status commands on registration.
DEBUG 2017-10-25 15:25:45,607 Controller.py:256 - No status commands received from ambari-data.billy.preprod
WARNING 2017-10-25 15:25:45,607 AlertSchedulerHandler.py:123 - There are no alert definition commands in the heartbeat; unable to update definitions
INFO 2017-10-25 15:25:45,607 Controller.py:512 - Registration response from ambari-data.billy.preprod was OK
INFO 2017-10-25 15:25:45,607 Controller.py:517 - Resetting ActionQueue...
INFO 2017-10-25 15:25:55,619 Controller.py:304 - Heartbeat (response id = 0) with server is running...
INFO 2017-10-25 15:25:55,619 Controller.py:311 - Building heartbeat message
DEBUG 2017-10-25 15:25:55,620 Heartbeat.py:83 - Building Heartbeat: {responseId = 0, timestamp = 1508937955620, commandsInProgress = False, componentsMapped = False,recoveryTimestamp = -1}
DEBUG 2017-10-25 15:25:55,620 Heartbeat.py:86 - Heartbeat: {'componentStatus': [],
 'hostname': 'druid-co01.billy.preprod',
 'nodeStatus': {'cause': 'NONE', 'status': 'HEALTHY'},
 'recoveryReport': {'summary': 'DISABLED'},
 'recoveryTimestamp': -1,
 'reports': [],
 'responseId': 0,
 'timestamp': 1508937955620}
INFO 2017-10-25 15:25:55,621 Heartbeat.py:90 - Adding host info/state to heartbeat message.
DEBUG 2017-10-25 15:25:55,733 HostCheckReportFileHandler.py:126 - Host check report at /var/lib/ambari-agent/data/hostcheck.result
DEBUG 2017-10-25 15:25:55,734 HostCheckReportFileHandler.py:177 - Removing old host check file at /var/lib/ambari-agent/data/hostcheck.result
DEBUG 2017-10-25 15:25:55,734 HostCheckReportFileHandler.py:182 - Creating host check file at /var/lib/ambari-agent/data/hostcheck.result
INFO 2017-10-25 15:25:55,770 Hardware.py:174 - Some mount points were ignored: /, /dev, /dev/shm, /run, /sys/fs/cgroup, /run/user/0
DEBUG 2017-10-25 15:25:55,771 Heartbeat.py:100 - agentEnv: {'transparentHugePage': '', 'hostHealth': {'agentTimeStampAtReporting': 1508937955734, 'activeJavaProcs': [{'command': u'/usr/bin/java -server -Duser.timezone=UTC -Dfile.encoding=UTF-8 -Djava.io.tmpdir=/tmp -Djava.util.logging.manager=org.apache.logging.log4j.jul.LogManager -classpath .:/etc/druid/:/etc/hadoop/conf/:/opt/druid/lib/* io.druid.cli.Main server overlord', 'pid': 12022, 'hadoop': True, 'user': 'root'}, {'command': u'/usr/bin/java -server -Duser.timezone=UTC -Dfile.encoding=UTF-8 -Djava.io.tmpdir=/tmp -Djava.util.logging.manager=org.apache.logging.log4j.jul.LogManager -classpath .:/etc/druid/:/etc/hadoop/conf/:/opt/druid/lib/* io.druid.cli.Main server coordinator', 'pid': 12023, 'hadoop': True, 'user': 'root'}], 'liveServices': [{'status': 'Healthy', 'name': 'ntpd or chronyd', 'desc': ''}]}, 'reverseLookup': True, 'alternatives': [], 'umask': '18', 'firewallName': 'iptables', 'stackFoldersAndFiles': [], 'existingUsers': [], 'firewallRunning': True}
DEBUG 2017-10-25 15:25:55,771 Heartbeat.py:101 - mounts: []
INFO 2017-10-25 15:25:55,771 Controller.py:318 - Sending Heartbeat (id = 0): {"alerts": [], "nodeStatus": {"status": "HEALTHY", "cause": "NONE"}, "timestamp": 1508937955620, "hostname": "druid-co01.billy.preprod", "responseId": 0, "reports": [], "mounts": [], "recoveryTimestamp": -1, "agentEnv": {"transparentHugePage": "", "hostHealth": {"agentTimeStampAtReporting": 1508937955734, "activeJavaProcs": [{"command": "/usr/bin/java -server -Duser.timezone=UTC -Dfile.encoding=UTF-8 -Djava.io.tmpdir=/tmp -Djava.util.logging.manager=org.apache.logging.log4j.jul.LogManager -classpath .:/etc/druid/:/etc/hadoop/conf/:/opt/druid/lib/* io.druid.cli.Main server overlord", "pid": 12022, "hadoop": true, "user": "root"}, {"command": "/usr/bin/java -server -Duser.timezone=UTC -Dfile.encoding=UTF-8 -Djava.io.tmpdir=/tmp -Djava.util.logging.manager=org.apache.logging.log4j.jul.LogManager -classpath .:/etc/druid/:/etc/hadoop/conf/:/opt/druid/lib/* io.druid.cli.Main server coordinator", "pid": 12023, "hadoop": true, "user": "root"}], "liveServices": [{"status": "Healthy", "name": "ntpd or chronyd", "desc": ""}]}, "reverseLookup": true, "alternatives": [], "umask": "18", "firewallName": "iptables", "stackFoldersAndFiles": [], "existingUsers": [], "firewallRunning": true}, "recoveryReport": {"summary": "DISABLED"}, "componentStatus": []}
INFO 2017-10-25 15:25:55,775 Controller.py:332 - Heartbeat response received (id = 1)
INFO 2017-10-25 15:25:55,775 Controller.py:341 - Heartbeat interval is 10 seconds
INFO 2017-10-25 15:25:55,775 Controller.py:377 - Updating configurations from heartbeat
INFO 2017-10-25 15:25:55,775 Controller.py:386 - Adding cancel/execution commands
DEBUG 2017-10-25 15:25:55,775 Controller.py:246 - No commands received from ambari-data.billy.preprod
DEBUG 2017-10-25 15:25:55,775 Controller.py:256 - No status commands received from ambari-data.billy.preprod
INFO 2017-10-25 15:25:55,775 Controller.py:403 - Adding recovery commands
DEBUG 2017-10-25 15:25:55,775 Controller.py:422 - No commands sent from ambari-data.billy.preprod
INFO 2017-10-25 15:25:55,775 Controller.py:471 - Waiting 9.9 for next heartbeat
INFO 2017-10-25 15:26:05,677 Controller.py:478 - Wait for next heartbeat over
DEBUG 2017-10-25 15:26:05,677 Controller.py:304 - Heartbeat (response id = 1) with server is running...
DEBUG 2017-10-25 15:26:05,677 Controller.py:311 - Building heartbeat message
DEBUG 2017-10-25 15:26:05,678 Heartbeat.py:83 - Building Heartbeat: {responseId = 1, timestamp = 1508937965678, commandsInProgress = False, componentsMapped = False,recoveryTimestamp = -1}
DEBUG 2017-10-25 15:26:05,679 Heartbeat.py:86 - Heartbeat: {'componentStatus': [],
 'hostname': 'druid-co01.billy.preprod',
 'nodeStatus': {'cause': 'NONE', 'status': 'HEALTHY'},
 'recoveryReport': {'summary': 'DISABLED'},
 'recoveryTimestamp': -1,
 'reports': [],
 'responseId': 1,
 'timestamp': 1508937965678}
DEBUG 2017-10-25 15:26:05,680 Controller.py:318 - Sending Heartbeat (id = 1): {"alerts": [], "nodeStatus": {"status": "HEALTHY", "cause": "NONE"}, "timestamp": 1508937965678, "hostname": "druid-co01.billy.preprod", "responseId": 1, "reports": [], "recoveryTimestamp": -1, "recoveryReport": {"summary": "DISABLED"}, "componentStatus": []}
DEBUG 2017-10-25 15:26:05,683 Controller.py:332 - Heartbeat response received (id = 2)
DEBUG 2017-10-25 15:26:05,683 Controller.py:341 - Heartbeat interval is 10 seconds
DEBUG 2017-10-25 15:26:05,683 Controller.py:377 - Updating configurations from heartbeat
DEBUG 2017-10-25 15:26:05,684 Controller.py:386 - Adding cancel/execution commands
DEBUG 2017-10-25 15:26:05,684 Controller.py:246 - No commands received from ambari-data.billy.preprod
DEBUG 2017-10-25 15:26:05,684 Controller.py:256 - No status commands received from ambari-data.billy.preprod
DEBUG 2017-10-25 15:26:05,684 Controller.py:403 - Adding recovery commands
DEBUG 2017-10-25 15:26:05,684 Controller.py:422 - No commands sent from ambari-data.billy.preprod
DEBUG 2017-10-25 15:26:05,684 Controller.py:471 - Waiting 9.9 for next heartbeat


And the last part of the server log:

25 Oct 2017 15:25:43,107  WARN [qtp-ambari-agent-39] SecurityFilter:103 - Request https://ambari-data.billy.preprod:8440/ca doesn't match any pattern.
25 Oct 2017 15:25:43,107  WARN [qtp-ambari-agent-39] SecurityFilter:62 - This request is not allowed on this port: https://ambari-data.billy.preprod:8440/ca
25 Oct 2017 15:25:45,570  INFO [qtp-ambari-agent-36] HeartBeatHandler:425 - agentOsType = centos7
25 Oct 2017 15:25:45,647  INFO [qtp-ambari-agent-36] HostImpl:329 - Received host registration, host=[hostname=druid-co01,fqdn=druid-co01.billy.preprod,domain=billy.preprod,architecture=x86_64,processorcount=4,physicalprocessorcount=4,osname=centos,osversion=7.4.1708,osfamily=redhat,memory=8002256,uptime_hours=5,mounts=(available=5313004,mountpoint=/,used=3064340,percent=37%,size=8377344,device=/dev/vda1,type=xfs)(available=3978004,mountpoint=/dev,used=0,percent=0%,size=3978004,device=devtmpfs,type=devtmpfs)]
, registrationTime=1508937945570, agentVersion=2.5.0.3
25 Oct 2017 15:25:45,647  INFO [qtp-ambari-agent-36] TopologyManager:548 - TopologyManager.onHostRegistered: Entering
25 Oct 2017 15:25:45,647  INFO [qtp-ambari-agent-36] TopologyManager:602 - Host druid-co01.billy.preprod re-registered, will not be added to the available hosts list

avatar
Contributor

The only "strange" thing I see is in the ambari-audit.log, about my user (vide):

2017-10-25T12:23:50.729+0200, User(vide), RemoteIp(192.168.150.13), Operation(Request from server), RequestType(POST), url(http://ambari-data.billy.preprod/api/v1/requests), ResultStatus(403 Forbidden), Reason(The authenticated user is not authorized to execute the action check_host.), Command(null), Cluster name(null)

I should be in the cluster admin group..I'll try again with the local admin user...

avatar
Contributor

Ok, solved, it was indeed a permission problem. But this should be considered a bug nonetheless. The message doesn't tell anything about the issue PLUS it starts an automated loop of error box -> logging out -> logging in -> error box with no options to jump out of it but waiting for the session to expire with the browser tab closed

avatar
Contributor

This might help to other people. This seems like a bug :

https://issues.apache.org/jira/browse/AMBARI-21687

The bug is fixed in 2.6.0.0 branch only ( not fixed in 2.4.x.x or 2.5.x.x versions of ambari). So, The available workaround is to add the node via 'Admin' user or use the ambari 2.6.x