Created 10-25-2017 10:42 AM
Hello, using Ambari Server 2.5.0.3-7 and Agent 2.5.0.3 and I get this error when manually adding a new host to an Ambari Cluster.
The agent was preinstalled and preconfigured and the new host is running CentOS 7.4
No useful info (warning, error) in either agent or server logs.
Created 10-25-2017 01:35 PM
The only "strange" thing I see is in the ambari-audit.log, about my user (vide):
2017-10-25T12:23:50.729+0200, User(vide), RemoteIp(192.168.150.13), Operation(Request from server), RequestType(POST), url(http://ambari-data.billy.preprod/api/v1/requests), ResultStatus(403 Forbidden), Reason(The authenticated user is not authorized to execute the action check_host.), Command(null), Cluster name(null)
I should be in the cluster admin group..I'll try again with the local admin user...
Created 10-25-2017 10:45 AM
Are you looking at the Ambari UI --> Hosts --> <specific Hostname> Page?
If you ahve Ambari Metrics Monitor Installed on the mentioned host then please make sure that the metrics monitor service is started and ambari metrics monitor is able to communicate with the Ambari Metrics Collector.
# ps -ef | grep resource_monitoring ams 26400 1 0 Oct12 ? 02:02:45 /usr/bin/python2.7 /usr/lib/python2.6/site-packages/resource_monitoring/main.py start
.
Also please check the following log for any error:
# less /var/log/ambari-metrics-monitor/ambari-metrics-monitor.out
.
Created on 10-25-2017 10:57 AM - edited 08-17-2019 06:11 PM
Can you please also share the following files:
1. /var/log/ambari-agent/ambari-agent.log (and ambari-agent.out)
2. The screenshot where you are not seeing the Memory/CPU details.
.
I am assuming that you are taking about the following Page Ambari UI -> Hosts --> Selected Host page.
Created on 10-25-2017 01:29 PM - edited 08-17-2019 06:11 PM
@Jay SenSharma No, I'm trying to ADD the hosts to the cluster, they are currently not present. Screenshot:
Ambari-metrics is installed but it hasn't run yet because the host is still not in any ambari cluster. ambari-agent.log (DEBUG enabled):
INFO 2017-10-25 15:25:42,947 DataCleaner.py:39 - Data cleanup thread started INFO 2017-10-25 15:25:42,949 DataCleaner.py:120 - Data cleanup started INFO 2017-10-25 15:25:42,949 DataCleaner.py:122 - Data cleanup finished INFO 2017-10-25 15:25:42,971 PingPortListener.py:50 - Ping port listener started on port: 8670 INFO 2017-10-25 15:25:42,972 main.py:181 - Newloglevel=logging.DEBUG INFO 2017-10-25 15:25:42,972 main.py:436 - Connecting to Ambari server at https://ambari-data.billy.preprod:8440 (192.168.40.120) DEBUG 2017-10-25 15:25:42,973 NetUtil.py:110 - Trying to connect to https://ambari-data.billy.preprod:8440 INFO 2017-10-25 15:25:42,973 NetUtil.py:67 - Connecting to https://ambari-data.billy.preprod:8440/ca DEBUG 2017-10-25 15:25:43,064 NetUtil.py:87 - GET https://ambari-data.billy.preprod:8440/ca -> 200, body: INFO 2017-10-25 15:25:43,064 main.py:446 - Connected to Ambari server ambari-data.billy.preprod DEBUG 2017-10-25 15:25:43,064 Controller.py:66 - Initializing Controller RPC thread. INFO 2017-10-25 15:25:43,065 threadpool.py:58 - Started thread pool with 3 core threads and 20 maximum threads WARNING 2017-10-25 15:25:43,065 AlertSchedulerHandler.py:280 - [AlertScheduler] /var/lib/ambari-agent/cache/alerts/definitions.json not found or invalid. No alerts will be scheduled until registration occurs. INFO 2017-10-25 15:25:43,065 AlertSchedulerHandler.py:175 - [AlertScheduler] Starting <ambari_agent.apscheduler.scheduler.Scheduler object at 0x1779cd0>; currently running: False DEBUG 2017-10-25 15:25:43,066 scheduler.py:574 - Scheduler started DEBUG 2017-10-25 15:25:43,066 scheduler.py:579 - Looking for jobs to run DEBUG 2017-10-25 15:25:43,066 scheduler.py:599 - No jobs; waiting until a job is added INFO 2017-10-25 15:25:45,085 hostname.py:98 - Read public hostname 'druid-co01.billy.preprod' using socket.getfqdn() INFO 2017-10-25 15:25:45,127 Hardware.py:174 - Some mount points were ignored: /dev/shm, /run, /sys/fs/cgroup, /run/user/0 DEBUG 2017-10-25 15:25:45,296 HostCheckReportFileHandler.py:126 - Host check report at /var/lib/ambari-agent/data/hostcheck.result DEBUG 2017-10-25 15:25:45,297 HostCheckReportFileHandler.py:177 - Removing old host check file at /var/lib/ambari-agent/data/hostcheck.result DEBUG 2017-10-25 15:25:45,297 HostCheckReportFileHandler.py:182 - Creating host check file at /var/lib/ambari-agent/data/hostcheck.result INFO 2017-10-25 15:25:45,299 Controller.py:170 - Registering with druid-co01.billy.preprod (192.168.40.52) (agent='{"hardwareProfile": {"kernel": "Linux", "domain": "billy.preprod", "physicalprocessorcount": 4, "kernelrelease": "3.10.0-693.5.2.el7.x86_64", "uptime_days": "0", "memorytotal": 8002256, "swapfree": "0.00 GB", "memorysize": 8002256, "osfamily": "redhat", "swapsize": "0.00 GB", "processorcount": 4, "netmask": "255.255.255.0", "timezone": "CET", "hardwareisa": "x86_64", "memoryfree": 6255416, "operatingsystem": "centos", "kernelmajversion": "3.10", "kernelversion": "3.10.0", "macaddress": "00:1A:4A:16:01:AD", "operatingsystemrelease": "7.4.1708", "ipaddress": "192.168.40.52", "hostname": "druid-co01", "uptime_hours": "5", "fqdn": "druid-co01.billy.preprod", "id": "root", "architecture": "x86_64", "selinux": false, "mounts": [{"available": "5313004", "used": "3064340", "percent": "37%", "device": "/dev/vda1", "mountpoint": "/", "type": "xfs", "size": "8377344"}, {"available": "3978004", "used": "0", "percent": "0%", "device": "devtmpfs", "mountpoint": "/dev", "type": "devtmpfs", "size": "3978004"}], "hardwaremodel": "x86_64", "uptime_seconds": "18331", "interfaces": "eth0,lo"}, "currentPingPort": 8670, "prefix": "/var/lib/ambari-agent/data", "agentVersion": "2.5.0.3", "agentEnv": {"transparentHugePage": "", "hostHealth": {"agentTimeStampAtReporting": 1508937945297, "activeJavaProcs": [{"command": "/usr/bin/java -server -Duser.timezone=UTC -Dfile.encoding=UTF-8 -Djava.io.tmpdir=/tmp -Djava.util.logging.manager=org.apache.logging.log4j.jul.LogManager -classpath .:/etc/druid/:/etc/hadoop/conf/:/opt/druid/lib/* io.druid.cli.Main server coordinator", "pid": 11784, "hadoop": true, "user": "root"}, {"command": "/usr/bin/java -server -Duser.timezone=UTC -Dfile.encoding=UTF-8 -Djava.io.tmpdir=/tmp -Djava.util.logging.manager=org.apache.logging.log4j.jul.LogManager -classpath .:/etc/druid/:/etc/hadoop/conf/:/opt/druid/lib/* io.druid.cli.Main server overlord", "pid": 11886, "hadoop": true, "user": "root"}], "liveServices": [{"status": "Healthy", "name": "ntpd or chronyd", "desc": ""}]}, "reverseLookup": true, "alternatives": [], "umask": "18", "firewallName": "iptables", "stackFoldersAndFiles": [], "existingUsers": [], "firewallRunning": true}, "timestamp": 1508937945235, "hostname": "druid-co01.billy.preprod", "responseId": -1, "publicHostname": "druid-co01.billy.preprod"}') INFO 2017-10-25 15:25:45,300 NetUtil.py:67 - Connecting to https://ambari-data.billy.preprod:8440/connection_info DEBUG 2017-10-25 15:25:45,385 NetUtil.py:87 - GET https://ambari-data.billy.preprod:8440/connection_info -> 200, body: {"security.server.two_way_ssl":"false"} DEBUG 2017-10-25 15:25:45,385 security.py:52 - Server two-way SSL authentication required: False INFO 2017-10-25 15:25:45,385 security.py:93 - SSL Connect being called.. connecting to the server INFO 2017-10-25 15:25:45,516 security.py:60 - SSL connection established. Two-way SSL authentication is turned off on the server. DEBUG 2017-10-25 15:25:45,606 Controller.py:177 - Registration response is {u'agentConfig': {u'agent.auto.cache.update': u'true', u'agent.check.mounts.timeout': u'0', u'agent.check.remote.mounts': u'false'}, u'exitstatus': 0, u'response': u'OK', u'responseId': 0, u'statusCommands': []} INFO 2017-10-25 15:25:45,606 Controller.py:196 - Registration Successful (response id = 0) INFO 2017-10-25 15:25:45,606 AmbariConfig.py:316 - Updating config property (agent.check.remote.mounts) with value (false) INFO 2017-10-25 15:25:45,607 AmbariConfig.py:316 - Updating config property (agent.auto.cache.update) with value (true) INFO 2017-10-25 15:25:45,607 AmbariConfig.py:316 - Updating config property (agent.check.mounts.timeout) with value (0) DEBUG 2017-10-25 15:25:45,607 Controller.py:205 - Updated config:<AmbariConfig.AmbariConfig instance at 0x174bef0> DEBUG 2017-10-25 15:25:45,607 Controller.py:212 - Got status commands on registration. DEBUG 2017-10-25 15:25:45,607 Controller.py:256 - No status commands received from ambari-data.billy.preprod WARNING 2017-10-25 15:25:45,607 AlertSchedulerHandler.py:123 - There are no alert definition commands in the heartbeat; unable to update definitions INFO 2017-10-25 15:25:45,607 Controller.py:512 - Registration response from ambari-data.billy.preprod was OK INFO 2017-10-25 15:25:45,607 Controller.py:517 - Resetting ActionQueue... INFO 2017-10-25 15:25:55,619 Controller.py:304 - Heartbeat (response id = 0) with server is running... INFO 2017-10-25 15:25:55,619 Controller.py:311 - Building heartbeat message DEBUG 2017-10-25 15:25:55,620 Heartbeat.py:83 - Building Heartbeat: {responseId = 0, timestamp = 1508937955620, commandsInProgress = False, componentsMapped = False,recoveryTimestamp = -1} DEBUG 2017-10-25 15:25:55,620 Heartbeat.py:86 - Heartbeat: {'componentStatus': [], 'hostname': 'druid-co01.billy.preprod', 'nodeStatus': {'cause': 'NONE', 'status': 'HEALTHY'}, 'recoveryReport': {'summary': 'DISABLED'}, 'recoveryTimestamp': -1, 'reports': [], 'responseId': 0, 'timestamp': 1508937955620} INFO 2017-10-25 15:25:55,621 Heartbeat.py:90 - Adding host info/state to heartbeat message. DEBUG 2017-10-25 15:25:55,733 HostCheckReportFileHandler.py:126 - Host check report at /var/lib/ambari-agent/data/hostcheck.result DEBUG 2017-10-25 15:25:55,734 HostCheckReportFileHandler.py:177 - Removing old host check file at /var/lib/ambari-agent/data/hostcheck.result DEBUG 2017-10-25 15:25:55,734 HostCheckReportFileHandler.py:182 - Creating host check file at /var/lib/ambari-agent/data/hostcheck.result INFO 2017-10-25 15:25:55,770 Hardware.py:174 - Some mount points were ignored: /, /dev, /dev/shm, /run, /sys/fs/cgroup, /run/user/0 DEBUG 2017-10-25 15:25:55,771 Heartbeat.py:100 - agentEnv: {'transparentHugePage': '', 'hostHealth': {'agentTimeStampAtReporting': 1508937955734, 'activeJavaProcs': [{'command': u'/usr/bin/java -server -Duser.timezone=UTC -Dfile.encoding=UTF-8 -Djava.io.tmpdir=/tmp -Djava.util.logging.manager=org.apache.logging.log4j.jul.LogManager -classpath .:/etc/druid/:/etc/hadoop/conf/:/opt/druid/lib/* io.druid.cli.Main server overlord', 'pid': 12022, 'hadoop': True, 'user': 'root'}, {'command': u'/usr/bin/java -server -Duser.timezone=UTC -Dfile.encoding=UTF-8 -Djava.io.tmpdir=/tmp -Djava.util.logging.manager=org.apache.logging.log4j.jul.LogManager -classpath .:/etc/druid/:/etc/hadoop/conf/:/opt/druid/lib/* io.druid.cli.Main server coordinator', 'pid': 12023, 'hadoop': True, 'user': 'root'}], 'liveServices': [{'status': 'Healthy', 'name': 'ntpd or chronyd', 'desc': ''}]}, 'reverseLookup': True, 'alternatives': [], 'umask': '18', 'firewallName': 'iptables', 'stackFoldersAndFiles': [], 'existingUsers': [], 'firewallRunning': True} DEBUG 2017-10-25 15:25:55,771 Heartbeat.py:101 - mounts: [] INFO 2017-10-25 15:25:55,771 Controller.py:318 - Sending Heartbeat (id = 0): {"alerts": [], "nodeStatus": {"status": "HEALTHY", "cause": "NONE"}, "timestamp": 1508937955620, "hostname": "druid-co01.billy.preprod", "responseId": 0, "reports": [], "mounts": [], "recoveryTimestamp": -1, "agentEnv": {"transparentHugePage": "", "hostHealth": {"agentTimeStampAtReporting": 1508937955734, "activeJavaProcs": [{"command": "/usr/bin/java -server -Duser.timezone=UTC -Dfile.encoding=UTF-8 -Djava.io.tmpdir=/tmp -Djava.util.logging.manager=org.apache.logging.log4j.jul.LogManager -classpath .:/etc/druid/:/etc/hadoop/conf/:/opt/druid/lib/* io.druid.cli.Main server overlord", "pid": 12022, "hadoop": true, "user": "root"}, {"command": "/usr/bin/java -server -Duser.timezone=UTC -Dfile.encoding=UTF-8 -Djava.io.tmpdir=/tmp -Djava.util.logging.manager=org.apache.logging.log4j.jul.LogManager -classpath .:/etc/druid/:/etc/hadoop/conf/:/opt/druid/lib/* io.druid.cli.Main server coordinator", "pid": 12023, "hadoop": true, "user": "root"}], "liveServices": [{"status": "Healthy", "name": "ntpd or chronyd", "desc": ""}]}, "reverseLookup": true, "alternatives": [], "umask": "18", "firewallName": "iptables", "stackFoldersAndFiles": [], "existingUsers": [], "firewallRunning": true}, "recoveryReport": {"summary": "DISABLED"}, "componentStatus": []} INFO 2017-10-25 15:25:55,775 Controller.py:332 - Heartbeat response received (id = 1) INFO 2017-10-25 15:25:55,775 Controller.py:341 - Heartbeat interval is 10 seconds INFO 2017-10-25 15:25:55,775 Controller.py:377 - Updating configurations from heartbeat INFO 2017-10-25 15:25:55,775 Controller.py:386 - Adding cancel/execution commands DEBUG 2017-10-25 15:25:55,775 Controller.py:246 - No commands received from ambari-data.billy.preprod DEBUG 2017-10-25 15:25:55,775 Controller.py:256 - No status commands received from ambari-data.billy.preprod INFO 2017-10-25 15:25:55,775 Controller.py:403 - Adding recovery commands DEBUG 2017-10-25 15:25:55,775 Controller.py:422 - No commands sent from ambari-data.billy.preprod INFO 2017-10-25 15:25:55,775 Controller.py:471 - Waiting 9.9 for next heartbeat INFO 2017-10-25 15:26:05,677 Controller.py:478 - Wait for next heartbeat over DEBUG 2017-10-25 15:26:05,677 Controller.py:304 - Heartbeat (response id = 1) with server is running... DEBUG 2017-10-25 15:26:05,677 Controller.py:311 - Building heartbeat message DEBUG 2017-10-25 15:26:05,678 Heartbeat.py:83 - Building Heartbeat: {responseId = 1, timestamp = 1508937965678, commandsInProgress = False, componentsMapped = False,recoveryTimestamp = -1} DEBUG 2017-10-25 15:26:05,679 Heartbeat.py:86 - Heartbeat: {'componentStatus': [], 'hostname': 'druid-co01.billy.preprod', 'nodeStatus': {'cause': 'NONE', 'status': 'HEALTHY'}, 'recoveryReport': {'summary': 'DISABLED'}, 'recoveryTimestamp': -1, 'reports': [], 'responseId': 1, 'timestamp': 1508937965678} DEBUG 2017-10-25 15:26:05,680 Controller.py:318 - Sending Heartbeat (id = 1): {"alerts": [], "nodeStatus": {"status": "HEALTHY", "cause": "NONE"}, "timestamp": 1508937965678, "hostname": "druid-co01.billy.preprod", "responseId": 1, "reports": [], "recoveryTimestamp": -1, "recoveryReport": {"summary": "DISABLED"}, "componentStatus": []} DEBUG 2017-10-25 15:26:05,683 Controller.py:332 - Heartbeat response received (id = 2) DEBUG 2017-10-25 15:26:05,683 Controller.py:341 - Heartbeat interval is 10 seconds DEBUG 2017-10-25 15:26:05,683 Controller.py:377 - Updating configurations from heartbeat DEBUG 2017-10-25 15:26:05,684 Controller.py:386 - Adding cancel/execution commands DEBUG 2017-10-25 15:26:05,684 Controller.py:246 - No commands received from ambari-data.billy.preprod DEBUG 2017-10-25 15:26:05,684 Controller.py:256 - No status commands received from ambari-data.billy.preprod DEBUG 2017-10-25 15:26:05,684 Controller.py:403 - Adding recovery commands DEBUG 2017-10-25 15:26:05,684 Controller.py:422 - No commands sent from ambari-data.billy.preprod DEBUG 2017-10-25 15:26:05,684 Controller.py:471 - Waiting 9.9 for next heartbeat
And the last part of the server log:
25 Oct 2017 15:25:43,107 WARN [qtp-ambari-agent-39] SecurityFilter:103 - Request https://ambari-data.billy.preprod:8440/ca doesn't match any pattern. 25 Oct 2017 15:25:43,107 WARN [qtp-ambari-agent-39] SecurityFilter:62 - This request is not allowed on this port: https://ambari-data.billy.preprod:8440/ca 25 Oct 2017 15:25:45,570 INFO [qtp-ambari-agent-36] HeartBeatHandler:425 - agentOsType = centos7 25 Oct 2017 15:25:45,647 INFO [qtp-ambari-agent-36] HostImpl:329 - Received host registration, host=[hostname=druid-co01,fqdn=druid-co01.billy.preprod,domain=billy.preprod,architecture=x86_64,processorcount=4,physicalprocessorcount=4,osname=centos,osversion=7.4.1708,osfamily=redhat,memory=8002256,uptime_hours=5,mounts=(available=5313004,mountpoint=/,used=3064340,percent=37%,size=8377344,device=/dev/vda1,type=xfs)(available=3978004,mountpoint=/dev,used=0,percent=0%,size=3978004,device=devtmpfs,type=devtmpfs)] , registrationTime=1508937945570, agentVersion=2.5.0.3 25 Oct 2017 15:25:45,647 INFO [qtp-ambari-agent-36] TopologyManager:548 - TopologyManager.onHostRegistered: Entering 25 Oct 2017 15:25:45,647 INFO [qtp-ambari-agent-36] TopologyManager:602 - Host druid-co01.billy.preprod re-registered, will not be added to the available hosts list
Created 10-25-2017 01:35 PM
The only "strange" thing I see is in the ambari-audit.log, about my user (vide):
2017-10-25T12:23:50.729+0200, User(vide), RemoteIp(192.168.150.13), Operation(Request from server), RequestType(POST), url(http://ambari-data.billy.preprod/api/v1/requests), ResultStatus(403 Forbidden), Reason(The authenticated user is not authorized to execute the action check_host.), Command(null), Cluster name(null)
I should be in the cluster admin group..I'll try again with the local admin user...
Created 10-25-2017 01:38 PM
Ok, solved, it was indeed a permission problem. But this should be considered a bug nonetheless. The message doesn't tell anything about the issue PLUS it starts an automated loop of error box -> logging out -> logging in -> error box with no options to jump out of it but waiting for the session to expire with the browser tab closed
Created 05-16-2018 02:25 PM
This might help to other people. This seems like a bug :
https://issues.apache.org/jira/browse/AMBARI-21687
The bug is fixed in 2.6.0.0 branch only ( not fixed in 2.4.x.x or 2.5.x.x versions of ambari). So, The available workaround is to add the node via 'Admin' user or use the ambari 2.6.x