Problem with ambari-agents not registering to ambari-server

New Contributor

I am trying to install Ambari v2.7.1.0 with the install wizard in the web UI. In step two I chose "Provide your SSH Private Key to automatically register hosts". I have configured passwordless SSH between the server machine and the hosts, the users on the hosts and on the server have passwordless sudo, every machine has the NTP service installed and running, the firewall service is not running on any machine, and SELinux is disabled. Ambari server is set up with a non-root user, the default JDK, the default DB, and no LZO. When the wizard tries to register the hosts in the Confirm Hosts step, it fails on all machines at the last step with:

==========================

Running setup agent script...

==========================




Command start time 2019-07-03 12:10:34

Host registration aborted. Ambari Agent host cannot reach Ambari Server 'localhost:8080'. Please check the network connectivity between the Ambari Agent host and the Ambari Server


	



Connection to example-edge-ent.internal.example.io closed.

SSH command execution finished

host=example-edge-ent.internal.example.io, exitcode=1

Command end time 2019-07-03 12:10:34



ERROR: Bootstrap of host example-edge-ent.internal.example.io fails because previous action finished with non-zero exit code (1)

ERROR MESSAGE: Connection to example-edge-ent.internal.example.io closed.




STDOUT: Host registration aborted. Ambari Agent host cannot reach Ambari Server 'localhost:8080'. Please check the network connectivity between the Ambari Agent host and the Ambari Server




Connection to example-edge-ent.internal.example.io closed.

When that failed, I tried manual registration. I installed ambari-agent from the same repository as ambari-server and put the server FQDN in /etc/ambari-agent/conf/ambari-agent.ini on the host machine. Then I started the ambari-agent service, and the agent log was:

INFO 2019-07-03 13:31:12,835 main.py:155 - loglevel=logging.INFO

INFO 2019-07-03 13:31:12,840 Hardware.py:68 - Initializing host system information.

INFO 2019-07-03 13:31:12,849 Hardware.py:188 - Some mount points were ignored: /dev, /dev/shm, /run, /sys/fs/cgroup, /run/user/1000

WARNING 2019-07-03 13:31:12,875 Facter.py:499 - Can't get the IP address for eth1

INFO 2019-07-03 13:31:12,876 Facter.py:202 - Directory: '/etc/resource_overrides' does not exist - it won't be used for gathering system resources.

INFO 2019-07-03 13:31:12,883 Hardware.py:73 - Host system information: {'kernel': 'Linux', 'domain': 'localdomain', 'physicalprocessorcount': 2, 'kernelrelease': '3.10.0-957.12.2.el7.x86_64', 'uptime_days': '1', 'memorytotal': 14869928, 'swapfree': '0.00 GB', 'memorysize': 14869928, 'osfamily': 'redhat', 'swapsize': '0.00 GB', 'processorcount': 2, 'netmask': '255.0.0.0', 'timezone': 'CET', 'hardwareisa': 'x86_64', 'memoryfree': 13739552, 'operatingsystem': 'centos', 'kernelmajversion': '3.10', 'kernelversion': '3.10.0', 'macaddress': 'FA:16:3E:CD:68:63', 'operatingsystemrelease': '7.6.1810', 'ipaddress': '127.0.0.1', 'hostname': 'localhost', 'uptime_hours': '45', 'fqdn': 'localhost.localdomain', 'id': 'root', 'architecture': 'x86_64', 'selinux': False, 'mounts': [{'available': '47427656', 'used': '1905732', 'percent': '4%', 'device': '/dev/sda1', 'mountpoint': '/', 'type': 'ext4', 'size': '51493068'}], 'hardwaremodel': 'x86_64', 'uptime_seconds': '162778', 'interfaces': 'eth0,eth1,eth2,lo'}

INFO 2019-07-03 13:31:12,886 DataCleaner.py:39 - Data cleanup thread started

INFO 2019-07-03 13:31:12,888 DataCleaner.py:120 - Data cleanup started

INFO 2019-07-03 13:31:12,890 DataCleaner.py:122 - Data cleanup finished

INFO 2019-07-03 13:31:12,895 PingPortListener.py:50 - Ping port listener started on port: 8670

INFO 2019-07-03 13:31:12,899 main.py:481 - Connecting to Ambari server at https://******-edge.******.******.io:8440 (10.2.3.122)

INFO 2019-07-03 13:31:12,899 NetUtil.py:61 - Connecting to https://******-edge.******.******.io:8440/ca

INFO 2019-07-03 13:31:13,067 main.py:491 - Connected to Ambari server ******-edge.******.******.io

INFO 2019-07-03 13:31:13,067 AlertSchedulerHandler.py:149 - [AlertScheduler] Starting ; currently running: False

INFO 2019-07-03 13:31:13,070 NetUtil.py:61 - Connecting to https://******-edge.******.******.io:8440/connection_info

INFO 2019-07-03 13:31:13,156 security.py:61 - Connecting to wss://******-edge.******.******.io:8441/agent/stomp/v1

INFO 2019-07-03 13:31:13,305 transport.py:329 - Starting receiver loop

INFO 2019-07-03 13:31:13,338 security.py:67 - SSL connection established. Two-way SSL authentication is turned off on the server.

INFO 2019-07-03 13:31:13,901 hostname.py:103 - Read public hostname '******-worker-03' from http://169.254.169.254/latest/meta-data/public-hostname

INFO 2019-07-03 13:31:13,902 HeartbeatThread.py:126 - Sending registration request

INFO 2019-07-03 13:31:13,904 security.py:135 - Event to server at /register (correlation_id=0): {'currentPingPort': 8670, 'timestamp': 1562153473340, 'hostname': 'localhost.localdomain', 'publicHostname': '******-worker-03', 'hardwareProfile': {'kernel': 'Linux', 'domain': 'localdomain', 'kernelrelease': '3.10.0-957.12.2.el7.x86_64', 'uptime_days': '1', 'memorytotal': 14869928, 'swapfree': '0.00 GB', 'processorcount': 2, 'selinux': False, 'timezone': 'CET', 'hardwareisa': 'x86_64', 'operatingsystem': 'centos', 'hostname': 'localhost', 'id': 'root', 'memoryfree': 13739552, 'hardwaremodel': 'x86_64', 'uptime_seconds': '162778', 'osfamily': 'redhat', 'physicalprocessorcount': 2, 'interfaces': 'eth0,eth1,eth2,lo', 'memorysize': 14869928, 'swapsize': '0.00 GB', 'netmask': '255.0.0.0', 'ipaddress': '127.0.0.1', 'kernelmajversion': '3.10', 'kernelversion': '3.10.0', 'macaddress': 'FA:16:3E:CD:68:63', 'operatingsystemrelease': '7.6.1810', 'uptime_hours': '45', 'fqdn': 'localhost.localdomain', 'architecture': 'x86_64', 'mounts': [{'available': '47427656', 'used': '1905732', 'percent': '4%', 'device': '/dev/sda1', 'mountpoint': '/', 'type': 'ext4', 'size': '51493068'}]}, 'agentEnv': {'transparentHugePage': '', 'hostHealth': {'agentTimeStampAtReporting': 1562153473440, 'liveServices': [{'status': 'Healthy', 'name': 'chronyd', 'desc': ''}]}, 'reverseLookup': True, 'umask': '18', 'hasUnlimitedJcePolicy': None, 'alternatives': [], 'firewallName': 'iptables', 'stackFoldersAndFiles': [], 'existingUsers': [], 'firewallRunning': False}, 'prefix': '/var/lib/ambari-agent/data', 'agentVersion': '2.7.1.0', 'agentStartTime': 1562153472883, 'id': -1}

INFO 2019-07-03 13:31:14,163 __init__.py:57 - Event from server at /user/ (correlation_id=0): {u'status': u'OK', u'exitstatus': 0, u'id': 0}

INFO 2019-07-03 13:31:14,172 HeartbeatThread.py:131 - Registration response received

INFO 2019-07-03 13:31:14,172 security.py:135 - Event to server at /agents/topologies (correlation_id=1): {'hash': None}

INFO 2019-07-03 13:31:14,185 __init__.py:57 - Event from server at /user/ (correlation_id=1): {u'eventType': u'CREATE', u'hash': u'f1a6aa00caef58abdf7937c7743616feeb11834c150fdf8f6c9c739d7d7b0dda2578217c446a67a36741fd391858109859135319d681e66047423eb072ddbfa3', u'clusters': {}}

INFO 2019-07-03 13:31:14,190 security.py:135 - Event to server at /agents/metadata (correlation_id=2): {'hash': None}

INFO 2019-07-03 13:31:14,208 __init__.py:57 - Event from server at /user/ (correlation_id=2): {u'eventType': u'CREATE', u'hash': u'aa94b4e19608907d04aaa7c0c2f6d6bf3fe7e524ade18b1df3efe5d1c77934a06749f0229699729c3e2520f3b858f17452a7fa39ef1c1d6d9e6c54e64ea81657', u'clusters': {u'-1': {u'clusterLevelParams': {u'jdk_location': u'http://localhost:8080/resources', u'agent_stack_retry_count': u'5', u'db_driver_filename': u'mysql-connector-java.jar', u'agent_stack_retry_on_unavailability': u'false', u'ambari_db_rca_url': u'jdbc:postgresql://localhost/ambarirca', u'jce_name': u'jce_policy-8.zip', u'java_version': u'8', u'ambari_server_host': u'localhost', u'ambari_db_rca_password': u'mapred', u'ambari_server_port': u'8080', u'host_sys_prepped': u'false', u'db_name': u'ambari', u'oracle_jdbc_url': u'http://localhost:8080/resources/ojdbc6.jar', u'ambari_db_rca_username': u'mapred', u'ambari_db_rca_driver': u'org.postgresql.Driver', u'ambari_server_use_ssl': u'false', u'jdk_name': u'jdk-8u112-linux-x64.tar.gz', u'gpl_license_accepted': u'false', u'java_home': u'/usr/jdk64/jdk1.8.0_112', u'mysql_jdbc_url': u'http://localhost:8080/resources/mysql-connector-java.jar'}, u'agentConfigs': {u'agentConfig': {u'agent.auto.cache.update': u'true', u'agent.check.remote.mounts': u'false', u'agent.check.mounts.timeout': u'0', u'java.home': u'/usr/jdk64/jdk1.8.0_112'}}, u'fullServiceLevelMetadata': False}}}

INFO 2019-07-03 13:31:14,223 ClusterCache.py:125 - Rewriting cache ClusterMetadataCache for cluster -1

INFO 2019-07-03 13:31:14,225 AmbariConfig.py:370 - Updating config property (agent.auto.cache.update) with value (true)

INFO 2019-07-03 13:31:14,226 AmbariConfig.py:370 - Updating config property (agent.check.remote.mounts) with value (false)

INFO 2019-07-03 13:31:14,226 AmbariConfig.py:370 - Updating config property (agent.check.mounts.timeout) with value (0)

INFO 2019-07-03 13:31:14,226 AmbariConfig.py:370 - Updating config property (java.home) with value (/usr/jdk64/jdk1.8.0_112)

INFO 2019-07-03 13:31:14,227 security.py:135 - Event to server at /agents/configs (correlation_id=3): {'hash': None}

INFO 2019-07-03 13:31:14,238 __init__.py:57 - Event from server at /user/ (correlation_id=3): {u'timestamp': 1562153474111, u'hash': u'eb07592a169449fbe68562d6d84a0516c45476c4bff3c6709fe30104f8cdf7a162cf8059e310180490a8cd109762288d746a37e59311930d503131d08fb0bfb7', u'clusters': {}}

INFO 2019-07-03 13:31:14,244 security.py:135 - Event to server at /agents/host_level_params (correlation_id=4): {'hash': None}

INFO 2019-07-03 13:31:14,253 __init__.py:57 - Event from server at /user/ (correlation_id=4): {u'clusters': {}, u'hash': u'eb07592a169449fbe68562d6d84a0516c45476c4bff3c6709fe30104f8cdf7a162cf8059e310180490a8cd109762288d746a37e59311930d503131d08fb0bfb7'}

INFO 2019-07-03 13:31:14,262 security.py:135 - Event to server at /agents/alert_definitions (correlation_id=5): {'hash': None}

INFO 2019-07-03 13:31:14,273 __init__.py:57 - Event from server at /user/ (correlation_id=5): {u'clusters': {}, u'hostName': u'localhost.localdomain', u'hash': u'093d7e70689460b1238c5a9eabb3668ac884ceb1e9ef4e1d262abf3b5b870b48f207d6f63223a1c423ddbcdcbc48cfa4335a4bce05fd893680eba481bdedaee5', u'eventType': u'CREATE'}

INFO 2019-07-03 13:31:14,279 AlertSchedulerHandler.py:212 - [AlertScheduler] Rescheduling all jobs...

INFO 2019-07-03 13:31:14,279 AlertSchedulerHandler.py:233 - [AlertScheduler] Reschedule Summary: 0 unscheduled, 0 rescheduled

INFO 2019-07-03 13:31:14,281 security.py:135 - Event to server at /heartbeat (correlation_id=6): {'id': 0}

INFO 2019-07-03 13:31:14,322 __init__.py:57 - Event from server at /user/ (correlation_id=6): {u'status': u'OK', u'id': 1}

INFO 2019-07-03 13:31:24,346 security.py:135 - Event to server at /heartbeat (correlation_id=7): {'id': 1}

INFO 2019-07-03 13:31:24,355 __init__.py:57 - Event from server at /user/ (correlation_id=7): {u'status': u'OK', u'id': 2}

INFO 2019-07-03 13:31:34,364 security.py:135 - Event to server at /heartbeat (correlation_id=8): {'id': 2}

INFO 2019-07-03 13:31:34,372 __init__.py:57 - Event from server at /user/ (correlation_id=8): {u'status': u'OK', u'id': 3}

INFO 2019-07-03 13:31:44,373 security.py:135 - Event to server at /heartbeat (correlation_id=9): {'id': 3}

INFO 2019-07-03 13:31:44,392 __init__.py:57 - Event from server at /user/ (correlation_id=9): {u'status': u'OK', u'id': 4}

INFO 2019-07-03 13:31:54,407 security.py:135 - Event to server at /heartbeat (correlation_id=10): {'id': 4}

INFO 2019-07-03 13:31:54,416 __init__.py:57 - Event from server at /user/ (correlation_id=10): {u'status': u'OK', u'id': 5}

INFO 2019-07-03 13:32:04,425 security.py:135 - Event to server at /heartbeat (correlation_id=11): {'id': 5}

INFO 2019-07-03 13:32:04,432 __init__.py:57 - Event from server at /user/ (correlation_id=11): {u'status': u'OK', u'id': 6}

INFO 2019-07-03 13:32:13,169 logger.py:75 - Testing the JVM's JCE policy to see it if supports an unlimited key length.

INFO 2019-07-03 13:32:13,194 Hardware.py:188 - Some mount points were ignored: /dev, /dev/shm, /run, /sys/fs/cgroup, /run/user/1000

INFO 2019-07-03 13:32:13,194 security.py:135 - Event to server at /reports/host_status (correlation_id=12): {'agentEnv': {'transparentHugePage': '', 'hostHealth': {'agentTimeStampAtReporting': 1562153533179, 'liveServices': [{'status': 'Healthy', 'name': 'chronyd', 'desc': ''}]}, 'reverseLookup': True, 'umask': '18', 'hasUnlimitedJcePolicy': False, 'alternatives': [], 'firewallName': 'iptables', 'stackFoldersAndFiles': [], 'existingUsers': [], 'firewallRunning': False}, 'mounts': [{'available': '47427640', 'used': '1905748', 'percent': '4%', 'device': '/dev/sda1', 'mountpoint': '/', 'type': 'ext4', 'size': '51493068'}]}

INFO 2019-07-03 13:32:13,208 __init__.py:57 - Event from server at /user/ (correlation_id=12): {u'status': u'OK'}

INFO 2019-07-03 13:32:14,434 security.py:135 - Event to server at /heartbeat (correlation_id=13): {'id': 6}

INFO 2019-07-03 13:32:14,441 __init__.py:57 - Event from server at /user/ (correlation_id=13): {u'status': u'OK', u'id': 7}

INFO 2019-07-03 13:32:24,443 security.py:135 - Event to server at /heartbeat (correlation_id=14): {'id': 7}

INFO 2019-07-03 13:32:24,451 __init__.py:57 - Event from server at /user/ (correlation_id=14): {u'status': u'OK', u'id': 8}

INFO 2019-07-03 13:32:34,460 security.py:135 - Event to server at /heartbeat (correlation_id=15): {'id': 8}

INFO 2019-07-03 13:32:34,469 __init__.py:57 - Event from server at /user/ (correlation_id=15): {u'status': u'OK', u'id': 9}

INFO 2019-07-03 13:32:44,479 security.py:135 - Event to server at /heartbeat (correlation_id=16): {'id': 9}

INFO 2019-07-03 13:32:44,487 __init__.py:57 - Event from server at /user/ (correlation_id=16): {u'status': u'OK', u'id': 10}

INFO 2019-07-03 13:32:54,488 security.py:135 - Event to server at /heartbeat (correlation_id=17): {'id': 10}

INFO 2019-07-03 13:32:54,494 __init__.py:57 - Event from server at /user/ (correlation_id=17): {u'status': u'OK', u'id': 11}

INFO 2019-07-03 13:33:04,496 security.py:135 - Event to server at /heartbeat (correlation_id=18): {'id': 11}

INFO 2019-07-03 13:33:04,503 __init__.py:57 - Event from server at /user/ (correlation_id=18): {u'status': u'OK', u'id': 12}

INFO 2019-07-03 13:33:13,278 logger.py:75 - Testing the JVM's JCE policy to see it if supports an unlimited key length.

INFO 2019-07-03 13:33:13,298 Hardware.py:188 - Some mount points were ignored: /dev, /dev/shm, /run, /sys/fs/cgroup, /run/user/1000

INFO 2019-07-03 13:33:13,298 security.py:135 - Event to server at /reports/host_status (correlation_id=19): {'agentEnv': {'transparentHugePage': '', 'hostHealth': {'agentTimeStampAtReporting': 1562153593288, 'liveServices': [{'status': 'Healthy', 'name': 'chronyd', 'desc': ''}]}, 'reverseLookup': True, 'umask': '18', 'hasUnlimitedJcePolicy': False, 'alternatives': [], 'firewallName': 'iptables', 'stackFoldersAndFiles': [], 'existingUsers': [], 'firewallRunning': False}, 'mounts': [{'available': '47427632', 'used': '1905756', 'percent': '4%', 'device': '/dev/sda1', 'mountpoint': '/', 'type': 'ext4', 'size': '51493068'}]}

INFO 2019-07-03 13:33:13,306 __init__.py:57 - Event from server at /user/ (correlation_id=19): {u'status': u'OK'}

INFO 2019-07-03 13:33:14,505 security.py:135 - Event to server at /heartbeat (correlation_id=20): {'id': 12}

INFO 2019-07-03 13:33:14,513 __init__.py:57 - Event from server at /user/ (correlation_id=20): {u'status': u'OK', u'id': 13}

INFO 2019-07-03 13:33:24,514 security.py:135 - Event to server at /heartbeat (correlation_id=21): {'id': 13}

INFO 2019-07-03 13:33:24,521 __init__.py:57 - Event from server at /user/ (correlation_id=21): {u'status': u'OK', u'id': 14}

INFO 2019-07-03 13:33:34,523 security.py:135 - Event to server at /heartbeat (correlation_id=22): {'id': 14}

INFO 2019-07-03 13:33:34,530 __init__.py:57 - Event from server at /user/ (correlation_id=22): {u'status': u'OK', u'id': 15}

INFO 2019-07-03 13:33:44,532 security.py:135 - Event to server at /heartbeat (correlation_id=23): {'id': 15}

INFO 2019-07-03 13:33:44,542 __init__.py:57 - Event from server at /user/ (correlation_id=23): {u'status': u'OK', u'id': 16}

INFO 2019-07-03 13:33:54,543 security.py:135 - Event to server at /heartbeat (correlation_id=24): {'id': 16}

INFO 2019-07-03 13:33:54,551 __init__.py:57 - Event from server at /user/ (correlation_id=24): {u'status': u'OK', u'id': 17}

INFO 2019-07-03 13:34:04,552 security.py:135 - Event to server at /heartbeat (correlation_id=25): {'id': 17}

INFO 2019-07-03 13:34:04,559 __init__.py:57 - Event from server at /user/ (correlation_id=25): {u'status': u'OK', u'id': 18}
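
For the manual registration step described above, one quick sanity check is to confirm which server hostname the agent config actually contains. The following is only a rough sketch: it assumes the agent config has a `[server]` section with a `hostname` key, and the sample text and the name `ambari-server.example.internal` are made up for illustration.

```python
import configparser

# Names that mean "this machine", which an agent on another host cannot use
# to reach the server.
LOOPBACK_NAMES = {"localhost", "localhost.localdomain", "127.0.0.1", "::1"}


def server_hostname_from_ini(ini_text):
    """Return the [server] hostname value from agent config text, or None.

    Assumption: the config uses a [server] section with a 'hostname' key.
    """
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(ini_text)
    return parser.get("server", "hostname", fallback=None)


def points_at_loopback(hostname):
    """True if the hostname is missing or is a loopback-style name."""
    return hostname is None or hostname.strip().lower() in LOOPBACK_NAMES


# Hypothetical config content, not taken from the thread.
SAMPLE_INI = """
[server]
hostname=ambari-server.example.internal
url_port=8440
secured_url_port=8441
"""

if __name__ == "__main__":
    name = server_hostname_from_ini(SAMPLE_INI)
    verdict = "loopback, fix this" if points_at_loopback(name) else "looks fine"
    print(name, "->", verdict)
```

To run it against a real host, read the text of /etc/ambari-agent/conf/ambari-agent.ini and pass it to `server_hostname_from_ini` instead of the sample string.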


In the web install wizard, all hosts failed in the Confirm Hosts step. The status is:

Registering with the server...
Registration with the server failed.

Re: Problem with ambari-agents not registering to ambari-server

Super Mentor

@Blagoi Pavlov

There are some indications that you are not using FQDN for all cluster nodes.


Must use FQDN

Please avoid using "localhost" when setting up an Ambari cluster. You must use the FQDN (fully qualified domain name) for all hosts. Every node in your cluster should also be able to resolve every other node by its FQDN (for example, you can keep the same "/etc/hosts" entries on all hosts so that they can resolve each other).


I see that your agent registration is using "localhost" (localhost.localdomain), which is incorrect. The agents should be using the correct IP address and hostname mapping, not the loopback address (127.0.0.1).

INFO 2019-07-03 13:31:13,904 security.py:135 - Event to server at /register (correlation_id=0): {'currentPingPort': 8670, 'timestamp': 1562153473340, 'hostname': 'localhost.localdomain',

.
.
 'ipaddress': '127.0.0.1', 'hostname': 'localhost', 

https://docs.hortonworks.com/HDPDocuments/Ambari-2.7.3.0/bk_ambari-installation/content/collect_info...
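
A small script run on each node (including the Ambari Server host) can show what name and address the machine reports for itself. This is only a sketch using the Python standard library; whether it matches exactly how the agent picks its hostname is an assumption, and the helper names are made up for illustration.

```python
import ipaddress
import socket


def is_loopback(ip):
    """True for addresses such as 127.0.0.1 or ::1."""
    return ipaddress.ip_address(ip).is_loopback


def looks_fully_qualified(name):
    """Rough check: at least two non-empty labels, and not a localhost name."""
    labels = name.strip(".").split(".")
    return len(labels) >= 2 and all(labels) and labels[0].lower() != "localhost"


def local_identity():
    """Return (fqdn, ip) as this machine resolves itself; ip is None on failure."""
    fqdn = socket.getfqdn()
    try:
        ip = socket.gethostbyname(fqdn)
    except socket.gaierror:
        ip = None
    return fqdn, ip


if __name__ == "__main__":
    fqdn, ip = local_identity()
    print("fqdn:", fqdn, "ip:", ip)
    if not looks_fully_qualified(fqdn):
        print("WARNING: hostname is not a proper FQDN")
    if ip is None or is_loopback(ip):
        print("WARNING: FQDN does not resolve to a routable address")
```

With the values from the registration event above ('fqdn': 'localhost.localdomain', 'ipaddress': '127.0.0.1'), both checks would raise a warning.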


Also, you mentioned in the issue description that you are using Ambari Server 2.7.3, but I see that your Ambari Agent registration shows 'agentVersion': '2.7.1.0'.


The Ambari Server and Agent versions should be the same. Please check whether you are using the correct binaries on all cluster nodes:

# rpm -qa | grep ambari  
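
If there are many nodes, the comparison can be scripted. The sketch below assumes package names of the form `ambari-agent-<version>-<build>.<arch>`; the sample package lines and the build number `1` are hypothetical, and only the `'agentVersion': '2.7.1.0'` field comes from the log in this thread.

```python
import re

# Assumed package name shape: ambari-server-2.7.1.0-<build>.<arch>
PACKAGE_RE = re.compile(r"^(ambari-(?:server|agent))-(\d+(?:\.\d+)+)-")
# Field as it appears in the agent's registration event.
AGENT_VERSION_RE = re.compile(r"'agentVersion': '([^']+)'")


def ambari_versions(rpm_lines):
    """Map 'ambari-server' / 'ambari-agent' to the version found in rpm output."""
    versions = {}
    for line in rpm_lines:
        match = PACKAGE_RE.match(line.strip())
        if match:
            versions[match.group(1)] = match.group(2)
    return versions


def agent_version_from_log(log_line):
    """Extract agentVersion from a registration log line, or None if absent."""
    match = AGENT_VERSION_RE.search(log_line)
    return match.group(1) if match else None


if __name__ == "__main__":
    # Hypothetical 'rpm -qa | grep ambari' output, for illustration only.
    sample = ["ambari-server-2.7.1.0-1.x86_64", "ambari-agent-2.7.1.0-1.x86_64"]
    found = ambari_versions(sample)
    same = found.get("ambari-server") == found.get("ambari-agent")
    print(found, "match" if same else "MISMATCH")
```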


Re: Problem with ambari-agents not registering to ambari-server

New Contributor

I'm sorry about the Ambari version 2.7.3. It was a typo and I fixed it. On every host the repo is 2.7.1, so it's not a compatibility issue. Every host can resolve the others by their FQDN.
