Created on 04-08-2016 08:48 AM - edited 09-16-2022 03:12 AM
I am very new to all this, but as a learning exercise I am trying to set up a 4-node Ambari cluster on EC2. I have a master node that has both the agent and the server installed, and three nodes with agents, all running CentOS 6.
When I try to create a cluster using the UI with manual host registration, all nodes except the master fail with the following error:
Registering with the server... Registration with the server failed.
Although the agents were already running and the hosts were set up, I also tried automatic registration over SSH and got a more verbose error:
==========================
Creating target directory...
==========================
Command start time 2016-04-08 06:34:14
Connection to ec2node1.hdp2 closed.
SSH command execution finished
host=ec2node1.hdp2, exitcode=0
Command end time 2016-04-08 06:34:14
==========================
Copying common functions script...
==========================
Command start time 2016-04-08 06:34:14
scp /usr/lib/python2.6/site-packages/ambari_commons
host=ec2node1.hdp2, exitcode=0
Command end time 2016-04-08 06:34:14
==========================
Copying OS type check script...
==========================
Command start time 2016-04-08 06:34:14
scp /usr/lib/python2.6/site-packages/ambari_server/os_check_type.py
host=ec2node1.hdp2, exitcode=0
Command end time 2016-04-08 06:34:14
==========================
Running OS type check...
==========================
Command start time 2016-04-08 06:34:14
Cluster primary/cluster OS family is redhat6 and local/current OS family is redhat6
Connection to ec2node1.hdp2 closed.
SSH command execution finished
host=ec2node1.hdp2, exitcode=0
Command end time 2016-04-08 06:34:14
==========================
Checking 'sudo' package on remote host...
==========================
Command start time 2016-04-08 06:34:14
sudo-1.8.6p3-20.el6_7.x86_64
Connection to ec2node1.hdp2 closed.
SSH command execution finished
host=ec2node1.hdp2, exitcode=0
Command end time 2016-04-08 06:34:14
==========================
Copying repo file to 'tmp' folder...
==========================
Command start time 2016-04-08 06:34:14
scp /etc/yum.repos.d/ambari.repo
host=ec2node1.hdp2, exitcode=0
Command end time 2016-04-08 06:34:14
==========================
Moving file to repo dir...
==========================
Command start time 2016-04-08 06:34:14
Connection to ec2node1.hdp2 closed.
SSH command execution finished
host=ec2node1.hdp2, exitcode=0
Command end time 2016-04-08 06:34:14
==========================
Changing permissions for ambari.repo...
==========================
Command start time 2016-04-08 06:34:14
Connection to ec2node1.hdp2 closed.
SSH command execution finished
host=ec2node1.hdp2, exitcode=0
Command end time 2016-04-08 06:34:15
==========================
Copying setup script file...
==========================
Command start time 2016-04-08 06:34:15
scp /usr/lib/python2.6/site-packages/ambari_server/setupAgent.py
host=ec2node1.hdp2, exitcode=0
Command end time 2016-04-08 06:34:15
==========================
Running setup agent script...
==========================
Command start time 2016-04-08 06:34:15
('WARNING 2016-04-08 06:34:21,213 AlertSchedulerHandler.py:243 - [AlertScheduler] /var/lib/ambari-agent/cache/alerts/definitions.json not found or invalid. No alerts will be scheduled until registration occurs.
INFO 2016-04-08 06:34:21,213 AlertSchedulerHandler.py:139 - [AlertScheduler] Starting <ambari_agent.apscheduler.scheduler.Scheduler object at 0x1230a50>; currently running: False
INFO 2016-04-08 06:34:21,217 hostname.py:86 - Read public hostname \'ec2-52-63-181-16.ap-southeast-2.compute.amazonaws.com\' from http://169.254.169.254/latest/meta-data/public-hostname
INFO 2016-04-08 06:34:21,220 logger.py:67 - call[\'test -w /\'] {\'sudo\': True, \'timeout\': 5}
INFO 2016-04-08 06:34:21,224 logger.py:67 - call returned (0, \'\')
INFO 2016-04-08 06:34:21,224 logger.py:67 - call[\'test -w /dev/shm\'] {\'sudo\': True, \'timeout\': 5}
INFO 2016-04-08 06:34:21,228 logger.py:67 - call returned (0, \'\')
INFO 2016-04-08 06:34:21,228 logger.py:67 - call[\'test -w /grid/1\'] {\'sudo\': True, \'timeout\': 5}
INFO 2016-04-08 06:34:21,232 logger.py:67 - call returned (0, \'\')
INFO 2016-04-08 06:34:21,232 logger.py:67 - call[\'test -w /grid/2\'] {\'sudo\': True, \'timeout\': 5}
INFO 2016-04-08 06:34:21,236 logger.py:67 - call returned (0, \'\')
INFO 2016-04-08 06:34:21,236 logger.py:67 - call[\'test -w /grid/3\'] {\'sudo\': True, \'timeout\': 5}
INFO 2016-04-08 06:34:21,239 logger.py:67 - call returned (0, \'\')
ERROR 2016-04-08 06:34:21,251 main.py:309 - Fatal exception occurred:
Traceback (most recent call last):
  File "/usr/lib/python2.6/site-packages/ambari_agent/main.py", line 306, in <module>
    main(heartbeat_stop_callback)
  File "/usr/lib/python2.6/site-packages/ambari_agent/main.py", line 297, in main
    ExitHelper.execute_cleanup()
TypeError: unbound method execute_cleanup() must be called with ExitHelper instance as first argument (got nothing instead)
', None)
('WARNING 2016-04-08 06:34:21,213 AlertSchedulerHandler.py:243 - [AlertScheduler] /var/lib/ambari-agent/cache/alerts/definitions.json not found or invalid. No alerts will be scheduled until registration occurs.
INFO 2016-04-08 06:34:21,213 AlertSchedulerHandler.py:139 - [AlertScheduler] Starting <ambari_agent.apscheduler.scheduler.Scheduler object at 0x1230a50>; currently running: False
INFO 2016-04-08 06:34:21,217 hostname.py:86 - Read public hostname \'ec2-52-63-181-16.ap-southeast-2.compute.amazonaws.com\' from http://169.254.169.254/latest/meta-data/public-hostname
INFO 2016-04-08 06:34:21,220 logger.py:67 - call[\'test -w /\'] {\'sudo\': True, \'timeout\': 5}
INFO 2016-04-08 06:34:21,224 logger.py:67 - call returned (0, \'\')
INFO 2016-04-08 06:34:21,224 logger.py:67 - call[\'test -w /dev/shm\'] {\'sudo\': True, \'timeout\': 5}
INFO 2016-04-08 06:34:21,228 logger.py:67 - call returned (0, \'\')
INFO 2016-04-08 06:34:21,228 logger.py:67 - call[\'test -w /grid/1\'] {\'sudo\': True, \'timeout\': 5}
INFO 2016-04-08 06:34:21,232 logger.py:67 - call returned (0, \'\')
INFO 2016-04-08 06:34:21,232 logger.py:67 - call[\'test -w /grid/2\'] {\'sudo\': True, \'timeout\': 5}
INFO 2016-04-08 06:34:21,236 logger.py:67 - call returned (0, \'\')
INFO 2016-04-08 06:34:21,236 logger.py:67 - call[\'test -w /grid/3\'] {\'sudo\': True, \'timeout\': 5}
INFO 2016-04-08 06:34:21,239 logger.py:67 - call returned (0, \'\')
ERROR 2016-04-08 06:34:21,251 main.py:309 - Fatal exception occurred:
Traceback (most recent call last):
  File "/usr/lib/python2.6/site-packages/ambari_agent/main.py", line 306, in <module>
    main(heartbeat_stop_callback)
  File "/usr/lib/python2.6/site-packages/ambari_agent/main.py", line 297, in main
    ExitHelper.execute_cleanup()
TypeError: unbound method execute_cleanup() must be called with ExitHelper instance as first argument (got nothing instead)
', None)
Connection to ec2node1.hdp2 closed.
SSH command execution finished
host=ec2node1.hdp2, exitcode=0
Command end time 2016-04-08 06:34:23
Registering with the server...
Registration with the server failed.
Any help or pointers would be greatly appreciated!
Created 04-08-2016 01:13 PM
Hi @Jan Andreev,
I'm not exactly sure why you're seeing the issue with automatic registration, but if you're just trying to get things up and running manually, you can probably just modify the ambari-agent config files.
Your manual setup step may have failed if the agents are not configured to point to the ambari-server instance in your cluster. You mentioned that registration doesn't fail on the ambari-server node, and that makes sense, since the default hostname setting is "localhost".
For manual configuration, you can set this in the ambari-agent config file. On CentOS 6 it is located at:
/etc/ambari-agent/conf/ambari-agent.ini
Just set the "hostname" property to be the DNS name for the node that has ambari-server running, and restart your ambari-agent instances.
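For anyone who wants to script that edit across several agents, here is a minimal Python sketch. It assumes the "hostname" key lives in a [server] section of ambari-agent.ini (check your own file first), and the function name and the example host name are hypothetical. It rewrites the text line by line so that comments and other keys are left alone.

```python
def set_server_hostname(ini_text, server_host):
    """Return ini_text with the hostname key of the [server] section set to server_host.

    Assumption: the agent config keeps the server address under [server];
    verify this against your own ambari-agent.ini before using the result.
    """
    out = []
    section = None
    replaced = False
    for line in ini_text.splitlines():
        stripped = line.strip()
        if stripped.startswith("[") and stripped.endswith("]"):
            # Track which section we are currently inside.
            section = stripped[1:-1].strip()
        elif section == "server" and not stripped.startswith(("#", ";")):
            key = stripped.split("=", 1)[0].strip()
            if "=" in stripped and key == "hostname":
                line = "hostname=" + server_host
                replaced = True
        out.append(line)
    if not replaced:
        raise ValueError("no hostname key found in [server] section")
    return "\n".join(out) + "\n"


if __name__ == "__main__":
    # Sample content only; the host name below is a placeholder.
    sample = "[server]\nhostname=localhost\nurl_port=8440\nsecured_url_port=8441\n"
    print(set_server_hostname(sample, "ambari-master.example.internal"))
```

To apply it for real you would read /etc/ambari-agent/conf/ambari-agent.ini, write the returned text back, and then run `ambari-agent restart` on that node.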
Hope this helps!
Bob
Created 04-08-2016 01:27 PM
Thanks @rnettleton, I arrived at a similar solution while waiting for the question to be moderated, but your answer definitely makes it clearer why it wasn't working!
Created 04-08-2016 07:16 PM
You are running into AMBARI-14431. This should not happen on the latest version. Can you upgrade to the latest version or update your script manually?
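For context on the TypeError in the traceback above: it is what Python raises when an instance method is called on the class itself rather than on an instance. The sketch below reproduces that pattern with a hypothetical stand-in class. It is not the real Ambari ExitHelper, and the actual change made for AMBARI-14431 may look different, so treat it only as an illustration of the error.

```python
class ExitHelper(object):
    """Hypothetical stand-in used for illustration; not the Ambari class."""

    def __init__(self):
        self.cleaned_up = False

    def execute_cleanup(self):
        # An instance method: it needs `self`, so it must be called on an instance.
        self.cleaned_up = True
        return "cleanup done"


def call_on_class():
    # Mirrors the failing call in the traceback: no instance is supplied,
    # so Python raises TypeError (the message wording differs between
    # Python 2 and Python 3, but the exception type is the same).
    return ExitHelper.execute_cleanup()


def call_on_instance():
    # Calling the method on an instance works as expected.
    return ExitHelper().execute_cleanup()


if __name__ == "__main__":
    try:
        call_on_class()
    except TypeError as err:
        print("TypeError:", err)
    print(call_on_instance())
```

If you do patch the script by hand, compare it against the main.py shipped with a newer agent release rather than relying on this sketch.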
Created 06-21-2016 03:23 PM
Check your .ini file on all the slaves/agents:
/etc/ambari-agent/conf/ambari-agent.ini
Then make sure the ports below are open on your master:
url_port=8440
secured_url_port=8441
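A quick way to verify this from an agent node is a plain TCP connect test. The following is a rough Python sketch, assuming the two ports listed above; the function names and the placeholder host name are made up for the example. It only tells you whether a TCP connection can be opened, not whether registration will succeed.

```python
import socket


def port_is_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port can be opened within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and DNS resolution failures.
        return False


def check_agent_ports(server_host, ports=(8440, 8441), timeout=3.0):
    """Check each port on server_host and return a {port: bool} mapping."""
    return {port: port_is_open(server_host, port, timeout) for port in ports}


if __name__ == "__main__":
    # Placeholder host name; replace with the DNS name of your Ambari server.
    for port, ok in check_agent_ports("ambari-master.example.internal").items():
        print(port, "open" if ok else "closed or unreachable")
```

On EC2, a port reported as closed or unreachable often points at the security group rules for the master instance, so that is a reasonable place to look next.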
