Ambari UI stuck at adding new host

Contributor

Unfortunately, I terminated a slave instance in my hcp cluster that was hosting HiveServer2, the Hive Metastore, and the MySQL DB. In the Ambari UI, the services that were on that instance now show a heartbeat lost status. To fix this, I tried adding a new host to bring my services back, using Host -> Add new Host. I followed the steps below.

1 - Created a new EC2 instance (CentOS 7), the same as my other instances.

2 - Ran yum update and added the EPEL repo.

3 - Set up passwordless authentication from the Ambari server to the new host.

4 - Filled in the step 1 parameters: the private key and the host IP of the new instance.

After step 4 the Ambari UI is stuck (screen capture: stuck.png) and does not go on to the next step. I checked both the ambari-agent and ambari-server logs but couldn't find any issues. What could be the reason for this? How can I resolve it or investigate further?

Ambari agent log:

INFO 2018-06-08 03:30:29,933 Controller.py:304 - Heartbeat (response id = 30) with server is running...
INFO 2018-06-08 03:30:29,933 Controller.py:311 - Building heartbeat message
INFO 2018-06-08 03:30:29,934 Heartbeat.py:90 - Adding host info/state to heartbeat message.
INFO 2018-06-08 03:30:29,989 logger.py:75 - Testing the JVM's JCE policy to see it if supports an unlimited key length.
INFO 2018-06-08 03:30:30,001 Hardware.py:176 - Some mount points were ignored: /, /dev, /dev/shm, /run, /sys/fs/cgroup, /run/user/1000, /run/user/0
INFO 2018-06-08 03:30:30,001 Controller.py:320 - Sending Heartbeat (id = 30)
INFO 2018-06-08 03:30:30,003 Controller.py:333 - Heartbeat response received (id = 31)
INFO 2018-06-08 03:30:30,003 Controller.py:342 - Heartbeat interval is 10 seconds
INFO 2018-06-08 03:30:30,003 Controller.py:380 - Updating configurations from heartbeat
INFO 2018-06-08 03:30:30,003 Controller.py:389 - Adding cancel/execution commands
INFO 2018-06-08 03:30:30,003 Controller.py:406 - Adding recovery commands
INFO 2018-06-08 03:30:30,003 Controller.py:475 - Waiting 9.9 for next heartbeat
INFO 2018-06-08 03:30:39,904 Controller.py:482 - Wait for next heartbeat over

Ambari server log:

and will be failed
08 Jun 2018 03:26:00,590  INFO [ambari-action-scheduler] ActionScheduler:809 - Removing command from queue, host=ip-172-31-18-247.ec2.internal, commandId=1326-0 
08 Jun 2018 03:26:00,590  WARN [ambari-action-scheduler] ExecutionCommandWrapper:225 - Unable to lookup the cluster by ID; assuming that there is no cluster and therefore no configs for this execution command: Cluster not found, clusterName=clusterID=-1
08 Jun 2018 03:26:01,593  WARN [ambari-action-scheduler] ActionScheduler:782 - Host: ip-172-31-18-247.ec2.internal, role: check_host, actionId: 1326-0 expired and will be failed
08 Jun 2018 03:26:01,595  INFO [ambari-action-scheduler] ActionScheduler:809 - Removing command from queue, host=ip-172-31-18-247.ec2.internal, commandId=1326-0 
08 Jun 2018 03:26:01,595  WARN [ambari-action-scheduler] ExecutionCommandWrapper:225 - Unable to lookup the cluster by ID; assuming that there is no cluster and therefore no configs for this execution command: Cluster not found, clusterName=clusterID=-1
08 Jun 2018 03:26:02,077  INFO [qtp-ambari-agent-44] HeartBeatHandler:292 - HeartBeatHandler.sendCommands: sending ExecutionCommand for host ip-172-31-27-147.ec2.internal, role check_host, roleCommand ACTIONEXECUTE, and command ID 1326-0, task ID 12200
08 Jun 2018 03:26:02,599  WARN [ambari-action-scheduler] ActionScheduler:782 - Host: ip-172-31-18-247.ec2.internal, role: check_host, actionId: 1326-0 expired and will be failed
08 Jun 2018 03:26:02,601  INFO [ambari-action-scheduler] ActionScheduler:809 - Removing command from queue, host=ip-172-31-18-247.ec2.internal, commandId=1326-0 
08 Jun 2018 03:26:02,601  WARN [ambari-action-scheduler] ExecutionCommandWrapper:225 - Unable to lookup the cluster by ID; assuming that there is no cluster and therefore no configs for this execution command: Cluster not found, clusterName=clusterID=-1
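Entries like these are easier to read when grouped by host. The following is only a rough sketch in Python: the regular expressions are written against the lines quoted above, and the function name `summarize` is made up for illustration. It counts, per host and role, how many commands expired and how many were dispatched, which makes it easy to see that the expiries here are logged for ip-172-31-18-247.ec2.internal while the check_host command is sent to ip-172-31-27-147.ec2.internal.

```python
import re
from collections import Counter

# Patterns written against the ActionScheduler and HeartBeatHandler lines
# quoted above; other Ambari versions may word these messages differently.
EXPIRED = re.compile(
    r"Host: (?P<host>[^,\s]+), role: (?P<role>\w+), "
    r"actionId: (?P<action>[\d-]+) expired"
)
SENT = re.compile(
    r"sending ExecutionCommand for host (?P<host>[^,\s]+), role (?P<role>\w+),"
)


def summarize(lines):
    """Return two Counters keyed by (host, role): expired and dispatched."""
    expired, sent = Counter(), Counter()
    for line in lines:
        match = EXPIRED.search(line)
        if match:
            expired[(match.group("host"), match.group("role"))] += 1
            continue
        match = SENT.search(line)
        if match:
            sent[(match.group("host"), match.group("role"))] += 1
    return expired, sent
```

To run it against a real log, pass an open file object, for example `summarize(open(path))`, where `path` points at the ambari-server log on the Ambari server host.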

76604-stuck.png

3 REPLIES

Re: Ambari UI stuck at adding new host

Cloudera Employee

It looks like ambari-server is stuck executing host checks on the host. You can restart ambari-server and ambari-agent with the -debug flag in the command. This will help narrow the problem down further.

Re: Ambari UI stuck at adding new host

Explorer

Once you have started ambari-server in debug mode, please check for the following in the agent log on the new host.

2018-06-20 17:30:03,018 - IP address forward resolution check started. 

2018-06-20 17:30:03,018 - All hosts resolved to an IP address. 

2018-06-20 17:30:03,018 - IP address forward resolution check completed. 

2018-06-20 17:30:03,019 - Host checks completed.

2018-06-20 17:30:03,019 - Structured output: {'host_resolution_check': {'failed_count': 0, 'exit_code': 0, 'success_count': 21, 'failures': [], 'message': 'All hosts resolved to an IP address.', 'hosts_with_failures': []}}

2018-06-20 17:30:03,019 - Action afix 'post_actionexecute' not present
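If there are many such lines, the "Structured output" entry can be checked programmatically. This is a minimal sketch in Python, assuming the entry has the shape shown above (a Python-style dict with `failed_count` and `exit_code` per check); the function names are hypothetical and not part of Ambari.

```python
import ast


def extract_dict_text(text):
    """Return the first balanced {...} span in text, or None.

    Simple brace counting; it does not handle braces inside string values.
    """
    start = text.find("{")
    if start == -1:
        return None
    depth = 0
    for i in range(start, len(text)):
        if text[i] == "{":
            depth += 1
        elif text[i] == "}":
            depth -= 1
            if depth == 0:
                return text[start:i + 1]
    return None


def parse_structured_output(line):
    """Parse the dict that follows 'Structured output:' in a log line."""
    _, found, tail = line.partition("Structured output:")
    if not found:
        return None
    dict_text = extract_dict_text(tail)
    if dict_text is None:
        return None
    # The dict uses single quotes, so it is not valid JSON;
    # ast.literal_eval evaluates Python literals only, never code.
    return ast.literal_eval(dict_text)


def failed_checks(structured):
    """Return the checks that report failures or a non-zero exit code."""
    return {
        name: result
        for name, result in structured.items()
        if result.get("failed_count", 0) or result.get("exit_code", 0)
    }
```

An empty result from `failed_checks` means every check in that entry passed, as in the sample above.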

If it is still stuck at the host check, the agent log should contain:

2018-06-20 17:31:25,502 DEBUG [ambari-client-thread-86] BaseProvider:331 - Skipping property for resource as not in requestedIds, resourceType=Task, propertyId=Tasks/role, value=check_host 
2018-06-20 17:31:25,503 DEBUG [ambari-client-thread-86] BaseProvider:331 - Skipping property for resource as not in requestedIds, resourceType=Task, propertyId=Tasks/command, value=ACTIONEXECUTE
2018-06-20 17:31:25,503 DEBUG [ambari-client-thread-86] BaseProvider:308 - Setting property for resource, resourceType=Task, propertyId=Tasks/status, value=QUEUED


Please report back what you see in the agent logs.

Re: Ambari UI stuck at adding new host

Explorer

Also check the usual things:

1. the /etc/hosts entries,

2. stop iptables,

3. telnet to port 8440 from the agents,

4. check that a connection is established on port 8441.
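The name resolution and port checks in the list above can also be scripted from the new host. This is just a sketch in Python using the standard `socket` module; the helper names and the 3 second timeout are my own choices, and `server_host` must be replaced with your Ambari server's hostname.

```python
import socket


def resolves(hostname):
    """Return the IPv4 address hostname resolves to, or None on failure."""
    try:
        return socket.gethostbyname(hostname)
    except OSError:
        return None


def port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def check_agent_to_server(server_host, ports=(8440, 8441)):
    """Run from the agent host: resolve the server and probe its ports."""
    return {
        "resolved_ip": resolves(server_host),
        "ports": {port: port_open(server_host, port) for port in ports},
    }
```

For example, `print(check_agent_to_server("ambari-server.example.internal"))` (a placeholder hostname) prints the resolved IP and whether each port accepted a connection. A `False` for a port usually points to a firewall or security group rule, or to ambari-server not listening.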