Ambari UI stuck at adding new host

Contributor

Unfortunately, I terminated a slave instance in my hcp cluster that was hosting HiveServer2, the Hive Metastore, and the MySQL database. The Ambari UI now reports a "heartbeat lost" status for the services that were running on that instance. To bring those services back, I tried adding a new host using Host -> Add new Host. I followed the steps below.

1 - Created a new EC2 instance running CentOS 7, the same as my other instances.

2 - Ran yum update and added the EPEL repo.

3 - Set up passwordless authentication from the Ambari server to the new host.

4 - Filled in the step 1 parameters: the private key and the host IP of the new instance.

After step 4 the Ambari UI is stuck (screen capture: stuck.png) and does not proceed to the next step. I checked both the ambari-agent and ambari-server logs but couldn't find any issues. What could be the reason for this? How can I resolve it or investigate further?

Ambari agent log:

INFO 2018-06-08 03:30:29,933 Controller.py:304 - Heartbeat (response id = 30) with server is running...
INFO 2018-06-08 03:30:29,933 Controller.py:311 - Building heartbeat message
INFO 2018-06-08 03:30:29,934 Heartbeat.py:90 - Adding host info/state to heartbeat message.
INFO 2018-06-08 03:30:29,989 logger.py:75 - Testing the JVM's JCE policy to see it if supports an unlimited key length.
INFO 2018-06-08 03:30:30,001 Hardware.py:176 - Some mount points were ignored: /, /dev, /dev/shm, /run, /sys/fs/cgroup, /run/user/1000, /run/user/0
INFO 2018-06-08 03:30:30,001 Controller.py:320 - Sending Heartbeat (id = 30)
INFO 2018-06-08 03:30:30,003 Controller.py:333 - Heartbeat response received (id = 31)
INFO 2018-06-08 03:30:30,003 Controller.py:342 - Heartbeat interval is 10 seconds
INFO 2018-06-08 03:30:30,003 Controller.py:380 - Updating configurations from heartbeat
INFO 2018-06-08 03:30:30,003 Controller.py:389 - Adding cancel/execution commands
INFO 2018-06-08 03:30:30,003 Controller.py:406 - Adding recovery commands
INFO 2018-06-08 03:30:30,003 Controller.py:475 - Waiting 9.9 for next heartbeat
INFO 2018-06-08 03:30:39,904 Controller.py:482 - Wait for next heartbeat over

Ambari server log:

and will be failed
08 Jun 2018 03:26:00,590  INFO [ambari-action-scheduler] ActionScheduler:809 - Removing command from queue, host=ip-172-31-18-247.ec2.internal, commandId=1326-0 
08 Jun 2018 03:26:00,590  WARN [ambari-action-scheduler] ExecutionCommandWrapper:225 - Unable to lookup the cluster by ID; assuming that there is no cluster and therefore no configs for this execution command: Cluster not found, clusterName=clusterID=-1
08 Jun 2018 03:26:01,593  WARN [ambari-action-scheduler] ActionScheduler:782 - Host: ip-172-31-18-247.ec2.internal, role: check_host, actionId: 1326-0 expired and will be failed
08 Jun 2018 03:26:01,595  INFO [ambari-action-scheduler] ActionScheduler:809 - Removing command from queue, host=ip-172-31-18-247.ec2.internal, commandId=1326-0 
08 Jun 2018 03:26:01,595  WARN [ambari-action-scheduler] ExecutionCommandWrapper:225 - Unable to lookup the cluster by ID; assuming that there is no cluster and therefore no configs for this execution command: Cluster not found, clusterName=clusterID=-1
08 Jun 2018 03:26:02,077  INFO [qtp-ambari-agent-44] HeartBeatHandler:292 - HeartBeatHandler.sendCommands: sending ExecutionCommand for host ip-172-31-27-147.ec2.internal, role check_host, roleCommand ACTIONEXECUTE, and command ID 1326-0, task ID 12200
08 Jun 2018 03:26:02,599  WARN [ambari-action-scheduler] ActionScheduler:782 - Host: ip-172-31-18-247.ec2.internal, role: check_host, actionId: 1326-0 expired and will be failed
08 Jun 2018 03:26:02,601  INFO [ambari-action-scheduler] ActionScheduler:809 - Removing command from queue, host=ip-172-31-18-247.ec2.internal, commandId=1326-0 
08 Jun 2018 03:26:02,601  WARN [ambari-action-scheduler] ExecutionCommandWrapper:225 - Unable to lookup the cluster by ID; assuming that there is no cluster and therefore no configs for this execution command: Cluster not found, clusterName=clusterID=-1
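A server log excerpt like the one above can be easier to read once it is grouped by host. The sketch below is only an illustration, not part of Ambari: it counts the "expired and will be failed" warnings and the "sending ExecutionCommand" lines per host and role. The regular expressions are assumptions derived from the exact wording of the lines in this excerpt, and `summarize` is a hypothetical helper name. On this excerpt it shows that the expired `check_host` action names ip-172-31-18-247.ec2.internal, while the command is being sent to ip-172-31-27-147.ec2.internal. The post does not say which of these is the terminated instance and which is the new one, so that would need to be confirmed separately.

```python
import re
from collections import Counter

# Assumed patterns, written against the log lines quoted in this thread.
EXPIRED_RE = re.compile(
    r"Host: (?P<host>\S+), role: (?P<role>\S+), actionId: (?P<action>\S+) expired"
)
SENT_RE = re.compile(
    r"sending ExecutionCommand for host (?P<host>\S+), role (?P<role>\S+),"
)

def summarize(lines):
    """Count expired actions and sent commands per (host, role) pair."""
    expired = Counter()
    sent = Counter()
    for line in lines:
        m = EXPIRED_RE.search(line)
        if m:
            expired[(m.group("host"), m.group("role"))] += 1
            continue
        m = SENT_RE.search(line)
        if m:
            sent[(m.group("host"), m.group("role"))] += 1
    return expired, sent

# Three lines copied from the excerpt above, used here as sample input.
SAMPLE = [
    "08 Jun 2018 03:26:01,593  WARN [ambari-action-scheduler] ActionScheduler:782 - "
    "Host: ip-172-31-18-247.ec2.internal, role: check_host, actionId: 1326-0 "
    "expired and will be failed",
    "08 Jun 2018 03:26:02,077  INFO [qtp-ambari-agent-44] HeartBeatHandler:292 - "
    "HeartBeatHandler.sendCommands: sending ExecutionCommand for host "
    "ip-172-31-27-147.ec2.internal, role check_host, roleCommand ACTIONEXECUTE, "
    "and command ID 1326-0, task ID 12200",
    "08 Jun 2018 03:26:02,599  WARN [ambari-action-scheduler] ActionScheduler:782 - "
    "Host: ip-172-31-18-247.ec2.internal, role: check_host, actionId: 1326-0 "
    "expired and will be failed",
]

expired, sent = summarize(SAMPLE)
for (host, role), count in expired.items():
    print(f"expired: {host} {role} x{count}")
for (host, role), count in sent.items():
    print(f"sent:    {host} {role} x{count}")
```

To run it against a real log, pass the open file object to `summarize` instead of `SAMPLE`.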

76604-stuck.png

3 REPLIES

Cloudera Employee

Looks like ambari-server is stuck executing host checks on the host. You can restart ambari-server and ambari-agent with the -debug flag in the command. This will help in nailing down the problem further.

Explorer

Once you have started ambari-server in debug mode, please check for the following in the new host's agent log:

2018-06-20 17:30:03,018 - IP address forward resolution check started.
2018-06-20 17:30:03,018 - All hosts resolved to an IP address.
2018-06-20 17:30:03,018 - IP address forward resolution check completed.
2018-06-20 17:30:03,019 - Host checks completed.
2018-06-20 17:30:03,019 - Structured output: {'host_resolution_check': {'failed_count': 0, 'exit_code': 0, 'success_count': 21, 'failures': [], 'message': 'All hosts resolved to an IP address.', 'hosts_with_failures': []}}
2018-06-20 17:30:03,019 - Action afix 'post_actionexecute' not present
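The "Structured output" entry in the excerpt above is printed as a Python-style dictionary, so it can be read programmatically rather than by eye. The following is a minimal sketch under that assumption. `parse_structured_output` and `failed_checks` are hypothetical helper names, and the brace matching is deliberately naive: it would be confused by a `{` or `}` inside a message string.

```python
import ast

MARKER = "Structured output: "

def parse_structured_output(line):
    """Return the dict printed after 'Structured output: ', or None if absent."""
    start = line.find(MARKER)
    if start == -1:
        return None
    text = line[start + len(MARKER):]
    depth = 0
    for i, ch in enumerate(text):
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth == 0:
                # Stop at the brace that closes the outermost dict, so any
                # log text that follows on the same line is ignored.
                return ast.literal_eval(text[: i + 1])
    return None

def failed_checks(output):
    """Return the checks that report a failure count or a non-zero exit code."""
    return {
        name: result
        for name, result in output.items()
        if result.get("failed_count", 0) or result.get("exit_code", 0)
    }

# The structured output line from the excerpt above.
line = (
    "2018-06-20 17:30:03,019 - Structured output: {'host_resolution_check': "
    "{'failed_count': 0, 'exit_code': 0, 'success_count': 21, 'failures': [], "
    "'message': 'All hosts resolved to an IP address.', 'hosts_with_failures': []}}"
)
output = parse_structured_output(line)
print(failed_checks(output) or "no failed host checks")
```

An empty result from `failed_checks` only means that the checks listed in that line passed. It says nothing about a check that never finished.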

If it is still stuck at the host check, the agent log should contain:

2018-06-20 17:31:25,502 DEBUG [ambari-client-thread-86] BaseProvider:331 - Skipping property for resource as not in requestedIds, resourceType=Task, propertyId=Tasks/role, value=check_host
2018-06-20 17:31:25,503 DEBUG [ambari-client-thread-86] BaseProvider:331 - Skipping property for resource as not in requestedIds, resourceType=Task, propertyId=Tasks/command, value=ACTIONEXECUTE
2018-06-20 17:31:25,503 DEBUG [ambari-client-thread-86] BaseProvider:308 - Setting property for resource, resourceType=Task, propertyId=Tasks/status, value=QUEUED


Please report back what you see in the agent logs.

Explorer

Also check the usual things:

1. the entries in /etc/hosts,

2. stop iptables,

3. telnet to port 8440 on the Ambari server from the agents,

4. confirm that a connection is established on port 8441.
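The name resolution and port checks in the list above can also be scripted and run from the new host. This is a rough sketch that uses only the Python standard library. The function names are hypothetical, the ports are the two named in this reply, and `ambari-server.example.internal` is a placeholder for the real Ambari server hostname.

```python
import socket

def resolve(hostname):
    """Return the IPv4 address the hostname resolves to, or None on failure."""
    try:
        return socket.gethostbyname(hostname)
    except socket.gaierror:
        return None

def port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

def check_server(server_host, ports=(8440, 8441)):
    """Resolve the server name, then try each port; returns a small report dict."""
    ip = resolve(server_host)
    report = {"resolved_ip": ip}
    for port in ports:
        # Skip the connection attempt if the name did not resolve.
        report[port] = ip is not None and port_open(server_host, port)
    return report

# Example call (placeholder hostname, replace with the actual Ambari server):
# print(check_server("ambari-server.example.internal"))
```

A `False` for a port only shows that a TCP connection could not be made from this host. A firewall rule, a security group, or a stopped ambari-server could all produce the same result.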
