Created 02-16-2017 01:32 PM
Hello,
I am trying to add a new node to a host group with Cloudbreak 1.6.2, but the UPSCALE_REQUEST is getting stuck:
2017-02-16 13:03:52,193 [reactorDispatcher-82] checkStatus:42 INFO c.s.c.s.c.f.AmbariOperationsStatusCheckerTask - [owner:19d4a33c-71e5-42e7-a787-d8487e47361b] [type:STACK] [id:1] [name:mycluster] Ambari operation: 'UPSCALE_REQUEST', Progress: 0.0
In ambari-server.log, I see the following:
16 Feb 2017 10:41:35,176 INFO [qtp-ambari-agent-106] HostImpl:294 - Received host registration, host=[hostname=ip-10-0-0-1,fqdn=ip-10-0-0-1.us-west-2.compute.internal,domain=us-west-2.compute.internal,architecture=x86_64,processorcount=8,physicalprocessorcount=8,osname=amazon,osversion=6.03,osfamily=redhat,memory=15403992,uptime_hours=0,mounts=(available=49480740,mountpoint=/,used=1892012,percent=4%,size=51473000,device=/dev/xvda1,type=ext4)(available=7691084,mountpoint=/dev,used=64,percent=1%,size=7691148,device=devtmpfs,type=devtmpfs)(available=7701984,mountpoint=/dev/shm,used=12,percent=1%,size=7701996,device=tmpfs,type=tmpfs)(available=979473800,mountpoint=/hadoopfs/fs1,used=73080,percent=1%,size=1031992064,device=/dev/xvdb,type=ext4)]
16 Feb 2017 10:41:35,202 INFO [qtp-ambari-agent-106] TopologyManager:469 - TopologyManager: Queueing available host ip-10-0-0-1.us-west-2.compute.internal
16 Feb 2017 10:42:06,694 INFO [ambari-client-thread-26] ClusterTopologyImpl:166 - ClusterTopologyImpl.addHostTopology: added host = ip-10-0-0-1.us-west-2.compute.internal to host group = host_group_1
16 Feb 2017 10:42:06,701 INFO [ambari-client-thread-26] HostRequest:93 - HostRequest: Created request for host: ip-10-0-0-1.us-west-2.compute.internal
16 Feb 2017 10:42:06,735 INFO [ambari-client-thread-26] TopologyManager:618 - TopologyManager.processRequest: host name = ip-10-0-0-1.us-west-2.compute.internal is mapped to LogicalRequest ID = 40 and will be removed from the reserved hosts.
16 Feb 2017 10:42:06,735 INFO [ambari-client-thread-26] TopologyManager:631 - TopologyManager.processRequest: offering host name = ip-10-0-0-1.us-west-2.compute.internal to LogicalRequest ID = 40
16 Feb 2017 10:42:06,736 INFO [ambari-client-thread-26] LogicalRequest:100 - LogicalRequest.offer: attempting to match a request to a request for a reserved host to hostname = ip-10-0-0-1.us-west-2.compute.internal
16 Feb 2017 10:42:06,736 INFO [ambari-client-thread-26] LogicalRequest:109 - LogicalRequest.offer: request mapping ACCEPTED for host = ip-10-0-0-1.us-west-2.compute.internal
16 Feb 2017 10:42:06,737 INFO [ambari-client-thread-26] TopologyManager:641 - TopologyManager.processRequest: host name = ip-10-0-0-1.us-west-2.compute.internal was ACCEPTED by LogicalRequest ID = 40 , host has been removed from available hosts.
16 Feb 2017 10:42:06,738 INFO [ambari-client-thread-26] ClusterTopologyImpl:166 - ClusterTopologyImpl.addHostTopology: added host = ip-10-0-0-1.us-west-2.compute.internal to host group = host_group_1
16 Feb 2017 10:42:06,749 INFO [ambari-client-thread-26] TopologyManager:726 - TopologyManager.processAcceptedHostOffer: about to execute tasks for host = ip-10-0-0-1.us-west-2.compute.internal
16 Feb 2017 10:42:06,749 INFO [ambari-client-thread-26] TopologyManager:730 - Processing accepted host offer for ip-10-0-0-1.us-west-2.compute.internal which responded ACCEPTED and task RESOURCE_CREATION
16 Feb 2017 10:42:06,750 INFO [ambari-client-thread-26] TopologyManager:730 - Processing accepted host offer for ip-10-0-0-1.us-west-2.compute.internal which responded ACCEPTED and task CONFIGURE
16 Feb 2017 10:42:06,751 INFO [ambari-client-thread-26] TopologyManager:730 - Processing accepted host offer for ip-10-0-0-1.us-west-2.compute.internal which responded ACCEPTED and task INSTALL
16 Feb 2017 10:42:06,751 INFO [ambari-client-thread-26] TopologyManager:730 - Processing accepted host offer for ip-10-0-0-1.us-west-2.compute.internal which responded ACCEPTED and task START
16 Feb 2017 10:42:06,892 INFO [pool-3-thread-2] HostRequest:509 - HostRequest.InstallHostTask: Executing INSTALL task for host: ip-10-0-0-1.us-west-2.compute.internal
16 Feb 2017 10:42:06,897 INFO [pool-3-thread-2] AbstractResourceProvider:357 - Installing all components on host: ip-10-0-0-1.us-west-2.compute.internal
16 Feb 2017 10:42:06,902 INFO [pool-3-thread-2] AbstractResourceProvider:787 - Skipping updating hosts: no matching requests for (HostRoles/state=INIT AND HostRoles/host_name=ip-10-0-0-1.us-west-2.compute.internal) AND HostRoles/cluster_name=mycluster
16 Feb 2017 10:42:06,904 INFO [pool-3-thread-3] HostRequest:559 - HostRequest.StartHostTask: Executing START task for host: ip-10-0-0-1.us-west-2.compute.internal
16 Feb 2017 10:42:06,907 INFO [pool-3-thread-3] AbstractResourceProvider:412 - Starting all non-client components on host: ip-10-0-0-1.us-west-2.compute.internal
16 Feb 2017 10:42:06,910 INFO [pool-3-thread-3] AbstractResourceProvider:787 - Skipping updating hosts: no matching requests for (HostRoles/cluster_name=mycluster AND NOT(org.apache.ambari.server.controller.internal.HostComponentResourceProvider$ClientComponentPredicate@31161890)) AND (HostRoles/desired_state=INSTALLED AND HostRoles/host_name=ip-10-0-0-1.us-west-2.compute.internal)
The line "Skipping updating hosts: no matching requests for (HostRoles/state=INIT AND HostRoles/host_name=ip-10-0-0-1.us-west-2.compute.internal) AND HostRoles/cluster_name=mycluster" is what bothers me here.
I can see the host in Ambari with 0 components on it (but I can install components on it manually).
The only way I can stop the upscale request is by deleting rows in PostgreSQL (ambari.topology_logical_request, ambari.topology_hostgroup, ambari.topology_request) and restarting the Ambari server.
PS: I upgraded from Cloudbreak 1.6.0 to 1.6.2.
Any guidance or idea would be greatly appreciated. Thank you.
Created 02-21-2017 09:12 AM
I forgot to include some logs showing DECLINED_PREDICATE on the other hosts of the host group:
20 Feb 2017 15:14:47,211 INFO [ambari-client-thread-28] TopologyManager:631 - TopologyManager.processRequest: offering host name = ip-10-0-0-2.us-west-2.compute.internal to LogicalRequest ID = 42
20 Feb 2017 15:14:47,212 INFO [ambari-client-thread-28] LogicalRequest:100 - LogicalRequest.offer: attempting to match a request to a request for a reserved host to hostname = ip-10-0-0-2.us-west-2.compute.internal
20 Feb 2017 15:14:47,212 INFO [ambari-client-thread-28] LogicalRequest:141 - LogicalRequest.offer: outstandingHost request list size = 0
20 Feb 2017 15:14:47,212 INFO [ambari-client-thread-28] TopologyManager:651 - TopologyManager.processRequest: host name = ip-10-0-0-2.us-west-2.compute.internal was DECLINED_PREDICATE by LogicalRequest ID = 42
20 Feb 2017 15:14:47,217 INFO [ambari-client-thread-28] TopologyManager:631 - TopologyManager.processRequest: offering host name = ip-10-0-0-3.us-west-2.compute.internal to LogicalRequest ID = 42
20 Feb 2017 15:14:47,217 INFO [ambari-client-thread-28] LogicalRequest:100 - LogicalRequest.offer: attempting to match a request to a request for a reserved host to hostname = ip-10-0-0-3.us-west-2.compute.internal
20 Feb 2017 15:14:47,218 INFO [ambari-client-thread-28] LogicalRequest:141 - LogicalRequest.offer: outstandingHost request list size = 0
20 Feb 2017 15:14:47,218 INFO [ambari-client-thread-28] TopologyManager:651 - TopologyManager.processRequest: host name = ip-10-0-0-3.us-east-1.compute.internal was DECLINED_PREDICATE by LogicalRequest ID = 42
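For anyone hitting the same symptom, here is a rough sketch of how the offer outcomes could be pulled out of ambari-server.log instead of reading the entries by eye. It relies only on the "host name = ... was ... by LogicalRequest ID = ..." wording quoted in this thread; the function name `summarize_offers` and the `sample` text are made up for illustration, and other Ambari versions may word these lines differently.

```python
import re
from collections import defaultdict

# Matches the TopologyManager outcome lines quoted in this thread, e.g.
# "... host name = <fqdn> was DECLINED_PREDICATE by LogicalRequest ID = 42".
# Assumption: the wording is the same in your Ambari version.
OUTCOME = re.compile(
    r"host name = (?P<host>\S+) was (?P<outcome>[A-Z_]+) "
    r"by LogicalRequest ID = (?P<request>\d+)"
)


def summarize_offers(log_text):
    """Return {request_id: {host: outcome}} for every offer outcome found."""
    summary = defaultdict(dict)
    # finditer scans the whole text, so it also works when several log
    # entries were pasted onto a single line.
    for match in OUTCOME.finditer(log_text):
        summary[int(match["request"])][match["host"]] = match["outcome"]
    return dict(summary)


# Illustrative sample built from two entries quoted in this thread.
sample = (
    "20 Feb 2017 15:14:47,212 INFO [ambari-client-thread-28] TopologyManager:651 - "
    "TopologyManager.processRequest: host name = ip-10-0-0-2.us-west-2.compute.internal "
    "was DECLINED_PREDICATE by LogicalRequest ID = 42\n"
    "16 Feb 2017 10:42:06,737 INFO [ambari-client-thread-26] TopologyManager:641 - "
    "TopologyManager.processRequest: host name = ip-10-0-0-1.us-west-2.compute.internal "
    "was ACCEPTED by LogicalRequest ID = 40 , host has been removed from available hosts.\n"
)

for request_id, hosts in summarize_offers(sample).items():
    for host, outcome in hosts.items():
        print(request_id, host, outcome)
```

To run it against a real log, read the file into a string and pass it to `summarize_offers`; a request whose hosts are all DECLINED_PREDICATE matches the pattern shown above.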
Created 02-21-2017 02:32 PM
Hi Nicolas, could you please attach the Cloudbreak logs? How was the original cluster created? Was it manually modified in any way (e.g. adding or removing a service) before the upscale?
Created 02-21-2017 03:59 PM
@pdarvasi Your question made me realise the mistake.
We deleted the Tez client directly in Ambari and forgot that the blueprint still references Tez, hence the DECLINED_PREDICATE on each host.
Reinstalling Tez on the hosts through Ambari fixed the issue.
Thanks 🙂
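A minimal sketch of the kind of check that might catch this mismatch before an upscale: compare the components a blueprint lists for a host group with what is actually installed. It assumes the usual Ambari blueprint JSON layout (`host_groups` entries, each with a `name` and a list of `components`). The helper name, the sample blueprint, and the `installed` list are hypothetical, not taken from the cluster discussed here.

```python
def missing_blueprint_components(blueprint, host_group, installed):
    """Return the blueprint components of `host_group` absent from `installed`."""
    for group in blueprint.get("host_groups", []):
        if group.get("name") == host_group:
            wanted = {component["name"] for component in group.get("components", [])}
            return sorted(wanted - set(installed))
    raise KeyError(f"host group {host_group!r} not found in blueprint")


# Hypothetical blueprint fragment; a real one would be exported from Ambari.
blueprint = {
    "host_groups": [
        {
            "name": "host_group_1",
            "components": [
                {"name": "DATANODE"},
                {"name": "NODEMANAGER"},
                {"name": "TEZ_CLIENT"},
            ],
        }
    ]
}

# Hypothetical state after the Tez client was removed by hand in Ambari.
installed = ["DATANODE", "NODEMANAGER"]

print(missing_blueprint_components(blueprint, "host_group_1", installed))
```

Any name in the result is a component the blueprint still expects but the cluster no longer has, which is the situation described above.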
Created 02-21-2017 04:07 PM
I'm glad it is resolved. The issue that caused this behavior was fixed in Ambari some weeks ago.