
UPSCALE_REQUEST stuck at 0%

New Contributor

Hello,

I am trying to add a new node to a host group with Cloudbreak 1.6.2, but the UPSCALE_REQUEST gets stuck:

2017-02-16 13:03:52,193 [reactorDispatcher-82] checkStatus:42 INFO c.s.c.s.c.f.AmbariOperationsStatusCheckerTask - [owner:19d4a33c-71e5-42e7-a787-d8487e47361b] [type:STACK] [id:1] [name:mycluster] Ambari operation: 'UPSCALE_REQUEST', Progress: 0.0

In ambari-server.log, I am getting the following:

16 Feb 2017 10:41:35,176  INFO [qtp-ambari-agent-106] HostImpl:294 - Received host registration, host=[hostname=ip-10-0-0-1,fqdn=ip-10-0-0-1.us-west-2.compute.internal,domain=us-west-2.compute.internal,architecture=x86_64,processorcount=8,physicalprocessorcount=8,osname=amazon,osversion=6.03,osfamily=redhat,memory=15403992,uptime_hours=0,mounts=(available=49480740,mountpoint=/,used=1892012,percent=4%,size=51473000,device=/dev/xvda1,type=ext4)(available=7691084,mountpoint=/dev,used=64,percent=1%,size=7691148,device=devtmpfs,type=devtmpfs)(available=7701984,mountpoint=/dev/shm,used=12,percent=1%,size=7701996,device=tmpfs,type=tmpfs)(available=979473800,mountpoint=/hadoopfs/fs1,used=73080,percent=1%,size=1031992064,device=/dev/xvdb,type=ext4)]
16 Feb 2017 10:41:35,202  INFO [qtp-ambari-agent-106] TopologyManager:469 - TopologyManager: Queueing available host ip-10-0-0-1.us-west-2.compute.internal
16 Feb 2017 10:42:06,694  INFO [ambari-client-thread-26] ClusterTopologyImpl:166 - ClusterTopologyImpl.addHostTopology: added host = ip-10-0-0-1.us-west-2.compute.internal to host group = host_group_1
16 Feb 2017 10:42:06,701  INFO [ambari-client-thread-26] HostRequest:93 - HostRequest: Created request for host: ip-10-0-0-1.us-west-2.compute.internal
16 Feb 2017 10:42:06,735  INFO [ambari-client-thread-26] TopologyManager:618 - TopologyManager.processRequest: host name = ip-10-0-0-1.us-west-2.compute.internal is mapped to LogicalRequest ID = 40 and will be removed from the reserved hosts.
16 Feb 2017 10:42:06,735  INFO [ambari-client-thread-26] TopologyManager:631 - TopologyManager.processRequest: offering host name = ip-10-0-0-1.us-west-2.compute.internal to LogicalRequest ID = 40
16 Feb 2017 10:42:06,736  INFO [ambari-client-thread-26] LogicalRequest:100 - LogicalRequest.offer: attempting to match a request to a request for a reserved host to hostname = ip-10-0-0-1.us-west-2.compute.internal
16 Feb 2017 10:42:06,736  INFO [ambari-client-thread-26] LogicalRequest:109 - LogicalRequest.offer: request mapping ACCEPTED for host = ip-10-0-0-1.us-west-2.compute.internal
16 Feb 2017 10:42:06,737  INFO [ambari-client-thread-26] TopologyManager:641 - TopologyManager.processRequest: host name = ip-10-0-0-1.us-west-2.compute.internal was ACCEPTED by LogicalRequest ID = 40 , host has been removed from available hosts.
16 Feb 2017 10:42:06,738  INFO [ambari-client-thread-26] ClusterTopologyImpl:166 - ClusterTopologyImpl.addHostTopology: added host = ip-10-0-0-1.us-west-2.compute.internal to host group = host_group_1
16 Feb 2017 10:42:06,749  INFO [ambari-client-thread-26] TopologyManager:726 - TopologyManager.processAcceptedHostOffer: about to execute tasks for host = ip-10-0-0-1.us-west-2.compute.internal
16 Feb 2017 10:42:06,749  INFO [ambari-client-thread-26] TopologyManager:730 - Processing accepted host offer for ip-10-0-0-1.us-west-2.compute.internal which responded ACCEPTED and task RESOURCE_CREATION
16 Feb 2017 10:42:06,750  INFO [ambari-client-thread-26] TopologyManager:730 - Processing accepted host offer for ip-10-0-0-1.us-west-2.compute.internal which responded ACCEPTED and task CONFIGURE
16 Feb 2017 10:42:06,751  INFO [ambari-client-thread-26] TopologyManager:730 - Processing accepted host offer for ip-10-0-0-1.us-west-2.compute.internal which responded ACCEPTED and task INSTALL
16 Feb 2017 10:42:06,751  INFO [ambari-client-thread-26] TopologyManager:730 - Processing accepted host offer for ip-10-0-0-1.us-west-2.compute.internal which responded ACCEPTED and task START
16 Feb 2017 10:42:06,892  INFO [pool-3-thread-2] HostRequest:509 - HostRequest.InstallHostTask: Executing INSTALL task for host: ip-10-0-0-1.us-west-2.compute.internal
16 Feb 2017 10:42:06,897  INFO [pool-3-thread-2] AbstractResourceProvider:357 - Installing all components on host: ip-10-0-0-1.us-west-2.compute.internal
16 Feb 2017 10:42:06,902  INFO [pool-3-thread-2] AbstractResourceProvider:787 - Skipping updating hosts: no matching requests for (HostRoles/state=INIT AND HostRoles/host_name=ip-10-0-0-1.us-west-2.compute.internal) AND HostRoles/cluster_name=mycluster
16 Feb 2017 10:42:06,904  INFO [pool-3-thread-3] HostRequest:559 - HostRequest.StartHostTask: Executing START task for host: ip-10-0-0-1.us-west-2.compute.internal
16 Feb 2017 10:42:06,907  INFO [pool-3-thread-3] AbstractResourceProvider:412 - Starting all non-client components on host: ip-10-0-0-1.us-west-2.compute.internal
16 Feb 2017 10:42:06,910  INFO [pool-3-thread-3] AbstractResourceProvider:787 - Skipping updating hosts: no matching requests for (HostRoles/cluster_name=mycluster AND NOT(org.apache.ambari.server.controller.internal.HostComponentResourceProvider$ClientComponentPredicate@31161890)) AND (HostRoles/desired_state=INSTALLED AND HostRoles/host_name=ip-10-0-0-1.us-west-2.compute.internal)

The "Skipping updating hosts: no matching requests for (HostRoles/state=INIT AND HostRoles/host_name=ip-10-0-0-1.us-west-2.compute.internal) AND HostRoles/cluster_name=mycluster" line bothers me here.

I can see the host in Ambari with 0 components on it (but I can install components on it manually).
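For anyone hitting the same symptom: you can confirm what Ambari thinks is on the host through its REST API. A minimal sketch; the endpoint is standard Ambari, but the server URL and the admin:admin credentials are placeholders you need to adjust:

```shell
# List the components Ambari has registered on the new host.
# AMBARI_URL and admin:admin are placeholders; adjust for your setup.
AMBARI_URL="http://localhost:8080"
CLUSTER="mycluster"
HOST="ip-10-0-0-1.us-west-2.compute.internal"
DRY_RUN=${DRY_RUN:-1}   # commands are only printed by default; set DRY_RUN=0 to really call Ambari

run() {
  if [ "$DRY_RUN" -eq 1 ]; then echo "would run: $*"; else "$@"; fi
}

# An empty "items" list in the response matches the "0 components" symptom.
run curl -s -u admin:admin -H 'X-Requested-By: ambari' \
  "$AMBARI_URL/api/v1/clusters/$CLUSTER/hosts/$HOST/host_components"
```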

The only way I can stop the upscale request is by deleting rows inside PostgreSQL (ambari.topology_logical_request, ambari.topology_hostgroup, ambari.topology_request) and restarting the Ambari server.
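In case it helps someone, the manual cleanup looks roughly like this. This is a sketch, not an official procedure: stop Ambari and back up the database first. The request id (40, taken from the TopologyManager log lines above) and the column names are assumptions that may differ per Ambari version, and other topology_* child tables may also hold referencing rows:

```shell
# Manual cleanup of a stuck topology request -- use at your own risk.
# DRY_RUN defaults to 1 so the commands are only printed; set DRY_RUN=0 to run them.
DRY_RUN=${DRY_RUN:-1}
run() { if [ "$DRY_RUN" -eq 1 ]; then echo "would run: $*"; else "$@"; fi; }

REQ_ID=40   # LogicalRequest ID from the TopologyManager log lines above

run ambari-server stop
run pg_dump -U ambari -f /tmp/ambari-backup.sql ambari   # back up before deleting anything

# Delete child rows before the parent request to satisfy foreign keys.
# Column names (request_id, id) are assumptions; check your schema first.
run psql -U ambari -d ambari -c "DELETE FROM ambari.topology_hostgroup WHERE request_id = $REQ_ID"
run psql -U ambari -d ambari -c "DELETE FROM ambari.topology_logical_request WHERE request_id = $REQ_ID"
run psql -U ambari -d ambari -c "DELETE FROM ambari.topology_request WHERE id = $REQ_ID"

run ambari-server start
```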

PS: I upgraded from Cloudbreak 1.6.0 to 1.6.2.

Any guidance or idea would be greatly appreciated. Thank you.

1 ACCEPTED SOLUTION

New Contributor

@pdarvasi Your question made me realise the mistake.

We had deleted the Tez client directly in Ambari and forgot that the blueprint still references Tez, hence the DECLINED_PREDICATE on each host.

Reinstalling Tez on the hosts through Ambari fixed the issue.
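For reference, reinstalling a missing client can also be done through the Ambari REST API; we used the UI, so this is just a sketch. The host_components endpoints are standard Ambari, but the URL and admin:admin credentials are placeholders:

```shell
AMBARI_URL="http://localhost:8080"   # placeholder; adjust for your setup
CLUSTER="mycluster"
HOST="ip-10-0-0-1.us-west-2.compute.internal"
DRY_RUN=${DRY_RUN:-1}   # commands are only printed by default; set DRY_RUN=0 to really call Ambari

run() { if [ "$DRY_RUN" -eq 1 ]; then echo "would run: $*"; else "$@"; fi; }

# 1. Register the TEZ_CLIENT component on the host.
run curl -u admin:admin -H 'X-Requested-By: ambari' -X POST \
  "$AMBARI_URL/api/v1/clusters/$CLUSTER/hosts/$HOST/host_components/TEZ_CLIENT"

# 2. Move it to the INSTALLED state, which triggers the actual install.
run curl -u admin:admin -H 'X-Requested-By: ambari' -X PUT \
  -d '{"HostRoles":{"state":"INSTALLED"}}' \
  "$AMBARI_URL/api/v1/clusters/$CLUSTER/hosts/$HOST/host_components/TEZ_CLIENT"
```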

Thanks 🙂


4 REPLIES

New Contributor

I forgot some logs regarding the DECLINED_PREDICATE on the other hosts of the host group:

20 Feb 2017 15:14:47,211  INFO [ambari-client-thread-28] TopologyManager:631 - TopologyManager.processRequest: offering host name = ip-10-0-0-2.us-west-2.compute.internal to LogicalRequest ID = 42
20 Feb 2017 15:14:47,212  INFO [ambari-client-thread-28] LogicalRequest:100 - LogicalRequest.offer: attempting to match a request to a request for a reserved host to hostname = ip-10-0-0-2.us-west-2.compute.internal
20 Feb 2017 15:14:47,212  INFO [ambari-client-thread-28] LogicalRequest:141 - LogicalRequest.offer: outstandingHost request list size = 0
20 Feb 2017 15:14:47,212  INFO [ambari-client-thread-28] TopologyManager:651 - TopologyManager.processRequest: host name = ip-10-0-0-2.us-west-2.compute.internal was DECLINED_PREDICATE by LogicalRequest ID = 42
20 Feb 2017 15:14:47,217  INFO [ambari-client-thread-28] TopologyManager:631 - TopologyManager.processRequest: offering host name = ip-10-0-0-3.us-west-2.compute.internal to LogicalRequest ID = 42
20 Feb 2017 15:14:47,217  INFO [ambari-client-thread-28] LogicalRequest:100 - LogicalRequest.offer: attempting to match a request to a request for a reserved host to hostname = ip-10-0-0-3.us-west-2.compute.internal
20 Feb 2017 15:14:47,218  INFO [ambari-client-thread-28] LogicalRequest:141 - LogicalRequest.offer: outstandingHost request list size = 0
20 Feb 2017 15:14:47,218  INFO [ambari-client-thread-28] TopologyManager:651 - TopologyManager.processRequest: host name = ip-10-0-0-3.us-east-1.compute.internal was DECLINED_PREDICATE by LogicalRequest ID = 42


Hi Nicolas, could you please attach the Cloudbreak logs? How was the original cluster created? Was it managed manually somehow (e.g. adding/removing services) before the upscale?



I'm glad it is resolved; the issue that caused this behavior was fixed in Ambari a few weeks ago.