Support Questions

Find answers, ask questions, and share your expertise

Cloudbreak not terminating nodes after auto-scale down of cluster

avatar
Rising Star

Hi,

We are having a cloudbreak deployment for managing our HDP clusters where we have enabled auto-scaling of the cluster based on the Ambari metrics. After successful downscaling of the clusters, Cloudbreak fails to terminate/delete the VW's from Azure end and marks the removed host in cloudbreak as 'Unhealthy' node. Is there any workaround for the same?

Any help would be appreciated!

Thanks,

Cibi

1 ACCEPTED SOLUTION

avatar
@Cibi Chakaravarthi

I suppose you are using managed disks, if so then this is a known issue which got fixed in Cloudbreak release 1.16.5.

You can try to update following the documentation or you might try to launch Cloudbreak 1.16.5 from Azure Marketplace.

Sorry for the inconvenience & I hope this helps!

View solution in original post

10 REPLIES 10

avatar

@Cibi Chakaravarthi

What is your Cloudbreak version ("cbd version")? Could you attach some logs (cbreak.log file or the output of "cbd logs")?

avatar
Rising Star

Cloudbreak version is 1.16.4. From the logs it shows the node's Status as decommissioned after the scaling down event. But is not terminated, which needs to be done manually. I couldn't find any events in logs which shows failure on termination of the decommissioned nodes.

avatar

@Cibi Chakaravarthi There should be some useful information in the logs (there is no sensitive data in it), so please attach it to the case to be able to investigate.

avatar
Rising Star

@pdarvasi I was able to find the below ERROR from the CBD log. I can't share the LOG file since there are some sensitive data in it.

/cbreak_cloudbreak_1 | 2017-12-05 18:01:07,913 [http-nio-8080-exec-2] getImage:40 DEBUG c.s.c.s.ComponentConfigProvider - [owner:d312e73a-f6dc-4e83-9452-bde66b18791f] [type:cloudbreakLog] [id:undefined] [name:cb] Image found! stackId: 1, component: Component{id=2, componentType=IMAGE, name='IMAGE'} /cbreak_cloudbreak_1 | 2017-12-05 18:01:07,913 [reactorDispatcher-21] accept:48 ERROR c.s.c.c.h.DownscaleStackCollectResourcesHandler - [owner:d312e73a-f6dc-4e83-9452-bde66b18791f] [type:CLOUDBREAKEVENTDATA] [id:1] [name:cbllapdev30] Failed to handle DownscaleStackCollectResourcesRequest. /cbreak_cloudbreak_1 | com.sequenceiq.cloudbreak.cloud.exception.CloudConnectorException: can't collect instance resources /cbreak_cloudbreak_1 | at com.sequenceiq.cloudbreak.cloud.azure.AzureResourceConnector.collectInstanceResourcesToRemove(AzureResourceConnector.java:293) /cbreak_cloudbreak_1 | at com.sequenceiq.cloudbreak.cloud.azure.AzureResourceConnector.collectResourcesToRemove(AzureResourceConnector.java:241) /cbreak_cloudbreak_1 | at com.sequenceiq.cloudbreak.cloud.azure.AzureResourceConnector.collectResourcesToRemove(AzureResourceConnector.java:50) /cbreak_cloudbreak_1 | at com.sequenceiq.cloudbreak.cloud.handler.DownscaleStackCollectResourcesHandler.accept(DownscaleStackCollectResourcesHandler.java:43) /cbreak_cloudbreak_1 | at com.sequenceiq.cloudbreak.cloud.handler.DownscaleStackCollectResourcesHandler.accept(DownscaleStackCollectResourcesHandler.java:19) /cbreak_cloudbreak_1 | at com.sequenceiq.cloudbreak.cloud.handler.DownscaleStackCollectResourcesHandler$FastClassBySpringCGLIB$2b40b706.invoke(<generated>) /cbreak_cloudbreak_1 | at org.springframework.cglib.proxy.MethodProxy.invoke(MethodProxy.java:204) /cbreak_cloudbreak_1 | at org.springframework.aop.framework.CglibAopProxy$CglibMethodInvocation.invokeJoinpoint(CglibAopProxy.java:738) /cbreak_cloudbreak_1 | at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:157) /cbreak_cloudbreak_1 | at org.springframework.aop.framework.adapter.MethodBeforeAdviceInterceptor.invoke(MethodBeforeAdviceInterceptor.java:52) /cbreak_cloudbreak_1 | at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:179) /cbreak_cloudbreak_1 | at org.springframework.aop.interceptor.ExposeInvocationInterceptor.invoke(ExposeInvocationInterceptor.java:92) /cbreak_cloudbreak_1 | at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:179) /cbreak_cloudbreak_1 | at org.springframework.aop.framework.CglibAopProxy$DynamicAdvisedInterceptor.intercept(CglibAopProxy.java:673) /cbreak_cloudbreak_1 | at com.sequenceiq.cloudbreak.cloud.handler.DownscaleStackCollectResourcesHandler$EnhancerBySpringCGLIB$7e426fe2.accept(<generated>) /cbreak_cloudbreak_1 | at reactor.bus.EventBus$3.accept(EventBus.java:317) /cbreak_cloudbreak_1 | at reactor.bus.EventBus$3.accept(EventBus.java:310) /cbreak_cloudbreak_1 | at reactor.bus.routing.ConsumerFilteringRouter.route(ConsumerFilteringRouter.java:72) /cbreak_cloudbreak_1 | at reactor.bus.routing.TraceableDelegatingRouter.route(TraceableDelegatingRouter.java:51) /cbreak_cloudbreak_1 | at reactor.bus.EventBus.accept(EventBus.java:591) /cbreak_cloudbreak_1 | at reactor.bus.EventBus.accept(EventBus.java:63) /cbreak_cloudbreak_1 | at reactor.core.dispatch.AbstractLifecycleDispatcher.route(AbstractLifecycleDispatcher.java:160) /cbreak_cloudbreak_1 | at reactor.core.dispatch.MultiThreadDispatcher$MultiThreadTask.run(MultiThreadDispatcher.java:74) /cbreak_cloudbreak_1 | at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) /cbreak_cloudbreak_1 | at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) /cbreak_cloudbreak_1 | at java.lang.Thread.run(Thread.java:745) /cbreak_cloudbreak_1 | Caused by: java.lang.NullPointerException: null /cbreak_cloudbreak_1 | at com.sequenceiq.cloudbreak.cloud.azure.AzureResourceConnector.collectInstanceResourcesToRemove(AzureResourceConnector.java:282) /cbreak_cloudbreak_1 | ... 25 common frames omitted /cbreak_cloudbreak_1 | 2017-12-05 18:01:07,915 [reactorDispatcher-21] accept:52 INFO c.s.c.c.h.DownscaleStackCollectResourcesHandler - [owner:d312e73a-f6dc-4e83-9452-bde66b18791f] [type:CLOUDBREAKEVENTDATA] [id:1] [name:cbllapdev30] DownscaleStackCollectResourcesRequest finished /cbreak_cloudbreak_1 | 2017-12-05 18:01:07,915 [reactorDispatcher-21] accept:140 DEBUG c.s.c.c.f.Flow2Handler - [owner:d312e73a-f6dc-4e83-9452-bde66b18791f] [type:CLOUDBREAKEVENTDATA] [id:1] [name:cbllapdev30] flow control event arrived: key: DOWNSCALESTACKCOLLECTRESOURCESRESULT_ERROR, flowid: fd790366-8571-479d-98df-f8193447784c, payload: CloudPlatformResult{status=FAILED, statusReason='can't collect instance resources', errorDetails=com.sequenceiq.cloudbreak.cloud.exception.CloudConnectorException: can't collect instance resources, request=CloudStackRequest{, cloudStack=CloudStack{groups=[com.sequenceiq.cloudbreak.cloud.model.Group@e8cbf0d, com.sequenceiq.cloudbreak.cloud.model.Group@26928c18, com.sequenceiq.cloudbreak.cloud.model.Group@67e1ab4b], network=com.sequenceiq.cloudbreak.cloud.model.Network@206a2f0a, image=Image{imageName='https://sequenceiqwestus2.blob.core.windows.net/images/hdc-hdp--1706211640.vhd', userdata={CORE=#!/bin/bash

avatar
@Cibi Chakaravarthi

I suppose you are using managed disks, if so then this is a known issue which got fixed in Cloudbreak release 1.16.5.

You can try to update following the documentation or you might try to launch Cloudbreak 1.16.5 from Azure Marketplace.

Sorry for the inconvenience & I hope this helps!

avatar
Rising Star
@pdarvasi

Thanks for the details! But, we are not using Managed disks in our VM instances.

avatar

@Cibi Chakaravarthi I suggest you to try with the new 1.16.5 version as it has this part of code refactored. The update should not affect your running clusters and it can be run with one command "cbd update".

Hope this helps!

avatar
Rising Star

@pdarvasi Yes, i'm trying to test it out with a different cloudbreak deployment. Will let you know how it goes. Thanks for the help!

avatar
Rising Star

Hi,

I'm trying to upscale the cluster from Cloudbreak and getting the below error:

update failed: New node(s) could not be added to the cluster. Reason com.sequenceiq.cloudbreak.service.CloudbreakServiceException: Ambari could not install services. Invalid Add Hosts Template: org.apache.ambari.server.topology.InvalidTopologyTemplateException: Must specify either host_name or host_count for hostgroup: worker

Error log from Ambari-server:

ERROR [ambari-client-thread-3771] BaseManagementHandler:67 - Bad request received: Invalid Add Hosts Template: org.apache.ambari.server.topology.InvalidTopologyTemplateException: Must specify either host_name or host_count for hostgroup: worker

I also could see the same error details from Ambari server log after enabling DEBUG Log mode.

Cloudbreak Version: 1.16.5

Ambari server version:- 2.6.0.0

Any insight on this issue?