Cloudbreak not terminating nodes after auto-scale down of cluster

Explorer

Hi,

We have a Cloudbreak deployment for managing our HDP clusters, with auto-scaling enabled based on Ambari metrics. After a cluster is successfully downscaled, Cloudbreak fails to terminate/delete the VMs on the Azure side and marks the removed hosts in Cloudbreak as 'Unhealthy' nodes. Is there a workaround for this?

Any help would be appreciated!

Thanks,

Cibi

1 ACCEPTED SOLUTION

@Cibi Chakaravarthi

I suppose you are using managed disks. If so, this is a known issue that was fixed in Cloudbreak release 1.16.5.

You can update by following the documentation, or you can launch Cloudbreak 1.16.5 from Azure Marketplace.

Sorry for the inconvenience & I hope this helps!

10 REPLIES

@Cibi Chakaravarthi

What is your Cloudbreak version ("cbd version")? Could you attach some logs (cbreak.log file or the output of "cbd logs")?

Explorer

Cloudbreak version is 1.16.4. After the scale-down event, the logs show the node's status as decommissioned, but the node is not terminated and has to be removed manually. I couldn't find any events in the logs that show a failure to terminate the decommissioned nodes.

@Cibi Chakaravarthi There should be some useful information in the logs (there is no sensitive data in them), so please attach them to the case so that we can investigate.

Explorer

@pdarvasi I was able to find the ERROR below in the CBD log. I can't share the log file since it contains some sensitive data.

/cbreak_cloudbreak_1 | 2017-12-05 18:01:07,913 [http-nio-8080-exec-2] getImage:40 DEBUG c.s.c.s.ComponentConfigProvider - [owner:d312e73a-f6dc-4e83-9452-bde66b18791f] [type:cloudbreakLog] [id:undefined] [name:cb] Image found! stackId: 1, component: Component{id=2, componentType=IMAGE, name='IMAGE'}
/cbreak_cloudbreak_1 | 2017-12-05 18:01:07,913 [reactorDispatcher-21] accept:48 ERROR c.s.c.c.h.DownscaleStackCollectResourcesHandler - [owner:d312e73a-f6dc-4e83-9452-bde66b18791f] [type:CLOUDBREAKEVENTDATA] [id:1] [name:cbllapdev30] Failed to handle DownscaleStackCollectResourcesRequest.
/cbreak_cloudbreak_1 | com.sequenceiq.cloudbreak.cloud.exception.CloudConnectorException: can't collect instance resources
/cbreak_cloudbreak_1 | at com.sequenceiq.cloudbreak.cloud.azure.AzureResourceConnector.collectInstanceResourcesToRemove(AzureResourceConnector.java:293)
/cbreak_cloudbreak_1 | at com.sequenceiq.cloudbreak.cloud.azure.AzureResourceConnector.collectResourcesToRemove(AzureResourceConnector.java:241)
/cbreak_cloudbreak_1 | at com.sequenceiq.cloudbreak.cloud.azure.AzureResourceConnector.collectResourcesToRemove(AzureResourceConnector.java:50)
/cbreak_cloudbreak_1 | at com.sequenceiq.cloudbreak.cloud.handler.DownscaleStackCollectResourcesHandler.accept(DownscaleStackCollectResourcesHandler.java:43)
/cbreak_cloudbreak_1 | at com.sequenceiq.cloudbreak.cloud.handler.DownscaleStackCollectResourcesHandler.accept(DownscaleStackCollectResourcesHandler.java:19)
/cbreak_cloudbreak_1 | at com.sequenceiq.cloudbreak.cloud.handler.DownscaleStackCollectResourcesHandler$FastClassBySpringCGLIB$2b40b706.invoke(<generated>)
/cbreak_cloudbreak_1 | at org.springframework.cglib.proxy.MethodProxy.invoke(MethodProxy.java:204)
/cbreak_cloudbreak_1 | at org.springframework.aop.framework.CglibAopProxy$CglibMethodInvocation.invokeJoinpoint(CglibAopProxy.java:738)
/cbreak_cloudbreak_1 | at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:157)
/cbreak_cloudbreak_1 | at org.springframework.aop.framework.adapter.MethodBeforeAdviceInterceptor.invoke(MethodBeforeAdviceInterceptor.java:52)
/cbreak_cloudbreak_1 | at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:179)
/cbreak_cloudbreak_1 | at org.springframework.aop.interceptor.ExposeInvocationInterceptor.invoke(ExposeInvocationInterceptor.java:92)
/cbreak_cloudbreak_1 | at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:179)
/cbreak_cloudbreak_1 | at org.springframework.aop.framework.CglibAopProxy$DynamicAdvisedInterceptor.intercept(CglibAopProxy.java:673)
/cbreak_cloudbreak_1 | at com.sequenceiq.cloudbreak.cloud.handler.DownscaleStackCollectResourcesHandler$EnhancerBySpringCGLIB$7e426fe2.accept(<generated>)
/cbreak_cloudbreak_1 | at reactor.bus.EventBus$3.accept(EventBus.java:317)
/cbreak_cloudbreak_1 | at reactor.bus.EventBus$3.accept(EventBus.java:310)
/cbreak_cloudbreak_1 | at reactor.bus.routing.ConsumerFilteringRouter.route(ConsumerFilteringRouter.java:72)
/cbreak_cloudbreak_1 | at reactor.bus.routing.TraceableDelegatingRouter.route(TraceableDelegatingRouter.java:51)
/cbreak_cloudbreak_1 | at reactor.bus.EventBus.accept(EventBus.java:591)
/cbreak_cloudbreak_1 | at reactor.bus.EventBus.accept(EventBus.java:63)
/cbreak_cloudbreak_1 | at reactor.core.dispatch.AbstractLifecycleDispatcher.route(AbstractLifecycleDispatcher.java:160)
/cbreak_cloudbreak_1 | at reactor.core.dispatch.MultiThreadDispatcher$MultiThreadTask.run(MultiThreadDispatcher.java:74)
/cbreak_cloudbreak_1 | at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
/cbreak_cloudbreak_1 | at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
/cbreak_cloudbreak_1 | at java.lang.Thread.run(Thread.java:745)
/cbreak_cloudbreak_1 | Caused by: java.lang.NullPointerException: null
/cbreak_cloudbreak_1 | at com.sequenceiq.cloudbreak.cloud.azure.AzureResourceConnector.collectInstanceResourcesToRemove(AzureResourceConnector.java:282)
/cbreak_cloudbreak_1 | ... 25 common frames omitted
/cbreak_cloudbreak_1 | 2017-12-05 18:01:07,915 [reactorDispatcher-21] accept:52 INFO c.s.c.c.h.DownscaleStackCollectResourcesHandler - [owner:d312e73a-f6dc-4e83-9452-bde66b18791f] [type:CLOUDBREAKEVENTDATA] [id:1] [name:cbllapdev30] DownscaleStackCollectResourcesRequest finished
/cbreak_cloudbreak_1 | 2017-12-05 18:01:07,915 [reactorDispatcher-21] accept:140 DEBUG c.s.c.c.f.Flow2Handler - [owner:d312e73a-f6dc-4e83-9452-bde66b18791f] [type:CLOUDBREAKEVENTDATA] [id:1] [name:cbllapdev30] flow control event arrived: key: DOWNSCALESTACKCOLLECTRESOURCESRESULT_ERROR, flowid: fd790366-8571-479d-98df-f8193447784c, payload: CloudPlatformResult{status=FAILED, statusReason='can't collect instance resources', errorDetails=com.sequenceiq.cloudbreak.cloud.exception.CloudConnectorException: can't collect instance resources, request=CloudStackRequest{, cloudStack=CloudStack{groups=[com.sequenceiq.cloudbreak.cloud.model.Group@e8cbf0d, com.sequenceiq.cloudbreak.cloud.model.Group@26928c18, com.sequenceiq.cloudbreak.cloud.model.Group@67e1ab4b], network=com.sequenceiq.cloudbreak.cloud.model.Network@206a2f0a, image=Image{imageName='https://sequenceiqwestus2.blob.core.windows.net/images/hdc-hdp--1706211640.vhd', userdata={CORE=#!/bin/bash
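
When the full log cannot be shared, it may help to post only the error entries. The following is a minimal Python sketch, not part of Cloudbreak, that keeps ERROR entries, exception lines, and "Caused by" lines from log text shaped like the excerpt above. The function name `extract_errors`, the prefix pattern, and the choice to mask only UUID-like values are assumptions made for illustration; other sensitive values would need their own patterns.

```python
import re
from typing import Iterable, List

# Container prefix as it appears in the excerpt, e.g. "/cbreak_cloudbreak_1 | ".
PREFIX = re.compile(r"^/?\w+\s*\|\s*")
# UUID-like values (such as the owner id) are masked; this is an illustrative choice.
UUID = re.compile(r"[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}")
# A line that starts with a fully qualified exception or error class name.
EXCEPTION = re.compile(r"^[\w.$]+(Exception|Error)\b")


def extract_errors(lines: Iterable[str]) -> List[str]:
    """Keep ERROR entries, exception lines and 'Caused by' lines, with UUIDs masked."""
    kept = []
    for raw in lines:
        # Drop the trailing newline and the container prefix, if present.
        line = PREFIX.sub("", raw.rstrip("\n"), count=1)
        is_error = " ERROR " in line
        is_cause = line.startswith("Caused by:")
        is_exception = EXCEPTION.match(line) is not None
        if is_error or is_cause or is_exception:
            kept.append(UUID.sub("<redacted>", line))
    return kept
```

One way to use it would be to save the output of "cbd logs" to a file and pass the open file object to `extract_errors`, then review the result by hand before posting it.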

@Cibi Chakaravarthi

I suppose you are using managed disks. If so, this is a known issue that was fixed in Cloudbreak release 1.16.5.

You can update by following the documentation, or you can launch Cloudbreak 1.16.5 from Azure Marketplace.

Sorry for the inconvenience & I hope this helps!

Explorer
@pdarvasi

Thanks for the details! However, we are not using managed disks in our VM instances.

@Cibi Chakaravarthi I suggest you try the new 1.16.5 version, as this part of the code has been refactored there. The update should not affect your running clusters, and it can be run with a single command: "cbd update".

Hope this helps!

Explorer

@pdarvasi Yes, I'm testing it out with a different Cloudbreak deployment. Will let you know how it goes. Thanks for the help!

Explorer

Hi,

I'm trying to upscale the cluster from Cloudbreak and getting the below error:

update failed: New node(s) could not be added to the cluster. Reason com.sequenceiq.cloudbreak.service.CloudbreakServiceException: Ambari could not install services. Invalid Add Hosts Template: org.apache.ambari.server.topology.InvalidTopologyTemplateException: Must specify either host_name or host_count for hostgroup: worker

Error log from Ambari-server:

ERROR [ambari-client-thread-3771] BaseManagementHandler:67 - Bad request received: Invalid Add Hosts Template: org.apache.ambari.server.topology.InvalidTopologyTemplateException: Must specify either host_name or host_count for hostgroup: worker

I could also see the same error details in the Ambari server log after enabling DEBUG log mode.

Cloudbreak version: 1.16.5

Ambari server version: 2.6.0.0

Any insight on this issue?
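
The Ambari message says that the add-hosts request contained an entry for the "worker" host group with neither host_name nor host_count set. A minimal Python sketch of that check is shown here for anyone who captures the request body and wants to inspect it. The list-of-objects layout and the "blueprint" and "host_group" keys are assumptions about the request shape, and `find_invalid_host_groups` is a hypothetical helper, not an Ambari or Cloudbreak API.

```python
from typing import Any, Dict, List


def find_invalid_host_groups(request_body: List[Dict[str, Any]]) -> List[str]:
    """Return host group names whose entry has neither host_name nor host_count."""
    invalid = []
    for entry in request_body:
        # An empty or missing host_name counts as not specified.
        has_name = bool(entry.get("host_name"))
        # host_count only needs to be present; its value is not validated here.
        has_count = entry.get("host_count") is not None
        if not (has_name or has_count):
            invalid.append(str(entry.get("host_group", "<unknown>")))
    return invalid
```

For example, a body with an entry such as {"blueprint": "bp", "host_group": "worker"} and no host fields would be reported as invalid for "worker", which matches the host group named in the error above. The blueprint name "bp" is a placeholder.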

@Cibi Chakaravarthi If your original question was answered, would you please consider accepting the answer? Thanks!
