Created 12-04-2017 11:52 AM
Hi,
We have a Cloudbreak deployment managing our HDP clusters, with auto-scaling enabled based on Ambari metrics. After a cluster is downscaled successfully, Cloudbreak fails to terminate/delete the VMs on the Azure side and marks the removed host as an 'Unhealthy' node in Cloudbreak. Is there a workaround for this?
Any help would be appreciated!
Thanks,
Cibi
Created 12-06-2017 06:22 PM
I suppose you are using managed disks. If so, this is a known issue that was fixed in Cloudbreak release 1.16.5.
You can update by following the documentation, or you can launch Cloudbreak 1.16.5 from the Azure Marketplace.
Sorry for the inconvenience & I hope this helps!
Created 12-04-2017 02:42 PM
What is your Cloudbreak version ("cbd version")? Could you attach some logs (cbreak.log file or the output of "cbd logs")?
Created 12-06-2017 10:13 AM
The Cloudbreak version is 1.16.4. The logs show the node's status as decommissioned after the scale-down event, but the node is not terminated; that has to be done manually. I couldn't find any events in the logs that show a failure to terminate the decommissioned nodes.
Created 12-06-2017 01:00 PM
@Cibi Chakaravarthi There should be some useful information in the logs (they contain no sensitive data), so please attach them to the case so that we can investigate.
Created 12-06-2017 02:44 PM
@pdarvasi I was able to find the ERROR below in the CBD log. I can't share the log file since it contains some sensitive data.
/cbreak_cloudbreak_1 | 2017-12-05 18:01:07,913 [http-nio-8080-exec-2] getImage:40 DEBUG c.s.c.s.ComponentConfigProvider - [owner:d312e73a-f6dc-4e83-9452-bde66b18791f] [type:cloudbreakLog] [id:undefined] [name:cb] Image found! stackId: 1, component: Component{id=2, componentType=IMAGE, name='IMAGE'}
/cbreak_cloudbreak_1 | 2017-12-05 18:01:07,913 [reactorDispatcher-21] accept:48 ERROR c.s.c.c.h.DownscaleStackCollectResourcesHandler - [owner:d312e73a-f6dc-4e83-9452-bde66b18791f] [type:CLOUDBREAKEVENTDATA] [id:1] [name:cbllapdev30] Failed to handle DownscaleStackCollectResourcesRequest.
/cbreak_cloudbreak_1 | com.sequenceiq.cloudbreak.cloud.exception.CloudConnectorException: can't collect instance resources
/cbreak_cloudbreak_1 | at com.sequenceiq.cloudbreak.cloud.azure.AzureResourceConnector.collectInstanceResourcesToRemove(AzureResourceConnector.java:293)
/cbreak_cloudbreak_1 | at com.sequenceiq.cloudbreak.cloud.azure.AzureResourceConnector.collectResourcesToRemove(AzureResourceConnector.java:241)
/cbreak_cloudbreak_1 | at com.sequenceiq.cloudbreak.cloud.azure.AzureResourceConnector.collectResourcesToRemove(AzureResourceConnector.java:50)
/cbreak_cloudbreak_1 | at com.sequenceiq.cloudbreak.cloud.handler.DownscaleStackCollectResourcesHandler.accept(DownscaleStackCollectResourcesHandler.java:43)
/cbreak_cloudbreak_1 | at com.sequenceiq.cloudbreak.cloud.handler.DownscaleStackCollectResourcesHandler.accept(DownscaleStackCollectResourcesHandler.java:19)
/cbreak_cloudbreak_1 | at com.sequenceiq.cloudbreak.cloud.handler.DownscaleStackCollectResourcesHandler$FastClassBySpringCGLIB$2b40b706.invoke(<generated>)
/cbreak_cloudbreak_1 | at org.springframework.cglib.proxy.MethodProxy.invoke(MethodProxy.java:204)
/cbreak_cloudbreak_1 | at org.springframework.aop.framework.CglibAopProxy$CglibMethodInvocation.invokeJoinpoint(CglibAopProxy.java:738)
/cbreak_cloudbreak_1 | at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:157)
/cbreak_cloudbreak_1 | at org.springframework.aop.framework.adapter.MethodBeforeAdviceInterceptor.invoke(MethodBeforeAdviceInterceptor.java:52)
/cbreak_cloudbreak_1 | at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:179)
/cbreak_cloudbreak_1 | at org.springframework.aop.interceptor.ExposeInvocationInterceptor.invoke(ExposeInvocationInterceptor.java:92)
/cbreak_cloudbreak_1 | at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:179)
/cbreak_cloudbreak_1 | at org.springframework.aop.framework.CglibAopProxy$DynamicAdvisedInterceptor.intercept(CglibAopProxy.java:673)
/cbreak_cloudbreak_1 | at com.sequenceiq.cloudbreak.cloud.handler.DownscaleStackCollectResourcesHandler$EnhancerBySpringCGLIB$7e426fe2.accept(<generated>)
/cbreak_cloudbreak_1 | at reactor.bus.EventBus$3.accept(EventBus.java:317)
/cbreak_cloudbreak_1 | at reactor.bus.EventBus$3.accept(EventBus.java:310)
/cbreak_cloudbreak_1 | at reactor.bus.routing.ConsumerFilteringRouter.route(ConsumerFilteringRouter.java:72)
/cbreak_cloudbreak_1 | at reactor.bus.routing.TraceableDelegatingRouter.route(TraceableDelegatingRouter.java:51)
/cbreak_cloudbreak_1 | at reactor.bus.EventBus.accept(EventBus.java:591)
/cbreak_cloudbreak_1 | at reactor.bus.EventBus.accept(EventBus.java:63)
/cbreak_cloudbreak_1 | at reactor.core.dispatch.AbstractLifecycleDispatcher.route(AbstractLifecycleDispatcher.java:160)
/cbreak_cloudbreak_1 | at reactor.core.dispatch.MultiThreadDispatcher$MultiThreadTask.run(MultiThreadDispatcher.java:74)
/cbreak_cloudbreak_1 | at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
/cbreak_cloudbreak_1 | at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
/cbreak_cloudbreak_1 | at java.lang.Thread.run(Thread.java:745)
/cbreak_cloudbreak_1 | Caused by: java.lang.NullPointerException: null
/cbreak_cloudbreak_1 | at com.sequenceiq.cloudbreak.cloud.azure.AzureResourceConnector.collectInstanceResourcesToRemove(AzureResourceConnector.java:282)
/cbreak_cloudbreak_1 | ... 25 common frames omitted
/cbreak_cloudbreak_1 | 2017-12-05 18:01:07,915 [reactorDispatcher-21] accept:52 INFO c.s.c.c.h.DownscaleStackCollectResourcesHandler - [owner:d312e73a-f6dc-4e83-9452-bde66b18791f] [type:CLOUDBREAKEVENTDATA] [id:1] [name:cbllapdev30] DownscaleStackCollectResourcesRequest finished
/cbreak_cloudbreak_1 | 2017-12-05 18:01:07,915 [reactorDispatcher-21] accept:140 DEBUG c.s.c.c.f.Flow2Handler - [owner:d312e73a-f6dc-4e83-9452-bde66b18791f] [type:CLOUDBREAKEVENTDATA] [id:1] [name:cbllapdev30] flow control event arrived: key: DOWNSCALESTACKCOLLECTRESOURCESRESULT_ERROR, flowid: fd790366-8571-479d-98df-f8193447784c, payload: CloudPlatformResult{status=FAILED, statusReason='can't collect instance resources', errorDetails=com.sequenceiq.cloudbreak.cloud.exception.CloudConnectorException: can't collect instance resources, request=CloudStackRequest{, cloudStack=CloudStack{groups=[com.sequenceiq.cloudbreak.cloud.model.Group@e8cbf0d, com.sequenceiq.cloudbreak.cloud.model.Group@26928c18, com.sequenceiq.cloudbreak.cloud.model.Group@67e1ab4b], network=com.sequenceiq.cloudbreak.cloud.model.Network@206a2f0a, image=Image{imageName='https://sequenceiqwestus2.blob.core.windows.net/images/hdc-hdp--1706211640.vhd', userdata={CORE=#!/bin/bash
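For anyone else who cannot attach a full log, here is a minimal Python sketch of one way to pull out only the ERROR records (with their stack traces) from saved "cbd logs" output. The record layout (a "YYYY-MM-DD HH:MM:SS,mmm" timestamp after the container prefix) is inferred from the excerpt above, and the function name error_records is hypothetical. It masks UUID-shaped values only, so the result should still be reviewed by hand before sharing.

```python
import re

# Start of a log record: a timestamp such as "2017-12-05 18:01:07,913 ".
# This layout is an assumption based on the excerpt in this thread.
RECORD_START = re.compile(r"\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3} ")
# UUID-shaped values (e.g. the owner id); other sensitive data is NOT masked.
UUID = re.compile(r"[0-9a-f]{8}(?:-[0-9a-f]{4}){3}-[0-9a-f]{12}")


def error_records(lines):
    """Yield each ERROR record as a list of lines, stack trace included."""
    record, keep = [], False
    for line in lines:
        if RECORD_START.search(line):
            # A new record begins: emit the previous one if it was an ERROR.
            if keep:
                yield record
            record, keep = [], " ERROR " in line
        if keep:
            # Lines without a timestamp (stack frames) stay with their record.
            record.append(UUID.sub("<uuid>", line.rstrip("\n")))
    if keep:
        yield record
```

It can be fed an open file, for example `for rec in error_records(open("cbreak.log")): print("\n".join(rec))`, where cbreak.log is the log file mentioned earlier in the thread.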
Created 12-08-2017 06:04 PM
Thanks for the details! However, we are not using managed disks in our VM instances.
Created 12-12-2017 01:52 PM
@Cibi Chakaravarthi I suggest trying the new 1.16.5 version, as it has this part of the code refactored. The update should not affect your running clusters, and it can be run with a single command: "cbd update".
Hope this helps!
Created 12-12-2017 02:08 PM
@pdarvasi Yes, I'm testing it with a different Cloudbreak deployment. I will let you know how it goes. Thanks for the help!
Created 12-21-2017 01:06 PM
Hi,
I'm trying to upscale the cluster from Cloudbreak and am getting the error below:
update failed: New node(s) could not be added to the cluster. Reason com.sequenceiq.cloudbreak.service.CloudbreakServiceException: Ambari could not install services. Invalid Add Hosts Template: org.apache.ambari.server.topology.InvalidTopologyTemplateException: Must specify either host_name or host_count for hostgroup: worker
Error log from the Ambari server:
ERROR [ambari-client-thread-3771] BaseManagementHandler:67 - Bad request received: Invalid Add Hosts Template: org.apache.ambari.server.topology.InvalidTopologyTemplateException: Must specify either host_name or host_count for hostgroup: worker
I could also see the same error details in the Ambari server log after enabling DEBUG logging.
Cloudbreak version: 1.16.5
Ambari server version: 2.6.0.0
Any insight on this issue?
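To make the condition in the error message concrete, here is a small Python sketch that checks a list of add-hosts entries for host groups that carry neither host_name nor host_count. The field names host_group, host_name and host_count are taken from the message wording and the entry layout is an assumption, not the exact request Cloudbreak sends; the function name missing_host_spec is hypothetical. This only illustrates the rule Ambari is reporting and does not explain why such a request was generated.

```python
def missing_host_spec(entries):
    """Return the host groups whose entry has neither host_name nor host_count.

    Each entry is assumed to be a dict such as
    {"host_group": "worker", "host_count": 2}. An empty host_name or a
    host_count of 0 is treated the same as a missing value.
    """
    return [
        entry.get("host_group", "<unnamed>")
        for entry in entries
        if not entry.get("host_name") and not entry.get("host_count")
    ]
```

For example, an entry that names only the host group "worker" would be reported, while the same entry with a host_count of 2 would pass.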