Cloudbreak not terminating nodes after auto-scale down of cluster
Labels: Hortonworks Cloudbreak
Created ‎12-04-2017 11:52 AM
Hi,
We have a Cloudbreak deployment that manages our HDP clusters, with auto-scaling enabled based on Ambari metrics. After a cluster is successfully downscaled, Cloudbreak fails to terminate/delete the VMs on the Azure side and marks the removed hosts as 'Unhealthy' nodes in Cloudbreak. Is there any workaround for this?
Any help would be appreciated!
Thanks,
Cibi
Created ‎12-04-2017 02:42 PM
What is your Cloudbreak version ("cbd version")? Could you attach some logs (cbreak.log file or the output of "cbd logs")?
Created ‎12-06-2017 10:13 AM
The Cloudbreak version is 1.16.4. According to the logs, the node's status is 'decommissioned' after the scale-down event, but the node is not terminated, so termination has to be done manually. I couldn't find any events in the logs that show a failure to terminate the decommissioned nodes.
Created ‎12-06-2017 01:00 PM
@Cibi Chakaravarthi There should be some useful information in the logs (they contain no sensitive data), so please attach them to the case so that we can investigate.
Created ‎12-06-2017 02:44 PM
@pdarvasi I was able to find the ERROR below in the CBD log. I can't share the log file since it contains some sensitive data.
/cbreak_cloudbreak_1 | 2017-12-05 18:01:07,913 [http-nio-8080-exec-2] getImage:40 DEBUG c.s.c.s.ComponentConfigProvider - [owner:d312e73a-f6dc-4e83-9452-bde66b18791f] [type:cloudbreakLog] [id:undefined] [name:cb] Image found! stackId: 1, component: Component{id=2, componentType=IMAGE, name='IMAGE'}
/cbreak_cloudbreak_1 | 2017-12-05 18:01:07,913 [reactorDispatcher-21] accept:48 ERROR c.s.c.c.h.DownscaleStackCollectResourcesHandler - [owner:d312e73a-f6dc-4e83-9452-bde66b18791f] [type:CLOUDBREAKEVENTDATA] [id:1] [name:cbllapdev30] Failed to handle DownscaleStackCollectResourcesRequest.
/cbreak_cloudbreak_1 | com.sequenceiq.cloudbreak.cloud.exception.CloudConnectorException: can't collect instance resources
/cbreak_cloudbreak_1 | at com.sequenceiq.cloudbreak.cloud.azure.AzureResourceConnector.collectInstanceResourcesToRemove(AzureResourceConnector.java:293)
/cbreak_cloudbreak_1 | at com.sequenceiq.cloudbreak.cloud.azure.AzureResourceConnector.collectResourcesToRemove(AzureResourceConnector.java:241)
/cbreak_cloudbreak_1 | at com.sequenceiq.cloudbreak.cloud.azure.AzureResourceConnector.collectResourcesToRemove(AzureResourceConnector.java:50)
/cbreak_cloudbreak_1 | at com.sequenceiq.cloudbreak.cloud.handler.DownscaleStackCollectResourcesHandler.accept(DownscaleStackCollectResourcesHandler.java:43)
/cbreak_cloudbreak_1 | at com.sequenceiq.cloudbreak.cloud.handler.DownscaleStackCollectResourcesHandler.accept(DownscaleStackCollectResourcesHandler.java:19)
/cbreak_cloudbreak_1 | at com.sequenceiq.cloudbreak.cloud.handler.DownscaleStackCollectResourcesHandler$FastClassBySpringCGLIB$2b40b706.invoke(<generated>)
/cbreak_cloudbreak_1 | at org.springframework.cglib.proxy.MethodProxy.invoke(MethodProxy.java:204)
/cbreak_cloudbreak_1 | at org.springframework.aop.framework.CglibAopProxy$CglibMethodInvocation.invokeJoinpoint(CglibAopProxy.java:738)
/cbreak_cloudbreak_1 | at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:157)
/cbreak_cloudbreak_1 | at org.springframework.aop.framework.adapter.MethodBeforeAdviceInterceptor.invoke(MethodBeforeAdviceInterceptor.java:52)
/cbreak_cloudbreak_1 | at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:179)
/cbreak_cloudbreak_1 | at org.springframework.aop.interceptor.ExposeInvocationInterceptor.invoke(ExposeInvocationInterceptor.java:92)
/cbreak_cloudbreak_1 | at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:179)
/cbreak_cloudbreak_1 | at org.springframework.aop.framework.CglibAopProxy$DynamicAdvisedInterceptor.intercept(CglibAopProxy.java:673)
/cbreak_cloudbreak_1 | at com.sequenceiq.cloudbreak.cloud.handler.DownscaleStackCollectResourcesHandler$EnhancerBySpringCGLIB$7e426fe2.accept(<generated>)
/cbreak_cloudbreak_1 | at reactor.bus.EventBus$3.accept(EventBus.java:317)
/cbreak_cloudbreak_1 | at reactor.bus.EventBus$3.accept(EventBus.java:310)
/cbreak_cloudbreak_1 | at reactor.bus.routing.ConsumerFilteringRouter.route(ConsumerFilteringRouter.java:72)
/cbreak_cloudbreak_1 | at reactor.bus.routing.TraceableDelegatingRouter.route(TraceableDelegatingRouter.java:51)
/cbreak_cloudbreak_1 | at reactor.bus.EventBus.accept(EventBus.java:591)
/cbreak_cloudbreak_1 | at reactor.bus.EventBus.accept(EventBus.java:63)
/cbreak_cloudbreak_1 | at reactor.core.dispatch.AbstractLifecycleDispatcher.route(AbstractLifecycleDispatcher.java:160)
/cbreak_cloudbreak_1 | at reactor.core.dispatch.MultiThreadDispatcher$MultiThreadTask.run(MultiThreadDispatcher.java:74)
/cbreak_cloudbreak_1 | at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
/cbreak_cloudbreak_1 | at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
/cbreak_cloudbreak_1 | at java.lang.Thread.run(Thread.java:745)
/cbreak_cloudbreak_1 | Caused by: java.lang.NullPointerException: null
/cbreak_cloudbreak_1 | at com.sequenceiq.cloudbreak.cloud.azure.AzureResourceConnector.collectInstanceResourcesToRemove(AzureResourceConnector.java:282)
/cbreak_cloudbreak_1 | ... 25 common frames omitted
/cbreak_cloudbreak_1 | 2017-12-05 18:01:07,915 [reactorDispatcher-21] accept:52 INFO c.s.c.c.h.DownscaleStackCollectResourcesHandler - [owner:d312e73a-f6dc-4e83-9452-bde66b18791f] [type:CLOUDBREAKEVENTDATA] [id:1] [name:cbllapdev30] DownscaleStackCollectResourcesRequest finished
/cbreak_cloudbreak_1 | 2017-12-05 18:01:07,915 [reactorDispatcher-21] accept:140 DEBUG c.s.c.c.f.Flow2Handler - [owner:d312e73a-f6dc-4e83-9452-bde66b18791f] [type:CLOUDBREAKEVENTDATA] [id:1] [name:cbllapdev30] flow control event arrived: key: DOWNSCALESTACKCOLLECTRESOURCESRESULT_ERROR, flowid: fd790366-8571-479d-98df-f8193447784c, payload: CloudPlatformResult{status=FAILED, statusReason='can't collect instance resources', errorDetails=com.sequenceiq.cloudbreak.cloud.exception.CloudConnectorException: can't collect instance resources, request=CloudStackRequest{, cloudStack=CloudStack{groups=[com.sequenceiq.cloudbreak.cloud.model.Group@e8cbf0d, com.sequenceiq.cloudbreak.cloud.model.Group@26928c18, com.sequenceiq.cloudbreak.cloud.model.Group@67e1ab4b], network=com.sequenceiq.cloudbreak.cloud.model.Network@206a2f0a, image=Image{imageName='https://sequenceiqwestus2.blob.core.windows.net/images/hdc-hdp--1706211640.vhd', userdata={CORE=#!/bin/bash
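For anyone in the same position of wanting to share only the relevant part of a log, here is a minimal Python sketch that keeps ERROR entries together with their stack-trace lines and masks a few values. It assumes each entry starts with a timestamp like the ones above, optionally preceded by the container prefix. Which values count as sensitive (UUIDs, the `[name:...]` field, URLs) is my assumption, and the script name `redact_errors.py` is hypothetical.

```python
import re
import sys

# Values masked by this sketch. The choice of patterns is an assumption;
# extend the list to match whatever is sensitive in your environment.
UUID_RE = re.compile(
    r"[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}", re.I
)
NAME_RE = re.compile(r"\[name:[^\]]*\]")
URL_RE = re.compile(r"https?://\S+")

# A new log entry starts with a timestamp such as "2017-12-05 18:01:07,913",
# optionally preceded by a container prefix like "/cbreak_cloudbreak_1 | ".
ENTRY_RE = re.compile(
    r"^(?:\S+\s+\|\s+)?\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3} "
)


def redact(line):
    """Mask UUIDs, the [name:...] field, and URLs in a single log line."""
    line = UUID_RE.sub("<uuid>", line)
    line = NAME_RE.sub("[name:<redacted>]", line)
    return URL_RE.sub("<url>", line)


def error_entries(lines):
    """Yield redacted ERROR entries plus the stack-trace lines that follow them."""
    keep = False
    for line in lines:
        line = line.rstrip("\n")
        if ENTRY_RE.match(line):
            # Simple level check: an entry is kept if " ERROR " appears in it.
            keep = " ERROR " in line
        if keep:
            yield redact(line)


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1], encoding="utf-8", errors="replace") as handle:
        for entry_line in error_entries(handle):
            print(entry_line)
```

It could be run as `python redact_errors.py cbreak.log`. Please review the output by hand before posting it, since the patterns above will not catch everything.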
Created ‎12-06-2017 06:22 PM
I suppose you are using managed disks. If so, this is a known issue that was fixed in Cloudbreak release 1.16.5.
You can update by following the documentation, or you can launch Cloudbreak 1.16.5 from the Azure Marketplace.
Sorry for the inconvenience, and I hope this helps!
Created ‎12-08-2017 06:04 PM
Thanks for the details! However, we are not using managed disks in our VM instances.
Created ‎12-12-2017 01:52 PM
@Cibi Chakaravarthi I suggest you try the new 1.16.5 version, as this part of the code has been refactored there. The update should not affect your running clusters, and it can be run with a single command: "cbd update".
Hope this helps!
Created ‎12-12-2017 02:08 PM
@pdarvasi Yes, I'm trying to test it out with a different Cloudbreak deployment. I will let you know how it goes. Thanks for the help!
Created ‎12-21-2017 01:06 PM
Hi,
I'm trying to upscale the cluster from Cloudbreak and am getting the error below:
update failed: New node(s) could not be added to the cluster. Reason com.sequenceiq.cloudbreak.service.CloudbreakServiceException: Ambari could not install services. Invalid Add Hosts Template: org.apache.ambari.server.topology.InvalidTopologyTemplateException: Must specify either host_name or host_count for hostgroup: worker
Error log from Ambari-server:
ERROR [ambari-client-thread-3771] BaseManagementHandler:67 - Bad request received: Invalid Add Hosts Template: org.apache.ambari.server.topology.InvalidTopologyTemplateException: Must specify either host_name or host_count for hostgroup: worker
I could also see the same error details in the Ambari server log after enabling DEBUG log mode.
Cloudbreak Version: 1.16.5
Ambari server version: 2.6.0.0
Any insight on this issue?
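To make the rule in the error message concrete, here is a small Python sketch of the check it describes: every host group entry in an add-hosts request needs either a host_name or a host_count. The request shape (a list of entries with `blueprint`, `host_group`, and `host_count` or `host_name`) is an assumption based on how blueprint-based add-host requests are usually written, and the blueprint and group names are made up for illustration. It does not explain why Cloudbreak sent a request without these fields.

```python
def invalid_host_groups(add_hosts_request):
    """Return the host group names that have neither host_name nor host_count.

    Note: a host_count of 0 is treated as missing in this sketch.
    """
    missing = []
    for entry in add_hosts_request:
        if not entry.get("host_name") and not entry.get("host_count"):
            missing.append(entry.get("host_group", "<unknown>"))
    return missing


# Hypothetical request body; the first entry mirrors the failing case.
request = [
    {"blueprint": "my-blueprint", "host_group": "worker"},
    {"blueprint": "my-blueprint", "host_group": "compute", "host_count": 2},
]

print(invalid_host_groups(request))  # prints ['worker']
```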
