Created 06-27-2018 02:35 PM
Hi,
Yesterday I upgraded CBD from 2.6 to 2.7, and now I'm running into a lot of issues.
Weird, given that 2.7 is GA and 2.6 is not.
I'm trying to deploy my HDP cluster to a subnet that uses Azure AD Domain Services DNS.
I also disabled public IPs for the deployment.
This is what I get in the logs when deploying the default blueprint with default options onto my private subnet:
cloudbreak_1 | 2018-06-27 14:34:12,803 [containerBootstrapBuilderExecutor-18] doCall:85 INFO c.s.c.o.OrchestratorBootstrapRunner - [owner:15adc8a5-f35f-4e42-b1da-2567ccad0c59] [type:STACK] [id:13] [name:dsbeta01] [flow:9c9705c6-d536-4fb7-8bf1-f1075b5d8ff0] [tracking:] Calling orchestrator bootstrap: Salt, additional info: SaltBootstrap{sc=com.sequenceiq.cloudbreak.orchestrator.salt.client.SaltConnector@3172c1ff, allGatewayConfigs=[GatewayConfig{connectionAddress='10.251.3.69', publicAddress='10.251.3.69', privateAddress='10.251.3.69', hostname='null', gatewayPort=9443, knoxGatewayEnabled=true, primary=true}], originalTargets=[Node{privateIp='10.251.3.69', publicIp='10.251.3.69', hostname='', domain='null', hostGroup='master', dataVolumes=null}, Node{privateIp='10.251.3.68', publicIp='10.251.3.68', hostname='', domain='null', hostGroup='worker', dataVolumes=null}], targets=[Node{privateIp='10.251.3.69', publicIp='10.251.3.69', hostname='', domain='null', hostGroup='master', dataVolumes=null}, Node{privateIp='10.251.3.68', publicIp='10.251.3.68', hostname='', domain='null', hostGroup='worker', dataVolumes=null}]}
cloudbreak_1 | 2018-06-27 14:34:12,805 [containerBootstrapBuilderExecutor-18] call:55 INFO c.s.c.o.s.p.SaltBootstrap - [owner:15adc8a5-f35f-4e42-b1da-2567ccad0c59] [type:STACK] [id:13] [name:dsbeta01] [flow:9c9705c6-d536-4fb7-8bf1-f1075b5d8ff0] [tracking:] Bootstrapping of nodes [0/2]
cloudbreak_1 | 2018-06-27 14:34:12,806 [containerBootstrapBuilderExecutor-18] call:57 INFO c.s.c.o.s.p.SaltBootstrap - [owner:15adc8a5-f35f-4e42-b1da-2567ccad0c59] [type:STACK] [id:13] [name:dsbeta01] [flow:9c9705c6-d536-4fb7-8bf1-f1075b5d8ff0] [tracking:] Missing targets for SaltBootstrap: [Node{privateIp='10.251.3.69', publicIp='10.251.3.69', hostname='', domain='null', hostGroup='master', dataVolumes=null}, Node{privateIp='10.251.3.68', publicIp='10.251.3.68', hostname='', domain='null', hostGroup='worker', dataVolumes=null}]
cloudbreak_1 | 2018-06-27 14:34:12,827 [containerBootstrapBuilderExecutor-18] lambda$hostnameVerifier$0:28 INFO c.s.c.c.CertificateTrustManager - [owner:15adc8a5-f35f-4e42-b1da-2567ccad0c59] [type:STACK] [id:13] [name:dsbeta01] [flow:9c9705c6-d536-4fb7-8bf1-f1075b5d8ff0] [tracking:] verify hostname: 10.251.3.69
cloudbreak_1 | 2018-06-27 14:34:12,849 [containerBootstrapBuilderExecutor-18] action:119 INFO c.s.c.o.s.c.SaltConnector - [owner:15adc8a5-f35f-4e42-b1da-2567ccad0c59] [type:STACK] [id:13] [name:dsbeta01] [flow:9c9705c6-d536-4fb7-8bf1-f1075b5d8ff0] [tracking:] SaltBoot. SaltAction response: SaltBootResponses{responses=[SaltBootResponse{status='', address='10.251.3.69', statusCode=500, version='null', errorText='it is expected to have a default domain, but it is empty'}, SaltBootResponse{status='', address='10.251.3.68', statusCode=500, version='null', errorText='it is expected to have a default domain, but it is empty'}, SaltBootResponse{status='', address='10.251.3.69', statusCode=500, version='null', errorText='it is expected to have a default domain, but it is empty'}]}
cloudbreak_1 | 2018-06-27 14:34:12,851 [containerBootstrapBuilderExecutor-18] call:64 INFO c.s.c.o.s.p.SaltBootstrap - [owner:15adc8a5-f35f-4e42-b1da-2567ccad0c59] [type:STACK] [id:13] [name:dsbeta01] [flow:9c9705c6-d536-4fb7-8bf1-f1075b5d8ff0] [tracking:] SaltBootstrap responses: SaltBootResponses{responses=[SaltBootResponse{status='', address='10.251.3.69', statusCode=500, version='null', errorText='it is expected to have a default domain, but it is empty'}, SaltBootResponse{status='', address='10.251.3.68', statusCode=500, version='null', errorText='it is expected to have a default domain, but it is empty'}, SaltBootResponse{status='', address='10.251.3.69', statusCode=500, version='null', errorText='it is expected to have a default domain, but it is empty'}]}
cloudbreak_1 | 2018-06-27 14:34:12,852 [containerBootstrapBuilderExecutor-18] call:67 INFO c.s.c.o.s.p.SaltBootstrap - [owner:15adc8a5-f35f-4e42-b1da-2567ccad0c59] [type:STACK] [id:13] [name:dsbeta01] [flow:9c9705c6-d536-4fb7-8bf1-f1075b5d8ff0] [tracking:] Failed to distributed salt run to: 10.251.3.69
cloudbreak_1 | 2018-06-27 14:34:12,853 [containerBootstrapBuilderExecutor-18] call:67 INFO c.s.c.o.s.p.SaltBootstrap - [owner:15adc8a5-f35f-4e42-b1da-2567ccad0c59] [type:STACK] [id:13] [name:dsbeta01] [flow:9c9705c6-d536-4fb7-8bf1-f1075b5d8ff0] [tracking:] Failed to distributed salt run to: 10.251.3.68
cloudbreak_1 | 2018-06-27 14:34:12,853 [containerBootstrapBuilderExecutor-18] call:67 INFO c.s.c.o.s.p.SaltBootstrap - [owner:15adc8a5-f35f-4e42-b1da-2567ccad0c59] [type:STACK] [id:13] [name:dsbeta01] [flow:9c9705c6-d536-4fb7-8bf1-f1075b5d8ff0] [tracking:] Failed to distributed salt run to: 10.251.3.69
cloudbreak_1 | 2018-06-27 14:34:12,854 [containerBootstrapBuilderExecutor-18] call:75 INFO c.s.c.o.s.p.SaltBootstrap - [owner:15adc8a5-f35f-4e42-b1da-2567ccad0c59] [type:STACK] [id:13] [name:dsbeta01] [flow:9c9705c6-d536-4fb7-8bf1-f1075b5d8ff0] [tracking:] Missing nodes to run saltbootstrap: [Node{privateIp='10.251.3.69', publicIp='10.251.3.69', hostname='', domain='null', hostGroup='master', dataVolumes=null}, Node{privateIp='10.251.3.68', publicIp='10.251.3.68', hostname='', domain='null', hostGroup='worker', dataVolumes=null}]
cloudbreak_1 | 2018-06-27 14:34:12,855 [containerBootstrapBuilderExecutor-18] doCall:111 WARN c.s.c.o.OrchestratorBootstrapRunner - [owner:15adc8a5-f35f-4e42-b1da-2567ccad0c59] [type:STACK] [id:13] [name:dsbeta01] [flow:9c9705c6-d536-4fb7-8bf1-f1075b5d8ff0] [tracking:] Orchestrator component Salt failed to start, retrying [60/90], error count [60/90]. Elapsed time: 52 ms, Total elapsed time: 593529 ms, Reason: com.sequenceiq.cloudbreak.orchestrator.exception.CloudbreakOrchestratorFailedException: There are missing nodes from saltbootstrap: [Node{privateIp='10.251.3.69', publicIp='10.251.3.69', hostname='', domain='null', hostGroup='master', dataVolumes=null}, Node{privateIp='10.251.3.68', publicIp='10.251.3.68', hostname='', domain='null', hostGroup='worker', dataVolumes=null}], additional info: SaltBootstrap{sc=com.sequenceiq.cloudbreak.orchestrator.salt.client.SaltConnector@3172c1ff, allGatewayConfigs=[GatewayConfig{connectionAddress='10.251.3.69', publicAddress='10.251.3.69', privateAddress='10.251.3.69', hostname='null', gatewayPort=9443, knoxGatewayEnabled=true, primary=true}], originalTargets=[Node{privateIp='10.251.3.69', publicIp='10.251.3.69', hostname='', domain='null', hostGroup='master', dataVolumes=null}, Node{privateIp='10.251.3.68', publicIp='10.251.3.68', hostname='', domain='null', hostGroup='worker', dataVolumes=null}], targets=[Node{privateIp='10.251.3.69', publicIp='10.251.3.69', hostname='', domain='null', hostGroup='master', dataVolumes=null}, Node{privateIp='10.251.3.68', publicIp='10.251.3.68', hostname='', domain='null', hostGroup='worker', dataVolumes=null}]}
cloudbreak_1 | 2018-06-27 14:34:12,867 [http-nio-8080-exec-2] getAllForAutoscale:170 INFO c.s.c.s.StackCommonService - [owner:undefined] [type:StackV1] [id:] [name:] [flow:] [tracking:] Get all stack, autoscale authorized only.
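The bootstrap log is long, but the actual failure reason is repeated in the `errorText` fields of the `SaltBootResponse` entries. A minimal sketch, assuming bash and a `grep` that supports `-o`, that pulls out the distinct error texts from a saved copy of the log; `summarize_salt_errors` and `cloudbreak.log` are hypothetical names, not part of cbd.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read Cloudbreak log text on stdin and print each
# distinct SaltBootResponse errorText, most frequent first, with a count.
summarize_salt_errors() {
  grep -o "errorText='[^']*'" \
    | sed "s/^errorText='//; s/'\$//" \
    | sort \
    | uniq -c \
    | sort -rn
}
```

Usage: `summarize_salt_errors < cloudbreak.log`. On the log above it reports a single message, "it is expected to have a default domain, but it is empty", which points at the nodes' DNS domain rather than at Salt itself.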
Created 06-27-2018 03:05 PM
I can confirm that this is only an issue when my VNET uses custom DNS (like the DNS provided by AADDS).
Cloudbreak 2.6 used the unbound service, and hosts could communicate with each other using "example.com".
It seems this is no longer the case, or some configuration is missing.
This is a massive blocker for us.
Created 06-27-2018 03:11 PM
Hi @Jakub Igla,
it looks like your virtual machines don't have a hostname/domain name. This can be a DHCP or (reverse) DNS issue. Could you post the output of the following commands:
hostname -d
hostname -f
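The two requested commands can be wrapped into a quick per-node check. This is a sketch only: it assumes Linux `hostname` with `-d`/`-f` and a standard `/etc/resolv.conf`, and it approximates, rather than reproduces, how salt-bootstrap decides that the default domain is empty. `check_default_domain` is a hypothetical name.

```shell
#!/usr/bin/env bash
# Hypothetical check: does this VM have a DNS domain? An empty result matches
# the salt-bootstrap error "it is expected to have a default domain, but it is empty".
# An optional first argument overrides the detected domain (useful for testing).
check_default_domain() {
  local domain
  if [ "$#" -ge 1 ]; then
    domain="$1"
  else
    domain="$(hostname -d 2>/dev/null)"
  fi
  if [ -z "$domain" ]; then
    echo "no default domain (hostname -f: $(hostname -f 2>/dev/null))"
    # Show the resolver search list, which is where a DHCP-provided suffix would appear.
    grep -E '^(search|domain)' /etc/resolv.conf 2>/dev/null \
      || echo "no search/domain line in /etc/resolv.conf"
    return 1
  fi
  echo "default domain: $domain"
}
```

Run `check_default_domain` with no argument on each node; a non-zero exit status means no domain was found, which is the situation the error describes.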
Also please attach the following:
Created 06-27-2018 03:26 PM
Created 06-27-2018 04:02 PM
Hi @Jakub Igla,
we had to stop using example.com as the fallback domain on Azure, because of an Azure issue where we sometimes have to wait an unknown amount of time to get the domain name.
In your case, it looks like on a private network with custom DNS the domain name would never arrive.
I think you should try setting CB_HOST_DISCOVERY_CUSTOM_DOMAIN in the Profile in your deployment directory and restarting Cloudbreak with 'cbd restart':
export CB_HOST_DISCOVERY_CUSTOM_DOMAIN=test.com
This will set up all of your clusters with this domain. Hopefully it will bypass the wait and you will get a functional cluster. Please let me know if this helps.
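The fix is a one-line edit to Profile followed by `cbd restart`. A minimal sketch for scripting that edit idempotently, assuming Profile is a plain file of `export` lines in the cbd deployment directory; `set_custom_domain` is a hypothetical helper, and `test.com` is the value used in the reply.

```shell
#!/usr/bin/env bash
# Hypothetical helper: set CB_HOST_DISCOVERY_CUSTOM_DOMAIN in a cbd Profile.
# Removes any earlier export of the variable, then appends the new value,
# so running it repeatedly leaves exactly one line.
set_custom_domain() {
  local profile="$1" domain="$2" tmp
  touch "$profile"
  tmp="$(mktemp)"
  # Keep every line except a previous CB_HOST_DISCOVERY_CUSTOM_DOMAIN export.
  grep -v '^export CB_HOST_DISCOVERY_CUSTOM_DOMAIN=' "$profile" > "$tmp" || true
  # Rewrite in place (preserves the file's permissions), then append.
  cat "$tmp" > "$profile"
  rm -f "$tmp"
  echo "export CB_HOST_DISCOVERY_CUSTOM_DOMAIN=$domain" >> "$profile"
}
```

Usage from the deployment directory: `set_custom_domain Profile test.com && cbd restart`. Replacing rather than appending blindly avoids leaving two conflicting exports if the value is changed later.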
Created 06-27-2018 10:05 PM
Hi @mmolnar
I can confirm that after adding this environment variable I no longer have this issue. I'm on cbd 2.7.1-rc.13.
Thank you!