Internal DNS is missing after stopping and starting a Cloudbreak Azure Cluster.


Internal DNS is missing after stopping and starting a Cloudbreak Azure cluster. I'm unable to ssh from one node to another using the internal DNS name/FQDN, although I can still reach the cluster through the external IPs.

We installed the Cloudbreak Azure cluster and then added components. We stopped the cluster overnight and started it again in the morning. All nodes are up, but the internal DNS entries were lost overnight and the servers can no longer talk to each other. The documentation does not mention any requirement to set up internal DNS entries for Cloudbreak. Does Cloudbreak automate the DNS entries for internal cluster references, or are additional configuration steps required?

Re: Internal DNS is missing after stopping and starting a Cloudbreak Azure Cluster.

@pdarvasi Can you help here?

Re: Internal DNS is missing after stopping and starting a Cloudbreak Azure Cluster.

Additional information from the Cloudbreak terminal:

9/18/2017 9:54:50 AM hadoop-pilot-rg - create in progress: Setting up HDP image
9/18/2017 9:54:53 AM hadoop-pilot-rg - create in progress: Creating infrastructure
9/18/2017 9:58:56 AM hadoop-pilot-rg - update in progress: Infrastructure creation took 242 seconds
9/18/2017 9:58:59 AM hadoop-pilot-rg - update in progress: Infrastructure metadata collection finished
9/18/2017 9:59:03 AM hadoop-pilot-rg - available: Infrastructure successfully provisioned
9/18/2017 9:59:03 AM hadoop-pilot-rg - update in progress: Bootstrapping infrastructure cluster
9/18/2017 9:59:21 AM hadoop-pilot-rg - update in progress: Setting up infrastructure metadata
9/18/2017 9:59:22 AM hadoop-pilot-rg - update in progress: Starting Ambari cluster services
9/18/2017 10:03:14 AM hadoop-pilot-rg - update in progress: Building Ambari cluster; Ambari ip:52.235.47.2
9/18/2017 10:12:07 AM hadoop-pilot-rg - available: Ambari cluster built; Ambari ip:52.235.47.2
9/18/2017 12:22:24 PM hadoop-pilot-rg - available: Synced instance states with the cloud provider.
9/18/2017 12:22:24 PM hadoop-pilot-rg - available: Host [name: had-hg79.es3fbvrcxbue1mq3h4m05wziag.vx.internal.cloudapp.net] state has been updated to: UNHEALTHY
9/18/2017 12:22:26 PM hadoop-pilot-rg - available: The cluster state synchronized with Ambari: There are stopped and running Ambari services as well. Restart or stop all of them and try syncing later.
9/18/2017 6:32:35 PM hadoop-pilot-rg - update in progress: Stopping Ambari cluster
9/18/2017 6:32:35 PM hadoop-pilot-rg - stop requested: Cluster infrastructure stop requested
9/18/2017 6:32:36 PM hadoop-pilot-rg - update in progress: Stopping Ambari services.
9/18/2017 6:33:46 PM hadoop-pilot-rg - update in progress: Ambari services have been stopped.
9/18/2017 6:33:46 PM hadoop-pilot-rg - stopped: Ambari cluster stopped
9/18/2017 6:33:47 PM hadoop-pilot-rg - stop in progress: Infrastructure is now stopping
9/18/2017 6:37:00 PM hadoop-pilot-rg - : Manual recovery is needed for the following failed nodes: [had-hg10.es3fbvrcxbue1mq3h4m05wziag.vx.internal.cloudapp.net]
9/18/2017 6:39:00 PM hadoop-pilot-rg - : Manual recovery is needed for the following failed nodes: [had-hg21.es3fbvrcxbue1mq3h4m05wziag.vx.internal.cloudapp.net, had-hg10.es3fbvrcxbue1mq3h4m05wziag.vx.internal.cloudapp.net]
9/18/2017 6:41:00 PM hadoop-pilot-rg - : Manual recovery is needed for the following failed nodes: [had-hg32.es3fbvrcxbue1mq3h4m05wziag.vx.internal.cloudapp.net, had-hg21.es3fbvrcxbue1mq3h4m05wziag.vx.internal.cloudapp.net, had-hg10.es3fbvrcxbue1mq3h4m05wziag.vx.internal.cloudapp.net]
9/18/2017 6:43:00 PM hadoop-pilot-rg - : Manual recovery is needed for the following failed nodes: [had-hg32.es3fbvrcxbue1mq3h4m05wziag.vx.internal.cloudapp.net, had-hg21.es3fbvrcxbue1mq3h4m05wziag.vx.internal.cloudapp.net, had-hg43.es3fbvrcxbue1mq3h4m05wziag.vx.internal.cloudapp.net, had-hg10.es3fbvrcxbue1mq3h4m05wziag.vx.internal.cloudapp.net]
9/18/2017 6:45:00 PM hadoop-pilot-rg - : Manual recovery is needed for the following failed nodes: [had-hg32.es3fbvrcxbue1mq3h4m05wziag.vx.internal.cloudapp.net, had-hg21.es3fbvrcxbue1mq3h4m05wziag.vx.internal.cloudapp.net, had-hg54.es3fbvrcxbue1mq3h4m05wziag.vx.internal.cloudapp.net, had-hg43.es3fbvrcxbue1mq3h4m05wziag.vx.internal.cloudapp.net, had-hg10.es3fbvrcxbue1mq3h4m05wziag.vx.internal.cloudapp.net]
9/18/2017 6:50:56 PM hadoop-pilot-rg - stopped: Infrastructure successfully stopped
9/19/2017 8:41:35 AM hadoop-pilot-rg - start requested: Ambari cluster start requested
9/19/2017 8:41:35 AM hadoop-pilot-rg - start in progress: Infrastructure is now starting
9/19/2017 8:51:18 AM hadoop-pilot-rg - available: Infrastructure successfully started
9/19/2017 8:51:18 AM hadoop-pilot-rg - update in progress: Starting Ambari cluster
9/19/2017 9:11:46 AM hadoop-pilot-rg - update in progress: Starting Ambari services.
9/19/2017 9:22:41 AM hadoop-pilot-rg - update in progress: Ambari services have been started.
9/19/2017 9:22:41 AM hadoop-pilot-rg - available: Ambari cluster started; Ambari ip:52.235.47.2
9/19/2017 9:24:00 AM hadoop-pilot-rg - : Manual recovery is needed for the following failed nodes: [had-hg32.es3fbvrcxbue1mq3h4m05wziag.vx.internal.cloudapp.net, had-hg21.es3fbvrcxbue1mq3h4m05wziag.vx.internal.cloudapp.net, had-hg57.es3fbvrcxbue1mq3h4m05wziag.vx.internal.cloudapp.net, had-hg54.es3fbvrcxbue1mq3h4m05wziag.vx.internal.cloudapp.net, had-hg43.es3fbvrcxbue1mq3h4m05wziag.vx.internal.cloudapp.net, had-hg55.es3fbvrcxbue1mq3h4m05wziag.vx.internal.cloudapp.net, had-hg10.es3fbvrcxbue1mq3h4m05wziag.vx.internal.cloudapp.net, had-hg56.es3fbvrcxbue1mq3h4m05wziag.vx.internal.cloudapp.net, had-hg68.es3fbvrcxbue1mq3h4m05wziag.vx.internal.cloudapp.net]

Re: Internal DNS is missing after stopping and starting a Cloudbreak Azure Cluster.

Hi @Matt Andruff,

I was not able to reproduce the issue. Could you please check whether the Unbound DNS service is running on the host? You can check with:

ps -ef | grep -i unbound
systemctl status unbound

Please also check the content of the /etc/resolv.conf file, run nslookup on the DNS name that you are trying to resolve, and send us the output of the nslookup command.
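
The checks above can be collected in one pass with a small script, which makes it easier to paste the full output into the thread. This is only a sketch under stated assumptions: it presumes a POSIX shell, systemd, awk, and nslookup are present on the node, and the function names list_nameservers and collect_dns_diagnostics are invented for illustration; they are not part of Cloudbreak.

```shell
#!/bin/sh
# Sketch: gather the DNS diagnostics requested in this thread on one node.
# Assumptions (not from the thread): POSIX sh, systemd, awk and nslookup
# are installed; both function names are hypothetical helpers.

# Print the nameserver addresses from a resolv.conf-style file
# (defaults to /etc/resolv.conf when no path is given).
list_nameservers() {
    awk '$1 == "nameserver" { print $2 }' "${1:-/etc/resolv.conf}"
}

# Run every check for one host name and print the results.
collect_dns_diagnostics() {
    fqdn="$1"

    echo "== unbound service =="
    # The [u] bracket keeps grep from matching its own process entry.
    ps -ef | grep -i '[u]nbound' || echo "no unbound process found"
    systemctl status unbound --no-pager 2>&1 || true

    echo "== /etc/resolv.conf =="
    cat /etc/resolv.conf
    echo "nameservers in use: $(list_nameservers | tr '\n' ' ')"

    echo "== nslookup ${fqdn} =="
    nslookup "$fqdn" 2>&1 || echo "lookup failed for ${fqdn}"
}

# Only run the checks when a host name is passed on the command line.
if [ "$#" -ge 1 ]; then
    collect_dns_diagnostics "$1"
fi
```

Saved under a name of your choosing (for example dns_check.sh) and run as "sh dns_check.sh" followed by the internal FQDN of another node, it prints the service status, the resolver configuration, and the lookup result in one block.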

Thanks,

Attila
