I have a CDH 5.3.8 cluster with HBase, HDFS, MapReduce, Solr, and Zookeeper services running on multiple Ubuntu 12.04 servers. Each server has the same /etc/hosts file which includes all of the hosts in the cluster, configured in the following format:
127.0.0.1 localhost.localdomain localhost
192.168.1.1 cluster-01.example.com cluster-01
192.168.1.2 cluster-02.example.com cluster-02
192.168.1.3 cluster-03.example.com cluster-03
The Ubuntu servers are also configured with two name servers to provide DNS for anything outside the cluster (FYI, the hosts entry in /etc/nsswitch.conf is set to check files, then dns).
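For reference, the relevant configuration on each server looks like this (the name-server addresses below are placeholders, not the actual values):

```
# /etc/nsswitch.conf (relevant line): check /etc/hosts first, then DNS
hosts: files dns

# /etc/resolv.conf (addresses are placeholders)
nameserver 192.168.1.250
nameserver 192.168.1.251
```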
My understanding was that with the above setup, all network name resolution for the CDH cluster would be done through the /etc/hosts files. However, I have learned that the cluster still has some kind of dependency on the DNS name servers, though I am not sure what it is.
I ran into a situation where one of the name servers became unavailable, and as a result I noticed a performance impact on multiple CDH services. For example, running 'status' in the HBase shell took 14s to complete (as opposed to 3s), and the Solr Web Collection metrics load time increased by ~20s. When I removed the unavailable name server from /etc/resolv.conf, cluster performance returned to normal.
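For context, the glibc stub resolver by default retries a dead name server on every query (5s timeout, 2 attempts), which seems consistent with the delays above. If that is the cause, the impact of a down server can be reduced with resolver options; a sketch (the option values and addresses are examples, not a confirmed fix for this cluster):

```
# /etc/resolv.conf - resolver tuning (example values)
# timeout:1  - wait at most 1s per query before trying the next server
# attempts:1 - try each server only once
# rotate     - round-robin across servers instead of always querying the first
options timeout:1 attempts:1 rotate
nameserver 192.168.1.250   # placeholder addresses
nameserver 192.168.1.251
```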
From the OS perspective, when the name server was down, DNS resolution worked without issue using the available name server (as verified with nslookup).
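One caveat I'm aware of: nslookup queries the name servers directly and bypasses /etc/nsswitch.conf, so it does not exercise the same lookup path the services use. To test the full NSS resolution order (files, then dns), getent can be used instead; a minimal sketch (cluster-01 stands in for any of the cluster host names):

```shell
# getent exercises the full NSS lookup path configured in /etc/nsswitch.conf
# (files, then dns), whereas nslookup/dig talk to the DNS servers directly.
getent hosts localhost
# For a cluster name, "time getent hosts cluster-01" would show any resolver delay.
```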
Did I miss something in the Cloudera configuration?
PS: CDH was installed using Cloudera Manager, and most of the configuration settings are the default/recommended settings determined by Cloudera.