New Contributor
Posts: 2
Registered: ‎04-04-2016

Cloudera Director has issue with AWS non default VPC DHCP option

We tried to use Cloudera Director 2.2 but failed to deploy Cloudera Manager. It complained that a DNS entry was invalid.

When we changed the VPC's DHCP options set back to the default, the problem went away.

In our customized DHCP options set we need to specify two IP addresses as DNS servers, which might have caused the error.
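For reference, a custom DHCP options set like the one described above (one domain name, two DNS server IPs) can be created and attached to a VPC with boto3's `create_dhcp_options` and `associate_dhcp_options` calls. This is a minimal sketch assuming boto3 and working AWS credentials, not the poster's actual setup: `create_custom_dhcp_options` is a hypothetical helper, and the `ec2_client` parameter exists only so the function can be exercised with a stub.

```python
from typing import Any, List, Optional


def create_custom_dhcp_options(
    vpc_id: str,
    domain_name: str,
    dns_servers: List[str],
    ec2_client: Optional[Any] = None,
) -> str:
    """Create a DHCP options set with custom DNS servers and attach it to a VPC.

    Returns the new DhcpOptionsId.
    """
    if ec2_client is None:
        import boto3  # imported lazily so a stub client can be passed in tests

        ec2_client = boto3.client("ec2")

    response = ec2_client.create_dhcp_options(
        DhcpConfigurations=[
            {"Key": "domain-name", "Values": [domain_name]},
            # Two (or more) resolver IPs, as in the setup described above.
            {"Key": "domain-name-servers", "Values": dns_servers},
        ]
    )
    options_id = response["DhcpOptions"]["DhcpOptionsId"]
    # A VPC has exactly one DHCP options set; this replaces the current one.
    ec2_client.associate_dhcp_options(DhcpOptionsId=options_id, VpcId=vpc_id)
    return options_id
```

Usage with placeholder values: `create_custom_dhcp_options("vpc-0123456789abcdef0", "our.company.com", ["10.0.0.10", "10.0.1.10"])`.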

New Contributor
Posts: 2
Registered: ‎04-04-2016

Re: Cloudera Director has issue with AWS non default VPC DHCP option

Error message:

DNS is not configured correctly on at least one instance. Both forward and reverse DNS resolution must work on all instances in order for Cloudera Manager and cluster services to work properly. Please check the DNS configuration of your environment and instance images. Misconfigured instance: PluggableComputeInstance{ipAddress=Optional.of(172.31.24.16), delegate=null, hostEndpoints=[HostEndpoint{hostAddressString='172.31.24.16', hostAddress=Optional.of(/172.31.24.16)}, HostEndpoint{hostAddressString='ip-172-31-24-16.ec2.internal', hostAddress=Optional.absent()}, HostEndpoint{hostAddressString='54.157.215.35', hostAddress=Optional.of(/54.157.215.35)}, HostEndpoint{hostAddressString='ec2-54-157-215-35.compute-1.amazonaws.com', hostAddress=Optional.absent()}]} Instance{virtualInstance=VirtualInstance{id='6ba860cf-7a15-48a2-b1d5-b3acb70c899f', template=InstanceTemplate{name='CDH59_Template', type='m4.large', image='ami-b18c62da', bootstrapScriptIsPresent=false, config={ebsVolumeCount=0, subnetId=subnet-19cc1e6e, enableEbsEncryption=false, rootVolumeType=gp2, instanceNamePrefix=director, ebsVolumeSizeGiB=500, rootVolumeSizeGB=50, useSpotInstances=false, ebsVolumeType=st1, securityGroupsIds=sg-00230f65}, tags={name=cdh59}, normalizeInstance=true, sshUsername=Optional.absent()}}, capabilities=Optional.of(Capabilities{operatingSystemType=REDHAT_COMPATIBLE, operatingSystemVersion=REDHAT_COMPATIBLE_6, virtualizationType=HARDWARE_ASSISTED, packageManager=Optional.of(YUM), javaVendor=Optional.absent(), javaVersion=Optional.absent(), pythonVersion=Optional.of(2.6.6), passwordlessSudoEnabled=true, selinuxEnabled=true, iptablesEnabled=false, dnsConfigured=false, fqdn=Optional.absent(), clouderaManagerAgentInstalled=false, customScriptPaths={}})}

Cloudera Employee
Posts: 49
Registered: ‎02-18-2014

Re: Cloudera Director has issue with AWS non default VPC DHCP option

Hi kevinz,

The error message you found says it all. We do require that both forward and reverse DNS resolution work on all instances. During the bootstrap process for a cluster, Director checks for this and many other things as part of how it inspects new instances for what they have installed and what they are capable of. It's possible that by altering how DNS works within AWS for your VPC, resolution stopped working correctly.

Because there are so many ways that customers can do custom DNS, I don't have any specific recommendation for how to fix the problem, although reverting to AWS defaults certainly works. You can look for warning messages in the Director log from com.cloudera.launchpad.inspector.OperatingSystemCapabilitiesInspector to see what went wrong during the checks. Director attempts to use hostname, nslookup, or (as a fallback) Python to perform reverse DNS lookups for the local machine, and if it cannot get it to pass, it flags it as a problem.
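Director's actual checks live in OperatingSystemCapabilitiesInspector and are not reproduced here. As a rough way to run the same kind of test yourself on an instance, the Python sketch below does a forward lookup of a name and then a reverse (PTR) lookup of every address it returns, reporting any address that does not map back to the same name. It uses only the standard `socket` module; `check_forward_reverse` is a hypothetical helper, and the resolver functions are injectable so the logic can be checked without a network.

```python
import socket
from typing import Callable, List, Tuple


def _forward(name: str) -> List[str]:
    # gethostbyname_ex returns (canonical_name, aliases, ipv4_addresses)
    return socket.gethostbyname_ex(name)[2]


def _reverse(ip: str) -> str:
    # gethostbyaddr returns (primary_name, aliases, addresses) from a PTR lookup
    return socket.gethostbyaddr(ip)[0]


def check_forward_reverse(
    fqdn: str,
    forward: Callable[[str], List[str]] = _forward,
    reverse: Callable[[str], str] = _reverse,
) -> Tuple[bool, List[str]]:
    """Return (ok, problems) for a forward-then-reverse round trip on fqdn."""
    problems: List[str] = []
    try:
        ips = forward(fqdn)
    except OSError as exc:  # socket.gaierror and socket.herror subclass OSError
        return False, [f"forward lookup of {fqdn} failed: {exc}"]
    if not ips:
        return False, [f"forward lookup of {fqdn} returned no addresses"]
    for ip in ips:
        try:
            name = reverse(ip)
        except OSError as exc:
            problems.append(f"reverse lookup of {ip} failed: {exc}")
            continue
        # DNS names are case-insensitive; ignore a trailing root dot.
        if name.rstrip(".").lower() != fqdn.rstrip(".").lower():
            problems.append(f"{ip} reverses to {name}, expected {fqdn}")
    return not problems, problems
```

On a cluster node, `check_forward_reverse(socket.getfqdn())` should return `(True, [])` when forward and reverse DNS agree.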

Explorer
Posts: 20
Registered: ‎10-02-2017

Re: Cloudera Director has issue with AWS non default VPC DHCP option

Hi.

Curious to hear whether you've resolved this. We are running into a similar issue: we can successfully bootstrap a CDH cluster in AWS when using the default AWS DHCP options set, but when we change the options set to point to our own DNS servers, the bootstrap fails while installing the Cloudera Manager agents. I can ping all cluster instances from Director using both the default AWS private hostname and our custom hostname, and all cluster instances can ping one another using either hostname. After reading this thread, I'm thinking a mismatch between forward and reverse lookup could be the issue.

BTW, our DNS servers (hosted in AWS) forward any names they cannot resolve to AWS DNS.

Cloudera Employee
Posts: 18
Registered: ‎07-25-2017

Re: Cloudera Director has issue with AWS non default VPC DHCP option

As Bill mentioned, Director does some validation of hostname resolution and reverse resolution. If you see warnings in the Director log from com.cloudera.launchpad.inspector.OperatingSystemCapabilitiesInspector, they can give you a hint about what went wrong.

If it passes that stage, things can still go wrong where the hostname is actually used. Since you mention CM agent installation, you may want to look at the CM server log under /var/log/cloudera-scm-server, and possibly the agent log under /var/log/cloudera-scm-agent.
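When there are several log files to go through, a small script can pull out the suspicious lines from the directories mentioned above. This Python sketch is an assumption-laden helper, not a Cloudera tool: it assumes the logs are `*.log` files directly in those directories and that problem lines contain a level keyword such as `ERROR` or `WARN`. `find_log_problems` is hypothetical.

```python
import glob
import os
from collections import deque
from typing import Iterable, List, Tuple

# Directories named in this thread. The "*.log" file pattern and the idea that
# interesting lines contain a level keyword such as ERROR or WARN are
# assumptions about the log layout, not documented guarantees.
CM_LOG_DIRS = ("/var/log/cloudera-scm-server", "/var/log/cloudera-scm-agent")


def find_log_problems(
    log_dirs: Iterable[str] = CM_LOG_DIRS,
    keywords: Tuple[str, ...] = ("ERROR", "WARN"),
    max_hits: int = 50,
) -> List[Tuple[str, int, str]]:
    """Return the last max_hits (path, line_number, line) matches, in file order."""
    hits: deque = deque(maxlen=max_hits)  # older matches fall off the front
    for log_dir in log_dirs:
        # glob returns [] for a missing directory, so absent logs are skipped
        for path in sorted(glob.glob(os.path.join(log_dir, "*.log"))):
            with open(path, encoding="utf-8", errors="replace") as handle:
                for lineno, line in enumerate(handle, start=1):
                    if any(keyword in line for keyword in keywords):
                        hits.append((path, lineno, line.rstrip("\n")))
    return list(hits)
```

For example: `for path, lineno, line in find_log_problems(): print(f"{path}:{lineno}: {line}")`.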

Explorer
Posts: 20
Registered: ‎10-02-2017

Re: Cloudera Director has issue with AWS non default VPC DHCP option

Thanks for the quick reply. We don't have a problem with DNS resolution. From the Director instance and from all cluster node instances, we can ping and resolve all cluster hostnames using both the ec2.internal FQDN and our company-specific FQDN, which is set within the bootstrap script.

Let me back up a bit to provide more detail...

For each VPC in which we launch customer clusters, we need to set the DHCP options set to use our company domain name and DNS servers (hosted in an admin VPN on AWS). Our DNS servers include a forwarder setting that points to Amazon DNS. Within the bootstrap script defined in the Cloudera Director template, we set the hostname of every cluster node during first boot like so:

hostnamectl set-hostname "$(hostname -s).our.company.com"

After Cloudera Director bootstraps the instances, we can ping them using both our company-specific FQDN and the default ec2.internal FQDN. However, the cluster bootstrap always fails while waiting for the agent to start up on the Cloudera Manager node. I'm not seeing the errors reported in this thread, and searching the Director logs for com.cloudera.launchpad.inspector.OperatingSystemCapabilitiesInspector doesn't reveal anything obvious. I can reach all nodes by every method I can think of that Cloudera might be using.

When we change the VPC's DHCP options set back to the Amazon defaults, the cluster deploys successfully. Any other suggestions or places to check?

Thank you

Explorer
Posts: 20
Registered: ‎10-02-2017

Re: Cloudera Director has issue with AWS non default VPC DHCP option

I just wanted to note that a reverse lookup of the cluster IPs (from both the Director VPC and the cluster VPC) returns our company FQDN. The Director instance resides in a VPC that uses the same company DNS servers as our cluster VPC.

Is it possible that the bootstrap process initially grabs the ec2.internal hostnames of the cluster nodes from AWS, then later performs a reverse lookup of the IPs to ensure they match those ec2.internal hostnames? If so, that could be the issue: as I mentioned, a reverse lookup of the IPs returns names in our company-specific DNS domain.
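One way to test this hypothesis directly is to compare, for each cluster IP, the hostname AWS reports (the ec2.internal name) with what a PTR lookup returns from inside the VPC. The Python sketch below does that; it is not how Director or CM actually record hostnames, and `ptr_mismatches` is a hypothetical helper whose reverse resolver can be swapped for a stub.

```python
import socket
from typing import Callable, Dict, List, Tuple


def ptr_mismatches(
    expected: Dict[str, str],
    reverse: Callable[[str], str] = lambda ip: socket.gethostbyaddr(ip)[0],
) -> List[Tuple[str, str, str]]:
    """Compare each IP's PTR name with the hostname recorded for it.

    expected maps IP address -> hostname recorded elsewhere (for example the
    ec2.internal name reported by AWS). Returns (ip, expected, actual) for
    every disagreement, including failed lookups.
    """
    mismatches = []
    for ip, want in sorted(expected.items()):
        try:
            got = reverse(ip)
        except OSError as exc:  # socket.herror when no PTR record exists
            got = f"<lookup failed: {exc}>"
        # Compare case-insensitively and ignore a trailing root dot.
        if got.rstrip(".").lower() != want.rstrip(".").lower():
            mismatches.append((ip, want, got))
    return mismatches
```

Using the address from the error message earlier in this thread, `ptr_mismatches({"172.31.24.16": "ip-172-31-24-16.ec2.internal"})` would report a mismatch if the PTR record points into the company domain instead.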

Cloudera Employee
Posts: 18
Registered: ‎07-25-2017

Re: Cloudera Director has issue with AWS non default VPC DHCP option

CM agent installation is where the hostname is actually used, whereas Director only does some primitive validation. Passing that validation does not guarantee that CM will work properly.

CM determines the hostname from InetAddress.getLocalHost(), and there may be other uses downstream. If you are having problems with agent installation, definitely check the CM server log under /var/log/cloudera-scm-server, where the server compiles the steps that need to be run and monitors their progress, and the CM agent log under /var/log/cloudera-scm-agent, where those steps are actually executed. Don't look for warnings in the Director log at this point, because Director is just waiting for the CM server activity to complete.
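To see roughly what a Java process such as the CM server gets from InetAddress.getLocalHost() on a node, the following Python sketch mirrors that lookup followed by a reverse lookup: take the OS hostname, resolve it forward, then reverse-resolve the address. It is only an approximation; Java's resolver caching and fallback behavior are not reproduced, and `local_host_identity` is a hypothetical helper with injectable resolvers for testing.

```python
import socket
from typing import Callable, Dict


def local_host_identity(
    gethostname: Callable[[], str] = socket.gethostname,
    resolve: Callable[[str], str] = socket.gethostbyname,
    reverse: Callable[[str], str] = lambda ip: socket.gethostbyaddr(ip)[0],
) -> Dict[str, str]:
    """Report the local hostname, its forward address, and the PTR name.

    hostname:  what the OS reports (what `hostnamectl set-hostname` changes)
    address:   that name resolved forward ("" if the lookup failed)
    canonical: the PTR name for that address ("" if the lookup failed)
    """
    name = gethostname()
    info = {"hostname": name, "address": "", "canonical": ""}
    try:
        info["address"] = resolve(name)
        info["canonical"] = reverse(info["address"])
    except OSError as exc:  # gaierror/herror from the resolver
        info["error"] = str(exc)
    return info
```

If `hostname` and `canonical` differ on a node, that node is a good place to start reading the CM logs.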

Also, please feel free to post your CM log traces; they would help troubleshoot what went wrong. You can also post the question to the Cloudera Manager board, where somebody might know better how CM uses hostname resolution.

Explorer
Posts: 20
Registered: ‎10-02-2017

Re: Cloudera Director has issue with AWS non default VPC DHCP option

Thanks for the reply. After some research, I've discovered our issue.

As I mentioned previously, we host our own DNS servers in AWS, and we configure the DHCP options set of the Cloudera cluster VPC to point to them. According to the AWS DNS documentation, you need to disable the VPC's DNS hostnames and DNS resolution settings to force use of the custom DHCP options set. Once I made this change, cluster bootstrapping completed.

http://docs.aws.amazon.com/AmazonVPC/latest/UserGuide/vpc-dns.html
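The same VPC change can be scripted with boto3's `modify_vpc_attribute`, which sets one attribute per call. This is a minimal sketch of the fix as reported in this post, assuming boto3 and AWS credentials; `disable_vpc_dns_attributes` is a hypothetical helper, the VPC ID you pass is your own, and it is worth trying on a non-production VPC first.

```python
from typing import Any, Optional


def disable_vpc_dns_attributes(vpc_id: str, ec2_client: Optional[Any] = None) -> None:
    """Turn off the VPC's "DNS hostnames" and "DNS resolution" settings.

    ModifyVpcAttribute accepts a single attribute per request, hence two calls.
    """
    if ec2_client is None:
        import boto3  # imported lazily so a stub client can be used in tests

        ec2_client = boto3.client("ec2")
    ec2_client.modify_vpc_attribute(VpcId=vpc_id, EnableDnsHostnames={"Value": False})
    ec2_client.modify_vpc_attribute(VpcId=vpc_id, EnableDnsSupport={"Value": False})
```

Hostnames are switched off before resolution here as a cautious ordering choice.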