I have a small test cluster built in AWS running Cloudera Manager 5.8.1-1 and CDH 5.8.0 with parcels. Cloudera Manager runs on an instance in a private VPC. It manages a two-instance cluster in a public VPC and uses the internal FQDN for most things:
However, the Hue Web UI link in the CM Hue page gives the external IP address:
I am not even sure where or how it is getting this value. It is the internet-routable FQDN for the instance, but there is no reference to it in /etc/hosts or anywhere else I can find.
Even stranger, the Hue file browser does not work for the same reason. When you click the file browser link in the Hue web UI (after manually entering the correct internal IP address for the URL), it hangs and eventually this is logged to /var/log/hue/error_log:
WebHdfsException: HTTPConnectionPool(host='ec2-54-##-##-###.us-west-1.compute.amazonaws.com', port=50070): Max retries exceeded with url: /webhdfs/v1/user/hive/warehouse?op=GETFILESTATUS&user.name=hue&doas=hdfs (Caused by NewConnectionError('<requests.packages.urllib3.connection.HTTPConnection object at 0x7f5a18cf3310>: Failed to establish a new connection: [Errno 110] Connection timed out',)
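For anyone debugging the same symptom, here is a minimal Python sketch that pulls the host and port out of a message like the one above and tries a plain TCP connection to it. The function names `parse_failed_endpoint` and `can_connect` are made up for this example and are not part of Hue or Cloudera Manager. It only shows which endpoint Hue tried and whether that endpoint is reachable from the Hue host.

```python
import re
import socket

# Matches the host/port pair that appears in HTTPConnectionPool error text.
_POOL_RE = re.compile(r"HTTPConnectionPool\(host='([^']+)', port=(\d+)\)")


def parse_failed_endpoint(message):
    """Return (host, port) from an error message, or None if not found."""
    match = _POOL_RE.search(message)
    if match is None:
        return None
    return match.group(1), int(match.group(2))


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port opens within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers DNS failures, refused connections, and timeouts.
        return False
```

Running `can_connect` from the Hue server against both the external and the internal name should show quickly whether the timeout is a security-group issue or a name-resolution issue.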
I cannot figure out how to make Hue use the internal (10.x.x.x) address for the file browser. We do not want to open up port 50070 to the internet so we need to use the internal interface. In CM Hue configuration there is a radio button which allows you to toggle on/off the webhdfs URL:
HDFS Web Interface Role
In the value field for webhdfs_url it does not show the full name, but the portion it does show looks like it should be using the correct internal IP address (not the external one):
Reset to empty default value
However, when I look at /var/run/cloudera-scm-agent/process/###-hue-HUE_SERVER I see that it is set to the external address:
This is strange because all the other URLs in hue.ini use the correct internal address:
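To check this across the whole generated file rather than by eye, a rough Python sketch like the following lists every key in hue.ini whose value is an http(s) URL and flags the ones whose host is not an internal name. It is a simple line scan, not a real parser for Hue's nested-section format, so a key that appears in more than one section keeps only its last value. The `.compute.internal` suffix and the function names are assumptions for this example.

```python
from urllib.parse import urlparse


def url_hosts(ini_text):
    """Map each key whose value is an http(s) URL to that URL's hostname."""
    hosts = {}
    for raw in ini_text.splitlines():
        line = raw.strip()
        # Skip blanks, comments, section headers, and non key=value lines.
        if not line or line.startswith(("#", ";", "[")) or "=" not in line:
            continue
        key, value = (part.strip() for part in line.split("=", 1))
        if value.startswith(("http://", "https://")):
            hosts[key] = urlparse(value).hostname
    return hosts


def external_keys(hosts, internal_suffix=".compute.internal"):
    """Return the sorted keys whose host does not end with internal_suffix."""
    return sorted(
        key for key, host in hosts.items()
        if host and not host.endswith(internal_suffix)
    )
```

Calling `external_keys(url_hosts(open(path).read()))` on the hue.ini in the agent's process directory should return just `webhdfs_url` if the situation matches what I describe here.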
In CM Hue configuration I tried disabling the webhdfs URL radio button ("reset to default empty value") but it gives a validation error when I try to save the change. I have also set a safety valve for the webhdfs URL in "Hue Service Advanced Configuration Snippet (Safety Valve) for hue_safety_valve.ini" but it does not override the webhdfs_url which CM is setting in hue.ini:
I am wondering where CM is getting this external FQDN ec2-54-##-##-##.us-west-1.compute.amazonaws.com -- I find no reference to it on the server itself. I hope there is some way to manually set this value to the internal one. Any help would be appreciated.
As a workaround, I have added the ec2-##.... external hostname to the internal IP address entry in the hosts file, so Hue will resolve the external hostname to the internal IP address of the namenode.
10.###.##.## ip-10-###-##-##.us-west-1.compute.internal ec2-54-##-##-##.us-west-1.compute.amazonaws.com
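A small Python sketch for sanity-checking an entry like this: it splits a hosts-file line into its IP and names, then confirms that a given name is listed and that the IP is a private address. The helper names are made up for this example, and you would need to substitute your real values for the masked ones.

```python
import ipaddress


def parse_hosts_line(line):
    """Split a hosts-file line into (ip, [names]); None for blanks/comments."""
    fields = line.split("#", 1)[0].split()
    if len(fields) < 2:
        return None
    return fields[0], fields[1:]


def maps_to_private(line, name):
    """Return True if name is listed on the line and its IP is private."""
    parsed = parse_hosts_line(line)
    if parsed is None:
        return False
    ip, names = parsed
    return name in names and ipaddress.ip_address(ip).is_private
```

On the Hue host itself, `socket.gethostbyname()` on the external name is a quick follow-up check that the resolver actually uses the hosts file entry.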
I would prefer to use Cloudera Manager to control the Hue webhdfs_url rather than doing it in the hosts file, but this will work for now. Is there any documentation for Cloudera Manager that defines which parameters can be overridden with a safety valve entry and which cannot?
We have looked at making the URLs configurable, but, for now, they are not. The hostnames in the URLs align with what the host is reporting to Cloudera Manager as its hostname. The change you made is what I would recommend as the solution, so I think you should be good to run with that.
We don't have a definitive list of what configurations can be overridden. In the case of CDH, pretty much anything can be set in a safety valve.
I added your request for configurable URLs to the existing internal Jira.
It used to be that a cluster running in a public VPC would use the internal hostnames for the WebUI links. Now it seems that has been fixed and CM is being "helpful" and linking to the external hostnames.
However, I have long since set up a VPN connection into AWS so that I can connect directly to the internal names/IPs. I would now like CM to keep linking to the internal hostnames (as reported in the Hosts tab).
+1 for making WebUI URL links be configurable on AWS.