After upgrading from CDH 5.7.5 to 5.10.0, Hue's Webserver Status gives error message:
"Bad : The Cloudera Manager Agent is not able to communicate with this role's web server."
Hue is up and running; I just think that the CM agent is unable to see it.
I found this in /var/log/cloudera-scm-agent/cloudera-scm-agent.log on the Hue server:
[07/Feb/2017 15:42:09 +0000] 3747 Monitor-GenericMonitor throttling_logger ERROR (59 skipped) Error calling is alive at 'https://0.0.0.0:8888/desktop/debug/is_alive'
Traceback (most recent call last):
  File "/usr/lib64/cmf/agent/build/env/lib/python2.7/site-packages/cmf-5.10.0-py2.7.egg/cmf/monitor/generic/hue_adapters.py", line 39, in collect_and_parse
  File "/usr/lib64/cmf/agent/build/env/lib/python2.7/site-packages/cmf-5.10.0-py2.7.egg/cmf/monitor/generic/hue_adapters.py", line 76, in _call_is_alive
  File "/usr/lib64/cmf/agent/build/env/lib/python2.7/site-packages/cmf-5.10.0-py2.7.egg/cmf/url_util.py", line 94, in head_request_with_timeout
  File "/usr/lib64/cmf/agent/build/env/lib/python2.7/site-packages/cmf-5.10.0-py2.7.egg/cmf/url_util.py", line 67, in urlopen_with_timeout
    return opener.open(url, data, timeout)
  File "/usr/lib64/python2.7/urllib2.py", line 437, in open
    response = meth(req, response)
  File "/usr/lib64/python2.7/urllib2.py", line 550, in http_response
    'http', request, response, code, msg, hdrs)
  File "/usr/lib64/python2.7/urllib2.py", line 475, in error
  File "/usr/lib64/python2.7/urllib2.py", line 409, in _call_chain
    result = func(*args)
  File "/usr/lib64/cmf/agent/build/env/lib/python2.7/site-packages/cmf-5.10.0-py2.7.egg/cmf/https.py", line 205, in http_error_default
HTTPError: HTTP Error 400: BAD REQUEST
Any help would be most appreciated.
Actually, it is an interesting one. The problem here is that in CDH 5.10, by default, Hue restricts which hosts may connect based on the value of "socket.getfqdn()".
Based on the URL that is being used to try to contact Hue, it seems you have some hostname/dns issues here:
Error calling is alive at 'https://0.0.0.0:8888/desktop/debug/is_alive'
In any case, apart from taking a look at how the host and domain are configured on that machine, it appears that Hue is blocking the connection because the client's domain does not match allowed_hosts.
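Conceptually, the allowed_hosts check behaves along these lines (a minimal sketch in the style of Django's ALLOWED_HOSTS, since Hue is Django-based; this is not Hue's actual code):

```python
def host_allowed(request_host, allowed):
    """Sketch of an allowed_hosts-style check: '*' allows everything,
    a leading '.' matches any subdomain, otherwise an exact match is required."""
    for pattern in allowed:
        if pattern == "*":
            return True
        if pattern.startswith(".") and request_host.endswith(pattern):
            return True
        if request_host == pattern:
            return True
    return False
```

With allowed_hosts left at a same-domain default, a probe aimed at 0.0.0.0 fails this check, which is consistent with the HTTP 400 in the agent log above.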
In short, I would recommend trying to run the following:
# python -c 'import socket;print socket.getfqdn()'
Then, take the result and add it to your current list of allowed_hosts.
To do so:
In Cloudera Manager, navigate to Hue --> Configuration and then search for "Hue Service Advanced Configuration Snippet (Safety Valve) for hue_safety_valve.ini"
In the Hue Service Advanced Configuration Snippet (Safety Valve) for hue_safety_valve.ini text field, add the following:
(replace "example.com" with the domain of your host)
If preferred and you just want to get this working, you can open up connections from all hosts by adding this instead:
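The open-to-all variant, confirmed later in the thread, would be:

```
[desktop]
allowed_hosts=*
```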
In CDH 5.10, security was tightened in Hue by only allowing clients that are in the same domain as Hue to connect to Hue. It appears your hostname is actually 0.0.0.0 on the host, so the real fix would be to ensure that the host is configured to have a valid hostname and FQDN.
Let us know if you have questions about the steps or they don't work.
Ben-- Thanks for the info on the increased security in 5.10; it's good to know. I tried your suggestions, both "allowed_hosts=<hostname>" and "allowed_hosts=*", in the "Hue Service Advanced Configuration Snippet (Safety Valve) for hue_safety_valve.ini", restarted Hue, and neither helped.
Just FYI, my client is currently the Hue server, so I'm running Firefox on the same machine that the Hue server is on. I want to reiterate: I am not having trouble connecting to Hue with my client; the CM agent is having trouble communicating with Hue's web server.
To answer your followup question, our distribution is not SLES; we're running RHEL 7.2.
Thank you for the information and I'm sorry to hear that it didn't help.
I did understand that you were not having trouble with the browser; the agent acts as a client of Hue, so I was considering it likely that the new allowed_hosts feature was blocking the agent from connecting to Hue.
Just to be sure, you added the following to the [desktop] section in the safety valve:
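That is, the safety valve entry should read (reconstructed from this exchange):

```
[desktop]
allowed_hosts=*
```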
If you don't add the [desktop] section header before it, then the configuration will be ignored. As long as "allowed_hosts=*" is in the [desktop] section and you restarted, I'd assess that to be a clean test.
I'll let you know what we might try next.
Yes, most definitely added the header. This is what my "Hue Service Advanced Configuration Snippet (Safety Valve) for hue_safety_valve.ini" looks like:
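The safety valve contents at the time of this test were presumably (reconstructed from the earlier reply; the exact screenshot was not preserved):

```
[desktop]
allowed_hosts=*
```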
This gets more interesting.
The URL used in the hue health check is generated based on configurations returned by the Cloudera Manager heartbeat response. I tried overriding the "http_host" config in the Hue safety valve to set it to 0.0.0.0 but that did not alter my URL at all.
I checked and indeed the heartbeat response does not include the value for http_host. I think this is a Cloudera Manager bug/issue, but I'll need to take a closer look.
That said, I decided to hack the hue_adapters.py code to just use the host string "0.0.0.0" and reproduced the issue.
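A rough reproduction of that probe (written in Python 3 for convenience; the hue_adapters.py internals are not shown in this thread, so this is only an approximation of what the agent does):

```python
import ssl
import urllib.request


def is_alive_url(host, port=8888):
    # The endpoint seen in the agent log above.
    return "https://{0}:{1}/desktop/debug/is_alive".format(host, port)


def probe(host, port=8888):
    # HEAD request, skipping certificate verification for a quick test only.
    ctx = ssl._create_unverified_context()
    req = urllib.request.Request(is_alive_url(host, port), method="HEAD")
    return urllib.request.urlopen(req, timeout=10, context=ctx)
```

Against a Hue that enforces allowed_hosts, calling probe("0.0.0.0") raises an HTTPError with "HTTP Error 400: BAD REQUEST", matching the traceback above.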
The question I have, then, is how Cloudera Manager is reporting "0.0.0.0". That indicates to me that there is some problem with the hostname of the host as it appears in your Cloudera Manager Hosts tab. Cloudera Manager will attempt to use the hostname that is reported by the agent to set Hue's http_host value to the hostname on which Hue is running. If it is not able to do so, it is possible it leaves the Hue default, which is "0.0.0.0".
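As a hypothetical sketch of that fallback behavior (this is not Cloudera Manager's actual code, just an illustration of how "0.0.0.0" could end up as the probe host):

```python
import socket


def effective_http_host(reported_hostname=None):
    """Hypothetical sketch: use the hostname the agent reports if it
    resolves; otherwise fall back to Hue's bind-all default."""
    candidate = reported_hostname or socket.getfqdn()
    try:
        socket.gethostbyname(candidate)  # sanity-check that it resolves
        return candidate
    except socket.gaierror:
        return "0.0.0.0"  # Hue's default http_host
```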
I'll keep looking, but please check what your Hue host is named in Cloudera Manager's Hosts tab.
Here is the fqdn, as given by the python command:
$ python -c 'import socket;print socket.getfqdn()'
Which matches the hostname in the hosts tab in CM:
The CM Host Inspector is happy with all hosts, including this one.
Would it be worth moving Hue to another machine in the cluster, to see if that changes anything? Btw this is a test cluster that I am using to try out 5.10.0 before rolling it into production, so I have the luxury of trying just about anything without affecting any users.