Hue server not able to send any kind of HTTP request (GET, etc.) to WebHDFS

Dear HUE community,

 

We at Nokia Technologies have been using the Cloudera stack for our analytics-based products for a few months. We have had a very good experience with it, especially the Hue-enabled Oozie workflow designer, which has made our lives much easier than before.

We have now decided to use Hue for the same purpose on our old clusters, which do not run Cloudera.

We followed the instructions and installed Hue (and its other requirements) on our old clusters. But when we open the UI in a browser, we see that the Hue server simply cannot communicate with WebHDFS.

 

We know WebHDFS itself works, because the same GET request copied from the Hue logs succeeds when run from a browser window, and an entry is logged in the WebHDFS log.

We know Hue cannot reach WebHDFS, because no request entry is logged in the WebHDFS log when the file browser is opened from the Hue UI.
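The browser check above can also be repeated from the Hue host itself, which removes the browser's own proxy settings from the picture. The following is a minimal sketch, not Hue's code: `webhdfs_url`, `direct_opener`, and `list_status` are hypothetical helper names, and the base URL and user in the comment at the end are placeholder values. It relies only on the standard WebHDFS `LISTSTATUS` operation and on the documented behaviour that `urllib.request.ProxyHandler({})` disables proxy auto-detection.

```python
import json
import urllib.parse
import urllib.request


def webhdfs_url(base, path, op, user):
    """Build a WebHDFS REST URL for the given path, operation and user."""
    query = urllib.parse.urlencode({"op": op, "user.name": user})
    return f"{base.rstrip('/')}/{path.lstrip('/')}?{query}"


def direct_opener():
    """Return an opener that ignores proxy settings taken from the environment."""
    return urllib.request.build_opener(urllib.request.ProxyHandler({}))


def list_status(base, path, user, timeout=10):
    """Call LISTSTATUS without any proxy and return the decoded JSON body."""
    url = webhdfs_url(base, path, "LISTSTATUS", user)
    with direct_opener().open(url, timeout=timeout) as resp:
        return json.load(resp)


# Example call (placeholder values; use the webhdfs_url and user from hue.ini):
# list_status("http://localhost:50070/webhdfs/v1", "/", "spark1")
```

If this direct call succeeds on the Hue host while Hue's own request never reaches the WebHDFS log, something between the two, such as a proxy, is a likely suspect.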

 

I have checked the superuser configuration and privileges, and that is all fine. For simplicity, we have just one user for all services, which is also the Hadoop superuser.

 

I paste the relevant logs below:

 

...

[28/Aug/2016 22:21:33 -0700] middleware DEBUG {"1472473293": {"status": 200, "impersonator": null, "service": "jobbrowser", "url": "/jobbrowser/", "user": "spark1", "ip_address": "135.x.x.x", "authorization_failure": false}}
[28/Aug/2016 22:22:04 -0700] access INFO 135.x.x.x spark1 - "GET /jobbrowser/ HTTP/1.1"
[28/Aug/2016 22:22:04 -0700] connectionpool DEBUG "GET http://XXXX:8088/ws/v1/cluster/apps?limit=10000&user=spark1&finalStatus=UNDEFINED HTTP/1.1" 302 90
[28/Aug/2016 22:22:04 -0700] connectionpool INFO Resetting dropped connection: proxy.XXXXXX.com
[28/Aug/2016 22:22:10 -0700] connectionpool DEBUG "GET http://www.XXXX.com/ws/v1/cluster/apps?limit=10000&user=spark1&finalStatus=UNDEFINED HTTP/1.1" 404 1245

...

(error 404): Traceback (most recent call last):
  File "/opt/hue/build/env/lib/python2.6/site-packages/Django-1.6.10-py2.6.egg/django/core/handlers/base.py", line 112, in get_response
    response = wrapped_callback(request, *callback_args, **callback_kwargs)
  File "/opt/hue/build/env/lib/python2.6/site-packages/Django-1.6.10-py2.6.egg/django/db/transaction.py", line 371, in inner
    return func(*args, **kwargs)
  File "/opt/hue/apps/jobbrowser/src/jobbrowser/views.py", line 121, in jobs
    raise ex
RestException: <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1"/>
<title>404 - File or directory not found.</title>
<style type="text/css">
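One way to read the connectionpool lines in the log above is to look for the same request path turning up on a different host right after a redirect. The sketch below does that under one assumption: the line layout is exactly the one shown in the pasted log. `parse_requests` and `rerouted` are hypothetical helper names, not part of Hue.

```python
import re
from urllib.parse import urlsplit

# Matches lines such as:
# [..] connectionpool DEBUG "GET http://host:8088/path?query HTTP/1.1" 302 90
REQUEST_LINE = re.compile(
    r'connectionpool\s+DEBUG\s+"(?P<method>[A-Z]+) (?P<url>\S+) HTTP/[\d.]+" '
    r'(?P<status>\d{3})\b'
)


def parse_requests(lines):
    """Yield (method, netloc, path, status) for each connectionpool request line."""
    for line in lines:
        match = REQUEST_LINE.search(line)
        if match:
            parts = urlsplit(match.group("url"))
            yield (match.group("method"), parts.netloc, parts.path,
                   int(match.group("status")))


def rerouted(lines):
    """Return (old_host, new_host, path) where consecutive requests for the
    same path went to different hosts."""
    requests = list(parse_requests(lines))
    return [
        (prev[1], cur[1], cur[2])
        for prev, cur in zip(requests, requests[1:])
        if prev[2] == cur[2] and prev[1] != cur[1]
    ]
```

Run against the two connectionpool DEBUG lines above, this reports that `/ws/v1/cluster/apps` moved from `XXXX:8088` to `www.XXXX.com`, which is the hop that ends in the 404.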

 

1 ACCEPTED SOLUTION

Well, found the problem...

It seems the Hue server sends the request to WebHDFS via the company proxy server, even though the Hue server is set up on the same host as the WebHDFS server.

The proxy server obviously didn't know the hostname of the WebHDFS server, so it couldn't forward the request to it.

I changed the hostname to its public IP address in hue.ini and everything works.

 

If anyone knows how to allow HUE server to bypass that proxy, please let me know.
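On the bypass question, one common mechanism is the `no_proxy` environment variable, which many HTTP clients consult alongside `http_proxy`. Whether the Hue build in question honours it is an assumption here, not something confirmed in this thread. The sketch below only shows how Python's standard library decides whether a host skips the proxy. `goes_through_proxy` is a hypothetical helper, and the proxy URL and hostnames in the comments are made-up examples.

```python
import urllib.request


def goes_through_proxy(host, http_proxy, no_proxy):
    """Return True if a request to `host` would be sent via the proxy,
    given the values of http_proxy and no_proxy.

    Uses the same dict layout that urllib builds from the environment:
    the key "no" holds the comma-separated no_proxy list.
    """
    proxies = {"http": http_proxy, "no": no_proxy}
    return not bool(urllib.request.proxy_bypass_environment(host, proxies))


# Made-up example values:
# goes_through_proxy("namenode.example.internal:50070",
#                    "http://proxy.example.com:8080",
#                    "localhost,127.0.0.1")
# The host is not covered by no_proxy here, so the proxy would be used.
```

Under that assumption, adding the cluster's hostnames or domain suffix to `no_proxy` in the environment the Hue server starts from would be the first thing to try.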

2 REPLIES

Hue will blindly call whatever webhdfs_url is set to in hue.ini:

[hadoop]
  [[hdfs_clusters]]
    [[[default]]]
      webhdfs_url=http://localhost:50070/webhdfs/v1

So, how about you put localhost there?