WebHDFS over Knox and Kerberos from Java

Hi,

We are trying to use WebHDFS over Knox to access HDFS on our secured cluster from Java. We can list files and folders there, but we are still struggling with file creation.

The problem is probably in Oracle's Java library, where streaming does not seem to be supported when authentication is required.

In sun.net.www.protocol.http.HttpURLConnection.getInputStream0() there is something like:

    if (j == 401) {
        if (streaming()) {
            disconnectInternal();
            throw new HttpRetryException("cannot retry due to server authentication, in streaming mode", 401);
        }
        // ...
    }

Streaming is not needed for some operations, such as list and delete (which is why they work), but it is required for file creation.

Any suggestions on how to handle this?

Thanks a lot,

Pavel

1 ACCEPTED SOLUTION

Hi Pavel -

From your description, it seems that you are developing a Java application to consume the WebHDFS APIs through Knox. We have samples that do this in the {GATEWAY_HOME}/samples directory. The Groovy scripts are based on Java classes that use HttpClient to handle the basic authentication challenge.

Look at samples/ExampleWebHdfsPutGet.groovy for an example that does a PUT and then a subsequent GET of a given file.

The following shows you how HttpClient is used:

https://github.com/apache/knox/blob/master/gateway-shell/src/main/java/org/apache/hadoop/gateway/she...

The following shows the implementation of the HDFS Put command in our client shell classes. It uses the execute method of the AbstractRequest base class to interact with Knox, with the Hadoop class above acting as the session:

https://github.com/apache/knox/blob/master/gateway-shell/src/main/java/org/apache/hadoop/gateway/she...

Here is the base class that uses the Hadoop session class:

https://github.com/apache/knox/blob/master/gateway-shell/src/main/java/org/apache/hadoop/gateway/she...

Bottom line: I would suggest that you use HttpClient to do the interactions.
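
The recommendation above refers to Apache HttpClient, as used by the Knox shell classes. For readers who cannot add that dependency, here is a rough sketch of the same idea using the JDK 11+ java.net.http.HttpClient instead. This is a swapped-in alternative, not what the Knox samples do. It sends the Basic credentials preemptively, so the gateway should have no reason to answer the upload with a 401 that would need a retry, and it follows the two-step WebHDFS CREATE flow (a PUT without data that returns a redirect, then a PUT of the data to the returned Location). The class name KnoxWebHdfsPut, its method names, the example gateway base URL, and the overwrite=true parameter are all illustrative assumptions. It also assumes the topology accepts HTTP Basic credentials, the gateway certificate is trusted by the JVM, and the HDFS path contains only URL-safe characters.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;
import java.nio.file.Path;
import java.util.Base64;

/** Hypothetical helper: WebHDFS CREATE through Knox with preemptive Basic credentials. */
final class KnoxWebHdfsPut {

    /** Builds the Authorization header value "Basic base64(user:password)". */
    static String basicAuthHeader(String user, String password) {
        byte[] raw = (user + ":" + password).getBytes(StandardCharsets.UTF_8);
        return "Basic " + Base64.getEncoder().encodeToString(raw);
    }

    /**
     * Builds the CREATE URL. gatewayBase is assumed to look like
     * https://host:8443/gateway/sandbox (host, port, and topology are placeholders).
     */
    static URI createUri(String gatewayBase, String hdfsPath) {
        return URI.create(gatewayBase + "/webhdfs/v1" + hdfsPath + "?op=CREATE&overwrite=true");
    }

    /** Uploads localFile to hdfsPath and returns the final HTTP status (WebHDFS answers 201 on success). */
    static int put(String gatewayBase, String hdfsPath, Path localFile,
                   String user, String password) throws Exception {
        // Redirects are handled by hand so that the second PUT can carry the file body.
        HttpClient client = HttpClient.newBuilder()
                .followRedirects(HttpClient.Redirect.NEVER)
                .build();
        String auth = basicAuthHeader(user, password);

        // Step 1: PUT without data; WebHDFS replies with a redirect and a Location header.
        HttpRequest ask = HttpRequest.newBuilder(createUri(gatewayBase, hdfsPath))
                .header("Authorization", auth)
                .PUT(HttpRequest.BodyPublishers.noBody())
                .build();
        HttpResponse<Void> redirect = client.send(ask, HttpResponse.BodyHandlers.discarding());
        String location = redirect.headers().firstValue("Location")
                .orElseThrow(() -> new IllegalStateException(
                        "No Location header, status " + redirect.statusCode()));

        // Step 2: PUT the file content to the returned location, again with credentials attached.
        HttpRequest upload = HttpRequest.newBuilder(URI.create(location))
                .header("Authorization", auth)
                .PUT(HttpRequest.BodyPublishers.ofFile(localFile))
                .build();
        return client.send(upload, HttpResponse.BodyHandlers.discarding()).statusCode();
    }
}
```

A call might look like KnoxWebHdfsPut.put("https://localhost:8443/gateway/sandbox", "/tmp/example/README", Path.of("README"), "guest", "guest-password"), where every value is a placeholder for your own gateway, path, and account. If you can add dependencies, the Knox shell classes linked above are still the better-trodden route.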

HTH.

--larry

2 REPLIES

Hi Larry,

Yes, Apache HttpClient works like a charm.

Thanks,

Pavel