WebHDFS over Knox and Kerberos from Java

Hi,

We are trying to use WebHDFS over Knox to access HDFS on our secured cluster from Java. We are able to list files and folders there, but we are still struggling with file creation.

The problem is probably in Oracle's Java library, where streaming does not seem to be supported when authentication is required.

In sun.net.www.protocol.http.HttpURLConnection.getInputStream0() there is something like this (decompiled):

    // j holds the HTTP response code
    if (j == 401) {
        if (streaming()) {
            disconnectInternal();
            throw new HttpRetryException("cannot retry due to server authentication, in streaming mode", 401);
        }
        // ...
    }

Streaming is not needed for some operations, such as list and delete (which is why they work), but it is required for file creation.
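The behaviour can be reproduced without a cluster. The following is a minimal sketch, assuming a local stub server (the JDK's built-in com.sun.net.httpserver.HttpServer) that answers every request with a 401 challenge; the class and method names are made up for illustration. If the JDK behaves as in the snippet above, a PUT in chunked streaming mode should end in an HttpRetryException with response code 401.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.HttpRetryException;
import java.net.HttpURLConnection;
import java.net.InetSocketAddress;
import java.net.URI;
import java.net.URL;
import java.nio.charset.StandardCharsets;

class StreamingAuthRepro {

    // Hypothetical stand-in for a secured endpoint: every request gets a 401 challenge.
    static HttpServer startChallengingServer() throws IOException {
        HttpServer server = HttpServer.create(new InetSocketAddress("127.0.0.1", 0), 0);
        server.createContext("/", exchange -> {
            // Drain the request body before answering.
            InputStream in = exchange.getRequestBody();
            byte[] buffer = new byte[4096];
            while (in.read(buffer) != -1) {
                // discard
            }
            exchange.getResponseHeaders().add("WWW-Authenticate", "Basic realm=\"demo\"");
            exchange.sendResponseHeaders(401, -1); // -1 means no response body
            exchange.close();
        });
        server.start();
        return server;
    }

    // Sends a small PUT in chunked streaming mode and reports what happened.
    static String putInStreamingMode(URL url) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        conn.setRequestMethod("PUT");
        conn.setDoOutput(true);
        conn.setChunkedStreamingMode(0); // streaming on; 0 selects the default chunk size
        try {
            try (OutputStream out = conn.getOutputStream()) {
                out.write("hello".getBytes(StandardCharsets.UTF_8));
            }
            conn.getInputStream().close(); // the 401 is processed here
            return "HTTP " + conn.getResponseCode();
        } catch (HttpRetryException e) {
            return "HttpRetryException " + e.responseCode();
        } finally {
            conn.disconnect();
        }
    }

    public static void main(String[] args) throws IOException {
        HttpServer server = startChallengingServer();
        try {
            int port = server.getAddress().getPort();
            URL url = URI.create("http://127.0.0.1:" + port + "/file").toURL();
            System.out.println(putInStreamingMode(url));
        } finally {
            server.stop(0);
        }
    }
}
```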

Any suggestions on how to handle this?

Thanks a lot,

Pavel

1 ACCEPTED SOLUTION

Re: WebHDFS over Knox and Kerberos from Java

Hi Pavel -

From what I can tell from your description, you are developing a Java application that consumes the WebHDFS APIs through Knox. We have samples that do this in the {GATEWAY_HOME}/samples directory. The Groovy scripts are based on Java classes that use HttpClient to handle the HTTP Basic authentication challenge.

Look at samples/ExampleWebHdfsPutGet.groovy for an example that does a PUT and then a subsequent GET of a given file.

The following shows you how HttpClient is used:

https://github.com/apache/knox/blob/master/gateway-shell/src/main/java/org/apache/hadoop/gateway/she...

The following shows the implementation of the HDFS Put command in our client shell classes. It uses the execute method of the AbstractRequest base class to interact with Knox, with the Hadoop class above acting as the session:

https://github.com/apache/knox/blob/master/gateway-shell/src/main/java/org/apache/hadoop/gateway/she...

Here is the base class that uses the Hadoop session class:

https://github.com/apache/knox/blob/master/gateway-shell/src/main/java/org/apache/hadoop/gateway/she...

Bottom line: I would suggest that you use HttpClient to do the interactions.
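If adding the HttpClient dependency is not an option, one possible workaround is to keep HttpURLConnection and send the Basic credentials up front, so that no 401 retry is needed while streaming. The following is only a sketch, not the Knox shell classes and not Apache HttpClient. It assumes the Knox topology accepts HTTP Basic authentication and that the redirect returned for CREATE points back at the gateway. It follows the two-step CREATE described in the WebHDFS REST documentation (a PUT without data that returns a 307 with a Location header, then a PUT with the data to that location). The class name, method names, and the overwrite=true parameter choice are hypothetical.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URI;
import java.nio.charset.StandardCharsets;
import java.util.Base64;

class KnoxWebHdfsCreateSketch {

    // Builds the value of a preemptive "Authorization: Basic ..." header.
    static String basicAuth(String user, String password) {
        byte[] token = (user + ":" + password).getBytes(StandardCharsets.UTF_8);
        return "Basic " + Base64.getEncoder().encodeToString(token);
    }

    // Opens a PUT connection that carries the credentials on the first request.
    static HttpURLConnection openPut(String url, String authHeader) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) URI.create(url).toURL().openConnection();
        conn.setRequestMethod("PUT");
        conn.setInstanceFollowRedirects(false); // the redirect is handled by hand below
        conn.setRequestProperty("Authorization", authHeader);
        return conn;
    }

    // webHdfsBase is assumed to end in ".../webhdfs/v1"; hdfsPath starts with "/".
    static int createFile(String webHdfsBase, String hdfsPath, String authHeader, byte[] data)
            throws IOException {
        // Step 1: ask where to write. No file data is sent with this request.
        HttpURLConnection first =
                openPut(webHdfsBase + hdfsPath + "?op=CREATE&overwrite=true", authHeader);
        int status = first.getResponseCode();
        String location = first.getHeaderField("Location");
        first.disconnect();
        if (status != 307 || location == null) {
            throw new IOException("Expected a 307 with a Location header, got HTTP " + status);
        }

        // Step 2: stream the data to the location from step 1.
        HttpURLConnection second = openPut(location, authHeader);
        second.setDoOutput(true);
        second.setFixedLengthStreamingMode(data.length);
        try (OutputStream out = second.getOutputStream()) {
            out.write(data);
        }
        int result = second.getResponseCode(); // WebHDFS documents 201 Created on success
        second.disconnect();
        return result;
    }
}
```

A call could look like createFile("https://knox-host:8443/gateway/default/webhdfs/v1", "/tmp/hello.txt", basicAuth(user, password), bytes), where the host, port, and topology name are placeholders for your own gateway.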

HTH.

--larry

Re: WebHDFS over Knox and Kerberos from Java

Hi Larry,

Yes, Apache HttpClient works like a charm.

Thanks,

Pavel
