Created 05-06-2016 06:25 PM
Hi,
we are trying to use WebHDFS over Knox to access HDFS on our secured cluster from Java. We are able to list files and folders there, but we are still struggling with file creation.
The problem is probably in Oracle's Java library, where streaming does not seem to be supported when authentication is required.
In sun.net.www.protocol.http.HttpURLConnection.getInputStream0() there is something like:

    if (j == 401) {
        if (streaming()) {
            disconnectInternal();
            throw new HttpRetryException(
                    "cannot retry due to server authentication, in streaming mode", 401);
        }
    }
Streaming is not needed for some operations, such as list/delete (which is why those work), but it is required for file creation.
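The dead end can be reproduced without a cluster. The sketch below is only an illustration (the local server is a hypothetical stand-in for the gateway, and the WebHDFS path is made up): it streams a PUT through HttpURLConnection at a server that answers with a 401 challenge, and runs into exactly the code path quoted above.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.OutputStream;
import java.net.HttpRetryException;
import java.net.HttpURLConnection;
import java.net.InetSocketAddress;
import java.net.URL;

public class StreamingAuthDemo {
    public static void main(String[] args) throws Exception {
        // Stand-in for the secured gateway: answers every request with a 401 challenge.
        HttpServer server = HttpServer.create(new InetSocketAddress(0), 0);
        server.createContext("/", ex -> {
            ex.getRequestBody().readAllBytes();           // drain the streamed request body
            ex.getResponseHeaders().add("WWW-Authenticate", "Basic realm=\"knox\"");
            ex.sendResponseHeaders(401, -1);              // 401, no response body
            ex.close();
        });
        server.start();
        int port = server.getAddress().getPort();

        HttpURLConnection conn = (HttpURLConnection) new URL(
                "http://localhost:" + port + "/webhdfs/v1/tmp/demo.txt?op=CREATE").openConnection();
        conn.setRequestMethod("PUT");
        conn.setDoOutput(true);
        conn.setChunkedStreamingMode(4096);               // streaming mode: the body is not buffered
        try (OutputStream os = conn.getOutputStream()) {
            os.write("file content".getBytes());          // body goes out before the 401 comes back
        }
        try {
            conn.getInputStream();                        // hits the quoted 401 + streaming() branch
            System.out.println("unexpected success");
        } catch (HttpRetryException e) {
            // The streamed body cannot be replayed, so the connection gives up instead of retrying.
            System.out.println("HttpRetryException: " + e.responseCode());
        } finally {
            server.stop(0);
        }
    }
}
```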
Any suggestions how to handle this?
Thanks a lot,
Pavel
Created 05-06-2016 08:06 PM
Hi Pavel -
From what I can tell from your description, you are developing a Java application to consume the WebHDFS APIs through Knox. We have samples that do this in the {GATEWAY_HOME}/samples directory. The Groovy scripts there are based on Java classes that leverage HttpClient to handle the Basic authentication challenge.
Look at samples/ExampleWebHdfsPutGet.groovy for an example that does a PUT and then a subsequent GET of a given file.
The following shows you how HttpClient is used:
The following shows you the implementation of the HDFS Put command in our client shell classes, which leverages the execute method of the AbstractRequest base class to interact with Knox through the above Hadoop class as the session:
Here is the base class that uses the Hadoop session class:
Bottom line: I would suggest that you use HttpClient to do the interactions.
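To make the difference concrete: a client that can replay the request body sails through the same challenge. The sketch below is only an illustration, not the Knox shell code — it uses the JDK's built-in java.net.http.HttpClient (Java 11+) and a hypothetical local stand-in for the gateway; Knox's own classes are built on Apache HttpClient, which handles the challenge the same way.

```java
import com.sun.net.httpserver.HttpServer;
import java.net.Authenticator;
import java.net.InetSocketAddress;
import java.net.PasswordAuthentication;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class ChallengePutDemo {
    public static void main(String[] args) throws Exception {
        // Hypothetical stand-in for the gateway: 401 challenge until Basic credentials arrive.
        HttpServer server = HttpServer.create(new InetSocketAddress(0), 0);
        server.createContext("/", ex -> {
            ex.getRequestBody().readAllBytes();   // drain the request body
            String auth = ex.getRequestHeaders().getFirst("Authorization");
            if (auth == null || !auth.startsWith("Basic ")) {
                ex.getResponseHeaders().add("WWW-Authenticate", "Basic realm=\"knox\"");
                ex.sendResponseHeaders(401, -1);  // challenge the anonymous attempt
            } else {
                ex.sendResponseHeaders(201, -1);  // WebHDFS CREATE answers 201 Created
            }
            ex.close();
        });
        server.start();
        int port = server.getAddress().getPort();

        // After the 401, the client retries the PUT with credentials,
        // because a String body publisher can be replayed.
        HttpClient client = HttpClient.newBuilder()
                .authenticator(new Authenticator() {
                    @Override protected PasswordAuthentication getPasswordAuthentication() {
                        return new PasswordAuthentication("guest", "guest-password".toCharArray());
                    }
                })
                .build();
        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create("http://localhost:" + port + "/webhdfs/v1/tmp/demo.txt?op=CREATE"))
                .PUT(HttpRequest.BodyPublishers.ofString("file content"))
                .build();
        HttpResponse<Void> response = client.send(request, HttpResponse.BodyHandlers.discarding());
        System.out.println("status: " + response.statusCode());
        server.stop(0);
    }
}
```

The credentials, paths, and port here are illustrative; against a real Knox deployment you would point the request at the gateway's WebHDFS URL and use your cluster credentials.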
HTH.
--larry
Created 05-09-2016 07:06 AM
Hi Larry,
yes, the Apache HttpClient works like a charm.
Thanks,
Pavel