How do you use webhdfs in Java through Knox?

Rising Star

I have some Groovy code that works:

import groovy.json.JsonSlurper
session = Hadoop.login( "https://etc-lab1-edge01-10:8888/gateway/ehihadoop02", "guest", "guest-password" )
text = Hdfs.ls( session ).dir( "/tmp/guest" ).now().string
json = (new JsonSlurper()).parseText( text )
println json.FileStatuses.FileStatus.pathSuffix

How do I do the same thing in Java? I've tried the following:

String webHdfsUrl = "webhdfs://etc-lab1-edge01-10:8888/";
String dir = "/tmp/guest";
Configuration hdfsConfig = new Configuration();
FileSystem fs = FileSystem.get(URI.create(webHdfsUrl), hdfsConfig);
RemoteIterator<LocatedFileStatus> files = fs.listFiles(new Path(dir), false);
while (files.hasNext()) {
	LocatedFileStatus srcFile = files.next();
	String path = Path.getPathWithoutSchemeAndAuthority(srcFile.getPath()).toString();
	System.out.println(path);
}

But I get java.net.SocketException: Unexpected end of file from server.

1 ACCEPTED SOLUTION

Re: How do you use webhdfs in Java through Knox?

Rising Star

The first problem with my code was that I should have been using secure WebHDFS (I'm still not sure why I was getting a SocketException). The URI scheme for secure WebHDFS is swebhdfs.

The second problem was an SSLHandshakeException. I ran with the VM arg -Djavax.net.debug=all and realized Java didn't trust the SSL connection because we were using a self-signed certificate. So I exported the cert from Chrome and imported it into a new truststore. Adding a VM arg that points to the new truststore seemed to fix that: -Djavax.net.ssl.trustStore=C:\Apps\truststore
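For anyone who would rather load the truststore in code than pass a VM arg, here is a minimal sketch using only the standard JSSE classes. The class name TrustStoreLoader and its method are made up for illustration, and it assumes the truststore file is in a format the JVM's default KeyStore type can read (for example one created with keytool on the same JDK).

```java
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.KeyStore;
import javax.net.ssl.SSLContext;
import javax.net.ssl.TrustManagerFactory;

// Hypothetical helper, not part of Knox or Hadoop.
class TrustStoreLoader {

    /** Builds an SSLContext whose trust anchors come only from the given truststore file. */
    static SSLContext fromTrustStore(Path trustStorePath, char[] password) throws Exception {
        // Assumption: the file matches the JVM's default keystore type.
        KeyStore trustStore = KeyStore.getInstance(KeyStore.getDefaultType());
        try (InputStream in = Files.newInputStream(trustStorePath)) {
            trustStore.load(in, password);
        }

        // Turn the loaded certificates into trust managers.
        TrustManagerFactory tmf =
                TrustManagerFactory.getInstance(TrustManagerFactory.getDefaultAlgorithm());
        tmf.init(trustStore);

        // No client key managers; default source of randomness.
        SSLContext context = SSLContext.getInstance("TLS");
        context.init(null, tmf.getTrustManagers(), null);
        return context;
    }
}
```

The returned context could then be installed for plain HTTPS calls with HttpsURLConnection.setDefaultSSLSocketFactory(context.getSocketFactory()). Whether Hadoop's own client picks that up is not something I have verified, so the VM arg above may still be the simpler route.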

-----------

After some more research, it looks like using the Java API through Knox is not supported in 0.6.0. More info here.

13 REPLIES

Re: How do you use webhdfs in Java through Knox?

Mentor

Re: How do you use webhdfs in Java through Knox?

Rising Star

Yep that is what I'm trying to use (see second part of my question) but I can't get it working. I've made some progress which I'll post in a new answer.

Re: How do you use webhdfs in Java through Knox?

Rising Star

Looks like it is not supported in the existing FileSystem API. I updated my answer.

Re: How do you use webhdfs in Java through Knox?

Explorer

@Kit Menke could you please explain why we can't use the FileSystem API? I went through the link you provided above (More info here) but didn't quite understand it. I am specifically looking to use a Java API with Knox instead of an HTTP client.

Re: How do you use webhdfs in Java through Knox?

@Kit Menke check this for an interactive Java Knox DSL shell, where you can test your approach and later compile the parts you need.

Re: How do you use webhdfs in Java through Knox?

Rising Star

I've seen that (see the first part of my question). I'm trying to create a Java app.

Re: How do you use webhdfs in Java through Knox?

Mentor

Are your WebHDFS port and NameNode URL correct?

Re: How do you use webhdfs in Java through Knox?

Rising Star

Hmm, yeah, I think you're right. I debugged WebHdfsFileSystem and it looks like it is requesting the wrong URL: /webhdfs/v1/ instead of /gateway/ehihadoop02/webhdfs/v1. I think the Knox gateway path is confusing it.
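To make the URL difference concrete, here is a rough sketch that builds the gateway-prefixed WebHDFS URL by hand and calls it with plain HttpURLConnection instead of the Hadoop FileSystem API. The class and method names are made up for illustration. It assumes the Knox topology accepts HTTP Basic auth (as the guest login in the Groovy example suggests), that the path needs no URL-encoding, and that Java 9+ is available for readAllBytes.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URI;
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Hypothetical helper, not part of Knox or Hadoop.
class KnoxWebHdfs {

    /** Appends /webhdfs/v1, the HDFS path, and op=LISTSTATUS to the gateway topology URL. */
    static URI listStatusUri(String gatewayTopologyUrl, String hdfsPath) {
        String base = gatewayTopologyUrl.endsWith("/")
                ? gatewayTopologyUrl.substring(0, gatewayTopologyUrl.length() - 1)
                : gatewayTopologyUrl;
        return URI.create(base + "/webhdfs/v1" + hdfsPath + "?op=LISTSTATUS");
    }

    /** Builds an HTTP Basic Authorization header value (assumes the topology uses Basic auth). */
    static String basicAuthHeader(String user, String password) {
        byte[] raw = (user + ":" + password).getBytes(StandardCharsets.UTF_8);
        return "Basic " + Base64.getEncoder().encodeToString(raw);
    }

    /** Sends the GET request and returns the response body (the FileStatuses JSON on success). */
    static String listStatus(String gatewayTopologyUrl, String hdfsPath,
                             String user, String password) throws IOException {
        URI uri = listStatusUri(gatewayTopologyUrl, hdfsPath);
        HttpURLConnection conn = (HttpURLConnection) uri.toURL().openConnection();
        conn.setRequestMethod("GET");
        conn.setRequestProperty("Authorization", basicAuthHeader(user, password));
        try (InputStream in = conn.getInputStream()) {
            return new String(in.readAllBytes(), StandardCharsets.UTF_8);
        } finally {
            conn.disconnect();
        }
    }
}
```

With the values from the question, listStatusUri("https://etc-lab1-edge01-10:8888/gateway/ehihadoop02", "/tmp/guest") yields https://etc-lab1-edge01-10:8888/gateway/ehihadoop02/webhdfs/v1/tmp/guest?op=LISTSTATUS, which keeps the /gateway/ehihadoop02 prefix that WebHdfsFileSystem was dropping. With a self-signed certificate the JVM still has to trust the gateway's cert before listStatus will connect.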