How do you use webhdfs in Java through Knox?

Expert Contributor

I have some Groovy code that works:

session = Hadoop.login( "https://etc-lab1-edge01-10:8888/gateway/ehihadoop02", "guest", "guest-password" )
text = Hdfs.ls( session ).dir( "/tmp/guest" ).now().string
import groovy.json.JsonSlurper
json = (new JsonSlurper()).parseText( text )
println json.FileStatuses.FileStatus.pathSuffix

How do I do the same thing in Java? I've tried the following:

String webHdfsUrl = "webhdfs://etc-lab1-edge01-10:8888/";
String dir = "/tmp/guest";
Configuration hdfsConfig = new Configuration();
FileSystem fs = FileSystem.get(URI.create(webHdfsUrl), hdfsConfig);
RemoteIterator<LocatedFileStatus> files = fs.listFiles(new Path(dir), false);
while (files.hasNext()) {
	LocatedFileStatus srcFile = files.next();
	String path = Path.getPathWithoutSchemeAndAuthority(srcFile.getPath()).toString();
	System.out.println(path);
}

But I get java.net.SocketException: Unexpected end of file from server.

1 ACCEPTED SOLUTION

Expert Contributor

The first problem with my code was that I should have been using secure WebHDFS (I'm still not sure why I was getting a SocketException). The URI scheme for secure WebHDFS is swebhdfs.

The second problem was an SSLHandshakeException. I used the VM arg -Djavax.net.debug=all and realized Java didn't trust the SSL connection because we were using a self-signed certificate. So I exported the cert from Chrome and imported it into a new truststore. Adding a VM arg that points to the new truststore seemed to fix that: -Djavax.net.ssl.trustStore=C:\Apps\truststore
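
For cases where passing VM args is awkward, the same truststore setting can be applied in code. The following is a minimal sketch, not something from the original post: the class and method names are hypothetical, and it assumes the properties are set before the JVM opens its first HTTPS connection, since the default SSL context reads them only once.

```java
public class TrustStoreConfig {
    /**
     * Programmatic equivalent of -Djavax.net.ssl.trustStore (and, optionally,
     * -Djavax.net.ssl.trustStorePassword). Call this before the first HTTPS
     * connection is made; later changes are not picked up by the default context.
     */
    static void useTrustStore(String path, String password) {
        System.setProperty("javax.net.ssl.trustStore", path);
        if (password != null) {
            System.setProperty("javax.net.ssl.trustStorePassword", password);
        }
    }

    public static void main(String[] args) {
        // Path taken from the post; no password is set because the post does not mention one.
        useTrustStore("C:\\Apps\\truststore", null);
        System.out.println(System.getProperty("javax.net.ssl.trustStore"));
    }
}
```

The VM arg approach from the post stays the simpler option when you control how the JVM is launched.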

-----------

After some more research, it looks like using the Hadoop FileSystem Java API through Knox is not supported in Knox 0.6.0. More info here.
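
If the FileSystem client is not an option, one alternative is to call the WebHDFS REST API through the gateway with a plain HTTP client. The sketch below is an illustration under assumptions rather than a tested recipe for this cluster: it assumes the Knox topology accepts HTTP Basic authentication (as the Groovy login with a username and password suggests) and that the JVM already trusts the gateway certificate. The class and method names are hypothetical; the gateway URL, path, and credentials come from the question.

```java
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URI;
import java.nio.charset.StandardCharsets;
import java.util.Base64;
import java.util.stream.Collectors;

public class KnoxWebHdfsList {
    /** Builds the gateway URL for a WebHDFS LISTSTATUS call on the given HDFS path. */
    static String listStatusUrl(String gateway, String hdfsPath) {
        String base = gateway.endsWith("/") ? gateway.substring(0, gateway.length() - 1) : gateway;
        String path = hdfsPath.startsWith("/") ? hdfsPath : "/" + hdfsPath;
        return base + "/webhdfs/v1" + path + "?op=LISTSTATUS";
    }

    /** Builds the value of an HTTP Basic Authorization header. */
    static String basicAuth(String user, String password) {
        byte[] token = (user + ":" + password).getBytes(StandardCharsets.UTF_8);
        return "Basic " + Base64.getEncoder().encodeToString(token);
    }

    /** Sends the LISTSTATUS request and returns the raw JSON response body. */
    static String listStatus(String gateway, String hdfsPath, String user, String password)
            throws Exception {
        HttpURLConnection conn = (HttpURLConnection)
                URI.create(listStatusUrl(gateway, hdfsPath)).toURL().openConnection();
        conn.setRequestMethod("GET");
        conn.setRequestProperty("Authorization", basicAuth(user, password));
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(conn.getInputStream(), StandardCharsets.UTF_8))) {
            return reader.lines().collect(Collectors.joining("\n"));
        } finally {
            conn.disconnect();
        }
    }

    public static void main(String[] args) throws Exception {
        if (args.length < 4) {
            System.out.println("usage: KnoxWebHdfsList <gatewayUrl> <hdfsPath> <user> <password>");
            return;
        }
        // Example arguments from the question:
        // https://etc-lab1-edge01-10:8888/gateway/ehihadoop02 /tmp/guest guest guest-password
        System.out.println(listStatus(args[0], args[1], args[2], args[3]));
    }
}
```

The response is the same FileStatuses JSON that the Groovy snippet parses, so any JSON library can pull out the pathSuffix values.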

13 REPLIES

Master Mentor

Expert Contributor

Yep that is what I'm trying to use (see second part of my question) but I can't get it working. I've made some progress which I'll post in a new answer.

Expert Contributor

Looks like it is not supported in the existing FileSystem API. I updated my answer.

Contributor

@Kit Menke could you please explain why we can't use the FileSystem API? I went through the link you provided above (More info here) but didn't quite understand it. I am specifically looking to use a Java API for Knox instead of an HTTP client.

Master Guru

@Kit Menke check this for an interactive Java Knox DSL shell, where you can test your approach and later compile the parts you need.

Expert Contributor

I've seen that (see the first part of my question). I'm trying to create a Java app.

Master Mentor

Are your WebHDFS port and NameNode URL correct?

Expert Contributor

Hmm, yeah, I think you're right. I debugged WebHdfsFileSystem and it looks like it is trying the wrong URL: /webhdfs/v1/ instead of /gateway/ehihadoop02/webhdfs/v1. I think Knox is confusing it.
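
To make the mismatch concrete, here is a small sketch that builds both URLs from the same starting URI. It is an illustration only and does not call Hadoop: the class and method names are hypothetical, and the "client style" method simply mimics the behavior described above (host and port kept, gateway path dropped) rather than reproducing WebHdfsFileSystem's actual code.

```java
import java.net.URI;

public class WebHdfsUrlMismatch {
    static final String PATH_PREFIX = "/webhdfs/v1";

    /** Mimics the observed behavior: only host:port survives, the URI's own path is dropped. */
    static String clientStyleUrl(URI fsUri, String hdfsPath) {
        return "https://" + fsUri.getAuthority() + PATH_PREFIX + hdfsPath + "?op=LISTSTATUS";
    }

    /** What the Knox topology expects: the gateway context path stays in front of the prefix. */
    static String knoxStyleUrl(URI fsUri, String hdfsPath) {
        String context = fsUri.getPath();
        if (context.endsWith("/")) {
            context = context.substring(0, context.length() - 1);
        }
        return "https://" + fsUri.getAuthority() + context + PATH_PREFIX + hdfsPath + "?op=LISTSTATUS";
    }

    public static void main(String[] args) {
        URI fsUri = URI.create("swebhdfs://etc-lab1-edge01-10:8888/gateway/ehihadoop02");
        System.out.println(clientStyleUrl(fsUri, "/tmp/guest")); // no /gateway/ehihadoop02 segment
        System.out.println(knoxStyleUrl(fsUri, "/tmp/guest"));   // includes /gateway/ehihadoop02
    }
}
```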