How do you use webhdfs in Java through Knox?

I have some Groovy code that works:

session = Hadoop.login( "https://etc-lab1-edge01-10:8888/gateway/ehihadoop02", "guest", "guest-password" )
text = Hdfs.ls( session ).dir( "/tmp/guest" ).now().string
import groovy.json.JsonSlurper
json = (new JsonSlurper()).parseText( text )
println json.FileStatuses.FileStatus.pathSuffix

How do I do the same thing in Java? I've tried the following:

String webHdfsUrl = "webhdfs://etc-lab1-edge01-10:8888/";
String dir = "/tmp/guest";
Configuration hdfsConfig = new Configuration();
FileSystem fs = FileSystem.get(URI.create(webHdfsUrl), hdfsConfig);
RemoteIterator<LocatedFileStatus> files = fs.listFiles(new Path(dir), false);
while (files.hasNext()) {
	LocatedFileStatus srcFile = files.next();
	String path = Path.getPathWithoutSchemeAndAuthority(srcFile.getPath()).toString();
}

But I get a SocketException: "Unexpected end of file from server".


The first problem with my code was that I should have been using secure WebHDFS (I'm still not sure why I was getting a SocketException). The scheme for secure WebHDFS is swebhdfs.

The second problem was an SSLHandshakeException. Using a VM arg, I found that Java didn't trust the SSL connection because we were using a self-signed certificate. So I exported the cert from Chrome and imported it into a new truststore. Adding a VM arg to point to the new truststore (a path ending in \Apps\truststore) seemed to fix that.
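For anyone who would rather not pass VM args, here is a minimal sketch of doing the same thing in code. It assumes the VM arg in question is the standard JSSE system property javax.net.ssl.trustStore (the answer above does not name it). The class name, the file name truststore.jks, and the password changeit are placeholders, not values from my setup.

```java
class TrustStoreProps {

    /**
     * Points the default JSSE trust manager at a custom truststore.
     * Assumption: this mirrors -Djavax.net.ssl.trustStore=... on the command line.
     * It has to run before the first HTTPS connection is opened, because the
     * default SSLContext reads these properties only once, when it is first used.
     */
    static void useTrustStore(String path, String password) {
        System.setProperty("javax.net.ssl.trustStore", path);
        if (password != null) {
            System.setProperty("javax.net.ssl.trustStorePassword", password);
        }
    }

    public static void main(String[] args) {
        // Placeholder path and password; substitute the truststore that
        // holds the gateway's self-signed certificate.
        useTrustStore("truststore.jks", "changeit");
        System.out.println(System.getProperty("javax.net.ssl.trustStore"));
    }
}
```

If the handshake still fails, the standard -Djavax.net.debug=ssl VM arg prints the handshake details, which is one way to see which certificate is being rejected.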


After some more research, it looks like using the Java API through Knox is not supported in 0.6.0. More info here.
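As a fallback, the WebHDFS REST API can be called through the gateway with a plain HTTP client. The sketch below is not the Hadoop FileSystem API and not the Knox DSL; it only assumes that Knox exposes WebHDFS under the topology path (/gateway/ehihadoop02/webhdfs/v1) and accepts HTTP Basic authentication, as the Groovy login in the question suggests. The class and method names are made up for illustration.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URI;
import java.nio.charset.StandardCharsets;
import java.util.Base64;

class KnoxWebHdfsSketch {

    /** Builds the LISTSTATUS URL underneath the Knox topology path. */
    static String listStatusUrl(String gatewayUrl, String hdfsDir) {
        String base = gatewayUrl.endsWith("/")
                ? gatewayUrl.substring(0, gatewayUrl.length() - 1)
                : gatewayUrl;
        String dir = hdfsDir.startsWith("/") ? hdfsDir : "/" + hdfsDir;
        return base + "/webhdfs/v1" + dir + "?op=LISTSTATUS";
    }

    /** Builds the value of an HTTP Basic Authorization header. */
    static String basicAuth(String user, String password) {
        String pair = user + ":" + password;
        return "Basic " + Base64.getEncoder()
                .encodeToString(pair.getBytes(StandardCharsets.UTF_8));
    }

    /** Sends the request and returns the raw JSON body (needs Java 9+ for readAllBytes). */
    static String listStatusJson(String gatewayUrl, String hdfsDir,
                                 String user, String password) throws IOException {
        HttpURLConnection conn = (HttpURLConnection)
                URI.create(listStatusUrl(gatewayUrl, hdfsDir)).toURL().openConnection();
        conn.setRequestMethod("GET");
        conn.setRequestProperty("Authorization", basicAuth(user, password));
        try (InputStream in = conn.getInputStream()) {
            return new String(in.readAllBytes(), StandardCharsets.UTF_8);
        } finally {
            conn.disconnect();
        }
    }

    public static void main(String[] args) throws IOException {
        String gateway = "https://etc-lab1-edge01-10:8888/gateway/ehihadoop02";
        // Always safe to run: only prints the URL that would be requested.
        System.out.println(listStatusUrl(gateway, "/tmp/guest"));
        // Pass user and password as arguments to actually contact the gateway.
        if (args.length == 2) {
            System.out.println(listStatusJson(gateway, "/tmp/guest", args[0], args[1]));
        }
    }
}
```

The response is the same JSON the Groovy example parses (FileStatuses.FileStatus[].pathSuffix), so any JSON library can pull out the file names. With a self-signed gateway certificate, the JVM still needs to trust that certificate, otherwise the request fails with an SSLHandshakeException.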

Yep, that is what I'm trying to use (see the second part of my question), but I can't get it working. I've made some progress, which I'll post in a new answer.

Looks like it is not supported in the existing FileSystem API. I updated my answer.


@Kit Menke could you please explain why we can't use the FileSystem API? I went through the link you provided above (More info here) but didn't quite understand. I am specifically looking to use the Java API for Knox instead of an HTTP client.

@Kit Menke check this for an interactive Java Knox DSL shell, where you can test your approach and later compile the parts you need.

I've seen that (see the first part of my question). I'm trying to create a Java app.

Are your WebHDFS port and NameNode URL correct?

Hmm, yeah, I think you're right. I debugged WebHdfsFileSystem and it looks like it is trying the wrong URL: /webhdfs/v1/ instead of /gateway/ehihadoop02/webhdfs/v1. I think Knox is confusing it.