How do you use webhdfs in Java through Knox?
Labels: Apache Hadoop, Apache Knox
Created 02-25-2016 10:20 PM
I have some Groovy code which works:
session = Hadoop.login( "https://etc-lab1-edge01-10:8888/gateway/ehihadoop02", "guest", "guest-password" )
text = Hdfs.ls( session ).dir( "/tmp/guest" ).now().string
import groovy.json.JsonSlurper
json = (new JsonSlurper()).parseText( text )
println json.FileStatuses.FileStatus.pathSuffix
How do I do the same thing in Java? I've tried the following:
String webHdfsUrl = "webhdfs://etc-lab1-edge01-10:8888/";
String dir = "/tmp/guest";
Configuration hdfsConfig = new Configuration();
FileSystem fs = FileSystem.get(URI.create(webHdfsUrl), hdfsConfig);
RemoteIterator<LocatedFileStatus> files = fs.listFiles(new Path(dir), false);
while (files.hasNext()) {
    LocatedFileStatus srcFile = files.next();
    String path = Path.getPathWithoutSchemeAndAuthority(srcFile.getPath()).toString();
    System.out.println(path);
}
But I get java.net.SocketException: Unexpected end of file from server.
Created 02-26-2016 03:34 PM
The first problem with my code was that I should have been using secure WebHDFS (I'm still not sure why I was getting a SocketException). The URI scheme for secure WebHDFS is swebhdfs.
The second problem was an SSLHandshakeException. I ran with the VM arg -Djavax.net.debug=all and realized Java didn't trust the SSL connection, because we were using a self-signed certificate. So I exported the cert from Chrome and imported it into a new truststore. Adding a VM arg pointing to the new truststore seemed to fix that: -Djavax.net.ssl.trustStore=C:\Apps\truststore
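The truststore setting above can also be applied from code rather than on the command line. The following is a minimal sketch, not the poster's actual setup: the class and method names (TrustStoreSetup, useTrustStore) are hypothetical, and it assumes the properties are set before the JVM opens its first SSL connection, since the default SSL context reads them only once.

```java
// Hypothetical helper: sets the standard JSSE truststore system properties.
class TrustStoreSetup {

    // Points the default JSSE trust manager at a custom truststore file.
    // Pass null for the password to leave the password property untouched.
    static void useTrustStore(String path, String password) {
        System.setProperty("javax.net.ssl.trustStore", path);
        if (password != null) {
            System.setProperty("javax.net.ssl.trustStorePassword", password);
        }
    }

    public static void main(String[] args) {
        // Same path as the -Djavax.net.ssl.trustStore VM arg in the post.
        useTrustStore("C:\\Apps\\truststore", null);
        System.out.println(System.getProperty("javax.net.ssl.trustStore"));
    }
}
```

This is equivalent in effect to the VM arg only if it runs early enough; when in doubt, the VM arg is the more predictable option.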
-----------
After some more research, it looks like using the Java FileSystem API through Knox is not supported in Knox 0.6.0. More info here.
Created 02-25-2016 10:49 PM
Found this: https://github.com/wdavidw/webhdfs-java-client
Personally, why not just use the Java HDFS API?
http://tutorials.techmytalk.com/2014/08/16/hadoop-hdfs-java-api/
Created 02-26-2016 02:38 PM
Yep that is what I'm trying to use (see second part of my question) but I can't get it working. I've made some progress which I'll post in a new answer.
Created 02-26-2016 07:30 PM
Looks like it is not supported in the existing FileSystem API. I updated my answer.
Created 11-29-2017 09:38 PM
@Kit Menke could you please explain why we can't use the FileSystem API? I went through the link you provided above (More info here) but didn't quite understand. I am specifically looking to use the Java API with Knox instead of an HTTP client.
Created 02-25-2016 11:40 PM
@Kit Menke check this for an interactive Java Knox DSL shell, where you can test your approach and later compile the parts you need.
Created 02-26-2016 02:33 PM
I've seen that (see the first part of my question). I'm trying to create a Java app.
Created 02-26-2016 03:38 PM
Are your WebHDFS port and NameNode URL correct?
Created 02-26-2016 04:26 PM
Hmm, yeah, I think you're right. I debugged WebHdfsFileSystem and it looks like it is requesting the wrong URL: /webhdfs/v1/ instead of /gateway/ehihadoop02/webhdfs/v1. I think the Knox gateway path is confusing it.
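To illustrate the path difference described above, here is a hedged sketch that builds the gateway-prefixed WebHDFS REST URL by hand and requests a directory listing with plain HttpURLConnection, bypassing the FileSystem API. The class and method names (KnoxWebHdfsList, listStatusUrl, basicAuthHeader, fetch) are hypothetical, and it assumes the Knox topology accepts HTTP Basic authentication with the same guest credentials as the Groovy example. LISTSTATUS is the standard WebHDFS REST operation for listing a directory. The network call runs only when --fetch is passed, and it still needs the truststore configured for a self-signed certificate.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URI;
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Hypothetical sketch: list an HDFS directory via the WebHDFS REST API behind Knox.
class KnoxWebHdfsList {

    // Builds <gateway>/webhdfs/v1<dir>?op=LISTSTATUS, keeping the Knox gateway prefix.
    static String listStatusUrl(String gatewayUrl, String dir) {
        String base = gatewayUrl.endsWith("/")
                ? gatewayUrl.substring(0, gatewayUrl.length() - 1)
                : gatewayUrl;
        String path = dir.startsWith("/") ? dir : "/" + dir;
        return base + "/webhdfs/v1" + path + "?op=LISTSTATUS";
    }

    // Encodes user:password as an HTTP Basic Authorization header value.
    static String basicAuthHeader(String user, String password) {
        byte[] raw = (user + ":" + password).getBytes(StandardCharsets.UTF_8);
        return "Basic " + Base64.getEncoder().encodeToString(raw);
    }

    // Sends a GET request and returns the response body (JSON for LISTSTATUS).
    static String fetch(String url, String authHeader) throws IOException {
        HttpURLConnection conn =
                (HttpURLConnection) URI.create(url).toURL().openConnection();
        conn.setRequestMethod("GET");
        conn.setRequestProperty("Authorization", authHeader);
        try (InputStream in = conn.getInputStream();
             ByteArrayOutputStream out = new ByteArrayOutputStream()) {
            byte[] buf = new byte[4096];
            int n;
            while ((n = in.read(buf)) != -1) {
                out.write(buf, 0, n);
            }
            return new String(out.toByteArray(), StandardCharsets.UTF_8);
        } finally {
            conn.disconnect();
        }
    }

    public static void main(String[] args) throws IOException {
        // Gateway URL and credentials taken from the Groovy example in the question.
        String url = listStatusUrl(
                "https://etc-lab1-edge01-10:8888/gateway/ehihadoop02", "/tmp/guest");
        System.out.println(url);
        if (args.length > 0 && args[0].equals("--fetch")) {
            System.out.println(fetch(url, basicAuthHeader("guest", "guest-password")));
        }
    }
}
```

The response should be the same JSON the Groovy script parses (FileStatuses.FileStatus entries with a pathSuffix field), so any JSON library can extract the file names from it.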