Created 02-25-2016 10:20 PM
I have some groovy code which works:
session = Hadoop.login( "https://etc-lab1-edge01-10:8888/gateway/ehihadoop02", "guest", "guest-password" )
text = Hdfs.ls( session ).dir( "/tmp/guest" ).now().string
import groovy.json.JsonSlurper
json = (new JsonSlurper()).parseText( text )
println json.FileStatuses.FileStatus.pathSuffix
How do I do the same thing in Java? I've tried the following:
String webHdfsUrl = "webhdfs://etc-lab1-edge01-10:8888/";
String dir = "/tmp/guest";
Configuration hdfsConfig = new Configuration();
FileSystem fs = FileSystem.get(URI.create(webHdfsUrl), hdfsConfig);
RemoteIterator<LocatedFileStatus> files = fs.listFiles(new Path(dir), false);
while (files.hasNext()) {
    LocatedFileStatus srcFile = files.next();
    String path = Path.getPathWithoutSchemeAndAuthority(srcFile.getPath()).toString();
    System.out.println(path);
}
But I get java.net.SocketException: Unexpected end of file from server.
Created 02-26-2016 03:34 PM
The first problem with my code was that I should have been using SECURE webhdfs (I'm still not sure why I was getting a SocketException). The URI scheme for secure webhdfs is swebhdfs.
The second problem was an SSLHandshakeException. I used the VM arg -Djavax.net.debug=all and realized that Java didn't trust the SSL connection because we were using a self-signed certificate. So I exported the cert from Chrome and imported it into a new truststore. Adding a VM arg to point to the new truststore seemed to fix that: -Djavax.net.ssl.trustStore=C:\Apps\truststore
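For anyone who would rather not depend on a JVM-wide flag, the same truststore can also be loaded in code. This is only a minimal sketch using JDK classes; the names TrustStoreSketch, load and contextTrusting are made up for illustration, and the file path and password are whatever was used when the truststore was created.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.GeneralSecurityException;
import java.security.KeyStore;
import javax.net.ssl.SSLContext;
import javax.net.ssl.TrustManagerFactory;

// Hypothetical helper, not part of Knox or Hadoop.
final class TrustStoreSketch {

    /** Reads a truststore file; path and password are supplied by the caller. */
    static KeyStore load(Path file, char[] password) throws IOException, GeneralSecurityException {
        KeyStore store = KeyStore.getInstance(KeyStore.getDefaultType());
        try (InputStream in = Files.newInputStream(file)) {
            store.load(in, password);
        }
        return store;
    }

    /**
     * Builds a TLS context whose trust decisions come from the given store.
     * Passing null makes the JDK fall back to its default truststore.
     */
    static SSLContext contextTrusting(KeyStore trustStore) {
        try {
            TrustManagerFactory tmf =
                TrustManagerFactory.getInstance(TrustManagerFactory.getDefaultAlgorithm());
            tmf.init(trustStore);
            SSLContext context = SSLContext.getInstance("TLS");
            // No client keys and the default SecureRandom; only trust is customized.
            context.init(null, tmf.getTrustManagers(), null);
            return context;
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("Could not build TLS context", e);
        }
    }
}
```

The resulting context can be applied to a single connection with setSSLSocketFactory( context.getSocketFactory() ) on an HttpsURLConnection, which keeps the rest of the JVM on its default trust settings.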
-----------
After some more research, it looks like using the Java API through Knox is not supported in 0.6.0. More info here.
Created 02-29-2016 07:26 PM
Sorry it took me a while to respond here, but I was putting together a working sample. The first important point is that I think people tend to overestimate the complexity of dealing with the REST APIs, especially WebHDFS. The point of having REST APIs, after all, is that clients can be very thin. I played with a few different Java HTTP client libraries and, to my surprise, the venerable Java HttpsURLConnection resulted in the cleanest examples. The Apache HttpClient is certainly an option and might be warranted in more complex situations.
IMPORTANT: Before you continue, however, please note that these examples are set up to circumvent both SSL hostname and certificate validation. This is not acceptable in production, but in samples it often helps to make sure SSL doesn't become a barrier to success.
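To make concrete what "circumventing" means here, this is roughly the shape such trust-all helpers tend to take. It is a sketch only: the names InsecureSslSketch, TRUST_ALL_HOSTS, TRUST_ALL_CERTS and insecureContext are placeholders of mine and may not match the classes in the linked repository, and none of this belongs in production code.

```java
import java.security.GeneralSecurityException;
import java.security.cert.X509Certificate;
import javax.net.ssl.HostnameVerifier;
import javax.net.ssl.SSLContext;
import javax.net.ssl.TrustManager;
import javax.net.ssl.X509TrustManager;

// Hypothetical sample-only helper. It switches off the checks that make TLS trustworthy.
final class InsecureSslSketch {

    /** Accepts every hostname, so a certificate issued for another host still passes. */
    static final HostnameVerifier TRUST_ALL_HOSTS = (hostname, session) -> true;

    /** Accepts every certificate chain, including self-signed and expired ones. */
    static final X509TrustManager TRUST_ALL_CERTS = new X509TrustManager() {
        @Override
        public void checkClientTrusted(X509Certificate[] chain, String authType) {
            // Intentionally empty: never rejects.
        }

        @Override
        public void checkServerTrusted(X509Certificate[] chain, String authType) {
            // Intentionally empty: never rejects.
        }

        @Override
        public X509Certificate[] getAcceptedIssuers() {
            return new X509Certificate[0];
        }
    };

    /** Builds a TLS context that uses only the trust-all manager above. */
    static SSLContext insecureContext() {
        try {
            SSLContext context = SSLContext.getInstance("TLS");
            context.init(null, new TrustManager[] { TRUST_ALL_CERTS }, null);
            return context;
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("Could not build TLS context", e);
        }
    }
}
```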
I'll show the heart of the solution below but the full answer can be found here: https://github.com/kminder/knox-webhdfs-client-examples. Specifically here: https://github.com/kminder/knox-webhdfs-client-examples/blob/master/src/test/java/net/minder/KnoxWeb...
Now for the code. The first is an example of the simplest of operations: GETHOMEDIRECTORY.
@Test
public void getHomeDirExample() throws Exception {
    HttpsURLConnection connection;
    InputStream input;
    JsonNode json;

    connection = createHttpUrlConnection( WEBHDFS_URL + "?op=GETHOMEDIRECTORY" );
    input = connection.getInputStream();
    json = MAPPER.readTree( input );
    input.close();
    connection.disconnect();

    assertThat( json.get( "Path" ).asText(), is( "/user/"+TEST_USERNAME ) );
}
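The directory listing from the original question should follow the same pattern with op=LISTSTATUS. The sketch below covers only the two pure pieces: building the URL and pulling the pathSuffix values out of the response body. It uses a regular expression purely to stay dependency-free, which is an assumption of this sketch and would break on file names containing escaped quotes; in real code, MAPPER.readTree as in the example above is the better choice. ListStatusSketch and its method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration; not part of the linked examples.
final class ListStatusSketch {

    // Matches "pathSuffix":"<name>" entries in a LISTSTATUS response.
    private static final Pattern PATH_SUFFIX =
        Pattern.compile("\"pathSuffix\"\\s*:\\s*\"([^\"]*)\"");

    /** Appends the directory and the LISTSTATUS operation to the WebHDFS base URL. */
    static String listStatusUrl(String webhdfsUrl, String dir) {
        return webhdfsUrl + dir + "?op=LISTSTATUS";
    }

    /** Collects every pathSuffix value, in the order it appears in the JSON text. */
    static List<String> pathSuffixes(String json) {
        List<String> names = new ArrayList<>();
        Matcher matcher = PATH_SUFFIX.matcher(json);
        while (matcher.find()) {
            names.add(matcher.group(1));
        }
        return names;
    }
}
```

Given the body read from a connection opened on listStatusUrl( WEBHDFS_URL, "/tmp/guest" ), pathSuffixes should return the same names that the Groovy println in the question prints.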
Next a more complicated sample that writes and reads a file to HDFS via the CREATE and OPEN operations.
@Test
public void putGetFileExample() throws Exception {
    HttpsURLConnection connection;
    String redirect;
    InputStream input;
    OutputStream output;

    String data = UUID.randomUUID().toString();

    // CREATE, step 1: the first PUT carries no data and is answered with a redirect.
    connection = createHttpUrlConnection( WEBHDFS_URL + "/tmp/" + data + "/?op=CREATE" );
    connection.setRequestMethod( "PUT" );
    assertThat( connection.getResponseCode(), is(307) );
    redirect = connection.getHeaderField( "Location" );
    connection.disconnect();

    // CREATE, step 2: PUT the file content to the redirect location.
    connection = createHttpUrlConnection( redirect );
    connection.setRequestMethod( "PUT" );
    connection.setDoOutput( true );
    output = connection.getOutputStream();
    IOUtils.write( data.getBytes(), output );
    output.close();
    // Read the response code before disconnecting.
    assertThat( connection.getResponseCode(), is(201) );
    connection.disconnect();

    // OPEN, step 1: the first GET is answered with a redirect.
    connection = createHttpUrlConnection( WEBHDFS_URL + "/tmp/" + data + "/?op=OPEN" );
    assertThat( connection.getResponseCode(), is(307) );
    redirect = connection.getHeaderField( "Location" );
    connection.disconnect();

    // OPEN, step 2: read the file content from the redirect location.
    connection = createHttpUrlConnection( redirect );
    input = connection.getInputStream();
    assertThat( IOUtils.toString( input ), is( data ) );
    input.close();
    connection.disconnect();
}
Now of course you have probably noticed that all of the "magic" is hidden in that createHttpUrlConnection method. It's not really magic at all, but this is where the "un-securing" of SSL happens. The method also sets up HTTP BasicAuth for authentication and disables automatic redirects, which should be done when using the WebHDFS REST APIs so that each redirect can be followed explicitly, as the CREATE and OPEN steps above do.
private HttpsURLConnection createHttpUrlConnection( URL url ) throws Exception {
    HttpsURLConnection conn = (HttpsURLConnection)url.openConnection();
    // Sample only: skip hostname and certificate validation.
    conn.setHostnameVerifier( new TrustAllHosts() );
    conn.setSSLSocketFactory( TrustAllCerts.createInsecureSslContext().getSocketFactory() );
    // Redirects are followed by hand via the Location header.
    conn.setInstanceFollowRedirects( false );
    // HTTP BasicAuth credentials.
    String credentials = TEST_USERNAME + ":" + TEST_PASSWORD;
    conn.setRequestProperty( "Authorization", "Basic " + DatatypeConverter.printBase64Binary(credentials.getBytes() ) );
    return conn;
}

private HttpsURLConnection createHttpUrlConnection( String url ) throws Exception {
    return createHttpUrlConnection( new URL( url ) );
}
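One portability note on the BasicAuth line: javax.xml.bind.DatatypeConverter is no longer bundled with the JDK as of Java 11, so on newer runtimes the header can be built with java.util.Base64 (available since Java 8) instead. A minimal sketch follows; BasicAuthSketch and basicAuthHeader are hypothetical names, and the choice of UTF-8 for the credential bytes is an assumption of this sketch.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Hypothetical helper for illustration; not part of the linked examples.
final class BasicAuthSketch {

    /** Returns the value for an HTTP Authorization header using the Basic scheme. */
    static String basicAuthHeader(String username, String password) {
        String credentials = username + ":" + password;
        // An explicit charset avoids depending on the platform default encoding.
        byte[] bytes = credentials.getBytes(StandardCharsets.UTF_8);
        return "Basic " + Base64.getEncoder().encodeToString(bytes);
    }
}
```

With this in place, the setRequestProperty call would become conn.setRequestProperty( "Authorization", BasicAuthSketch.basicAuthHeader( TEST_USERNAME, TEST_PASSWORD ) ).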
Created 03-01-2016 01:18 AM
Created 03-01-2016 03:58 PM
@Kevin Minder This is awesome! Thank you!!
Created 12-03-2018 01:51 PM
Could you please let me know whether you were able to call Knox WebHDFS from Java? I am trying the same thing, but I get the following exception at:
Hdfs.ls( session ).dir( "/" ).now().getString()
log4j:WARN No appenders could be found for logger (org.apache.http.client.protocol.RequestAddCookies).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
Exception in thread "main" org.apache.knox.gateway.shell.HadoopException: javax.net.ssl.SSLHandshakeException: sun.security.validator.ValidatorException: PKIX path building failed: sun.security.provider.certpath.SunCertPathBuilderException: unable to find valid certification path to requested target
at org.apache.knox.gateway.shell.AbstractRequest.now(AbstractRequest.java:85)
at webhdfsclient.program.main(program.java:23)
Caused by: javax.net.ssl.SSLHandshakeException: sun.security.validator.ValidatorException: PKIX path building failed: sun.security.provider.certpath.SunCertPathBuilderException: unable to find valid certification path to requested target
at sun.security.ssl.Alerts.getSSLException(Alerts.java:192)
at sun.security.ssl.SSLSocketImpl.fatal(SSLSocketImpl.java:1959)
at sun.security.ssl.Handshaker.fatalSE(Handshaker.java:302)
at sun.security.ssl.Handshaker.fatalSE(Handshaker.java:296)
at sun.security.ssl.ClientHandshaker.serverCertificate(ClientHandshaker.java:1514)
at sun.security.ssl.ClientHandshaker.processMessage(ClientHandshaker.java:216)
at sun.security.ssl.Handshaker.processLoop(Handshaker.java:1026)
at sun.security.ssl.Handshaker.process_record(Handshaker.java:961)
at sun.security.ssl.SSLSocketImpl.readRecord(SSLSocketImpl.java:1072)
at sun.security.ssl.SSLSocketImpl.performInitialHandshake(SSLSocketImpl.java:1385)
at sun.security.ssl.SSLSocketImpl.startHandshake(SSLSocketImpl.java:1413)
at sun.security.ssl.SSLSocketImpl.startHandshake(SSLSocketImpl.java:1397)
at org.apache.http.conn.ssl.SSLConnectionSocketFactory.createLayeredSocket(SSLConnectionSocketFactory.java:396)
at org.apache.http.conn.ssl.SSLConnectionSocketFactory.connectSocket(SSLConnectionSocketFactory.java:355)
at org.apache.http.impl.conn.DefaultHttpClientConnectionOperator.connect(DefaultHttpClientConnectionOperator.java:142)
at org.apache.http.impl.conn.PoolingHttpClientConnectionManager.connect(PoolingHttpClientConnectionManager.java:373)
at org.apache.http.impl.execchain.MainClientExec.establishRoute(MainClientExec.java:381)
at org.apache.http.impl.execchain.MainClientExec.execute(MainClientExec.java:237)
at org.apache.http.impl.execchain.ProtocolExec.execute(ProtocolExec.java:185)
at org.apache.http.impl.execchain.RetryExec.execute(RetryExec.java:89)
at org.apache.http.impl.execchain.RedirectExec.execute(RedirectExec.java:111)
at org.apache.http.impl.client.InternalHttpClient.doExecute(InternalHttpClient.java:185)
at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:72)
at org.apache.knox.gateway.shell.Hadoop.executeNow(Hadoop.java:256)
at org.apache.knox.gateway.shell.AbstractRequest.execute(AbstractRequest.java:50)
at org.apache.knox.gateway.shell.hdfs.Ls$Request.access$200(Ls.java:31)
at org.apache.knox.gateway.shell.hdfs.Ls$Request$1.call(Ls.java:51)
at org.apache.knox.gateway.shell.hdfs.Ls$Request$1.call(Ls.java:45)
at org.apache.knox.gateway.shell.AbstractRequest.now(AbstractRequest.java:83)
... 1 more
Caused by: sun.security.validator.ValidatorException: PKIX path building failed: sun.security.provider.certpath.SunCertPathBuilderException: unable to find valid certification path to requested target
at sun.security.validator.PKIXValidator.doBuild(PKIXValidator.java:397)
at sun.security.validator.PKIXValidator.engineValidate(PKIXValidator.java:302)
at sun.security.validator.Validator.validate(Validator.java:260)
at sun.security.ssl.X509TrustManagerImpl.validate(X509TrustManagerImpl.java:324)
at sun.security.ssl.X509TrustManagerImpl.checkTrusted(X509TrustManagerImpl.java:229)
at sun.security.ssl.X509TrustManagerImpl.checkServerTrusted(X509TrustManagerImpl.java:124)
at sun.security.ssl.ClientHandshaker.serverCertificate(ClientHandshaker.java:1496)
... 25 more
Caused by: sun.security.provider.certpath.SunCertPathBuilderException: unable to find valid certification path to requested target
at sun.security.provider.certpath.SunCertPathBuilder.build(SunCertPathBuilder.java:141)
at sun.security.provider.certpath.SunCertPathBuilder.engineBuild(SunCertPathBuilder.java:126)
at java.security.cert.CertPathBuilder.build(CertPathBuilder.java:280)
at sun.security.validator.PKIXValidator.doBuild(PKIXValidator.java:392)
... 31 more