How do you use webhdfs in Java through Knox?

Expert Contributor

I have some Groovy code that works:

session = Hadoop.login( "https://etc-lab1-edge01-10:8888/gateway/ehihadoop02", "guest", "guest-password" )
text = Hdfs.ls( session ).dir( "/tmp/guest" ).now().string
import groovy.json.JsonSlurper
json = (new JsonSlurper()).parseText( text )
println json.FileStatuses.FileStatus.pathSuffix

How do I do the same thing in Java? I've tried the following:

String webHdfsUrl = "webhdfs://etc-lab1-edge01-10:8888/";
String dir = "/tmp/guest";
Configuration hdfsConfig = new Configuration();
FileSystem fs = FileSystem.get(URI.create(webHdfsUrl), hdfsConfig);
RemoteIterator<LocatedFileStatus> files = fs.listFiles(new Path(dir), false);
while (files.hasNext()) {
	LocatedFileStatus srcFile = files.next();
	String path = Path.getPathWithoutSchemeAndAuthority(srcFile.getPath()).toString();
	System.out.println(path);
}

But I get java.net.SocketException: Unexpected end of file from server.

1 ACCEPTED SOLUTION

Expert Contributor

The first problem with my code was that I should have been using secure WebHDFS, whose scheme is swebhdfs. (I'm still not sure why I was getting a SocketException.)

The second problem was an SSLHandshakeException. Running with the VM argument -Djavax.net.debug=all showed that Java did not trust the SSL connection because the server was using a self-signed certificate. I exported the certificate from Chrome and imported it into a new truststore. Pointing the JVM at that truststore with another VM argument seemed to fix it: -Djavax.net.ssl.trustStore=C:\Apps\truststore.
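For anyone who would rather not set a JVM-wide system property, the truststore approach described above can also be applied programmatically. The following is only a minimal sketch using standard JDK classes; the class name TrustStoreSketch and its method names are made up for illustration, and the truststore path and password are whatever you chose when importing the certificate.

```java
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.security.KeyStore;
import javax.net.ssl.SSLContext;
import javax.net.ssl.TrustManagerFactory;

// Hypothetical helper class, not part of Knox or Hadoop.
class TrustStoreSketch {

  // Loads a truststore file from disk. Path and password are supplied by the caller.
  static KeyStore loadTrustStore(String path, char[] password) throws Exception {
    KeyStore trustStore = KeyStore.getInstance(KeyStore.getDefaultType());
    try (InputStream in = Files.newInputStream(Paths.get(path))) {
      trustStore.load(in, password);
    }
    return trustStore;
  }

  // Builds an SSLContext that trusts only the certificates in the given truststore.
  static SSLContext contextFor(KeyStore trustStore) throws Exception {
    TrustManagerFactory tmf =
        TrustManagerFactory.getInstance(TrustManagerFactory.getDefaultAlgorithm());
    tmf.init(trustStore);
    SSLContext context = SSLContext.getInstance("TLS");
    context.init(null, tmf.getTrustManagers(), null);
    return context;
  }

  public static void main(String[] args) throws Exception {
    KeyStore trustStore;
    if (args.length == 2) {
      // Usage: java TrustStoreSketch <truststore-path> <password>
      trustStore = loadTrustStore(args[0], args[1].toCharArray());
    } else {
      // No arguments: use an empty in-memory store just to exercise the code path.
      trustStore = KeyStore.getInstance(KeyStore.getDefaultType());
      trustStore.load(null, null);
    }
    SSLContext context = contextFor(trustStore);
    System.out.println("SSLContext protocol: " + context.getProtocol());
  }
}
```

The resulting context's getSocketFactory() can be handed to HttpsURLConnection.setSSLSocketFactory(...), which keeps certificate validation switched on while trusting the self-signed certificate.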

-----------

After some more research, it looks like using the Java API through Knox is not supported in 0.6.0. More info here.

13 REPLIES


Sorry it took me a while to respond here, but I was putting together a working sample. The first important point is that I think people tend to overestimate the complexity of dealing with the REST APIs, especially WebHDFS. The point of having REST APIs, after all, is to allow very thin clients. I played with a few different Java HTTP client libraries and, to my surprise, the venerable Java HttpsURLConnection resulted in the cleanest examples. The Apache HttpClient is certainly an option and might be warranted in more complex situations.

IMPORTANT: Before you continue, please note that these examples are set up to circumvent both SSL hostname and certificate validation. This is not acceptable in production, but in samples it often helps keep SSL configuration from becoming a barrier to success.

I'll show the heart of the solution below but the full answer can be found here: https://github.com/kminder/knox-webhdfs-client-examples. Specifically here: https://github.com/kminder/knox-webhdfs-client-examples/blob/master/src/test/java/net/minder/KnoxWeb...

Now for the code. The first is an example of the simplest of operations: GETHOMEDIRECTORY.

@Test
public void getHomeDirExample() throws Exception {
  HttpsURLConnection connection;
  InputStream input;
  JsonNode json;
  connection = createHttpUrlConnection( WEBHDFS_URL + "?op=GETHOMEDIRECTORY" );
  input = connection.getInputStream();
  json = MAPPER.readTree( input );
  input.close();
  connection.disconnect();
  assertThat( json.get( "Path" ).asText(), is( "/user/"+TEST_USERNAME ) );
}
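The same request pattern should extend to the directory listing asked about in the original question, via the WebHDFS LISTSTATUS operation. Below is a rough, network-free sketch of just the URL construction and the extraction of pathSuffix values. The class ListStatusSketch, the gateway URL, and the sample JSON are all made up for illustration, and the regular expression is only a stand-in for a real JSON parser.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, not part of the linked examples.
class ListStatusSketch {

  // Matches "pathSuffix":"value"; adequate only for names without escaped quotes.
  private static final Pattern PATH_SUFFIX =
      Pattern.compile("\"pathSuffix\"\\s*:\\s*\"([^\"]*)\"");

  // Appends the directory and the LISTSTATUS operation to the WebHDFS base URL.
  static String listStatusUrl(String webHdfsUrl, String dir) {
    return webHdfsUrl + dir + "?op=LISTSTATUS";
  }

  // Collects every pathSuffix value found in a LISTSTATUS response body.
  static List<String> pathSuffixes(String json) {
    List<String> names = new ArrayList<>();
    Matcher m = PATH_SUFFIX.matcher(json);
    while (m.find()) {
      names.add(m.group(1));
    }
    return names;
  }

  public static void main(String[] args) {
    // Assumed gateway URL and response body, for illustration only.
    String base = "https://localhost:8443/gateway/sandbox/webhdfs/v1";
    String sample = "{\"FileStatuses\":{\"FileStatus\":["
        + "{\"pathSuffix\":\"a.txt\",\"type\":\"FILE\"},"
        + "{\"pathSuffix\":\"logs\",\"type\":\"DIRECTORY\"}]}}";
    System.out.println(listStatusUrl(base, "/tmp/guest"));
    System.out.println(pathSuffixes(sample));
  }
}
```

In a test like the one above, the response would more naturally be read with the same MAPPER and walked as json.get( "FileStatuses" ).get( "FileStatus" ), reading pathSuffix from each element.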

Next, a more complicated sample that writes a file to HDFS and reads it back via the CREATE and OPEN operations.

@Test
public void putGetFileExample() throws Exception {
  HttpsURLConnection connection;
  String redirect;
  InputStream input;
  OutputStream output;

  String data = UUID.randomUUID().toString();

  connection = createHttpUrlConnection( WEBHDFS_URL + "/tmp/" + data + "/?op=CREATE" );
  connection.setRequestMethod( "PUT" );
  assertThat( connection.getResponseCode(), is(307) );
  redirect = connection.getHeaderField( "Location" );
  connection.disconnect();

  connection = createHttpUrlConnection( redirect );
  connection.setRequestMethod( "PUT" );
  connection.setDoOutput( true );
  output = connection.getOutputStream();
  IOUtils.write( data.getBytes(), output );
  output.close();
  // Check the response before disconnecting so the status is read from the live connection.
  assertThat( connection.getResponseCode(), is(201) );
  connection.disconnect();

  connection = createHttpUrlConnection( WEBHDFS_URL + "/tmp/" + data + "/?op=OPEN" );
  assertThat( connection.getResponseCode(), is(307) );
  redirect = connection.getHeaderField( "Location" );
  connection.disconnect();

  connection = createHttpUrlConnection( redirect );
  input = connection.getInputStream();
  assertThat( IOUtils.toString( input ), is( data ) );
  input.close();
  connection.disconnect();
}

Now, of course, you have probably noticed that all of the "magic" is hidden in that createHttpUrlConnection method. It is not really magic at all, but this is where the "un-securing" of SSL happens. The method also sets up HTTP BasicAuth for authentication and disables automatic redirect following, which should be done when using the WebHDFS REST APIs.

private HttpsURLConnection createHttpUrlConnection( URL url ) throws Exception {
  HttpsURLConnection conn = (HttpsURLConnection)url.openConnection();
  conn.setHostnameVerifier( new TrustAllHosts() );
  conn.setSSLSocketFactory( TrustAllCerts.createInsecureSslContext().getSocketFactory() );
  conn.setInstanceFollowRedirects( false );
  String credentials = TEST_USERNAME + ":" + TEST_PASSWORD;
  conn.setRequestProperty( "Authorization", "Basic " + DatatypeConverter.printBase64Binary(credentials.getBytes() ) );
  return conn;
}

private HttpsURLConnection createHttpUrlConnection( String url ) throws Exception {
  return createHttpUrlConnection( new URL( url ) );
}
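One note on the Authorization header: DatatypeConverter belongs to javax.xml.bind (JAXB), which was removed from the JDK in Java 11. On Java 8 or later the header can be built with java.util.Base64 instead. A minimal sketch follows; the class name BasicAuthSketch and the method basicAuthHeader are made up for illustration, and the credentials are the guest account from the question.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Hypothetical helper class, not part of the linked examples.
class BasicAuthSketch {

  // Builds the value of an HTTP Basic Authorization header for the given credentials.
  static String basicAuthHeader(String username, String password) {
    String credentials = username + ":" + password;
    byte[] raw = credentials.getBytes(StandardCharsets.UTF_8);
    return "Basic " + Base64.getEncoder().encodeToString(raw);
  }

  public static void main(String[] args) {
    // Demo credentials taken from the question; replace with your own.
    System.out.println(basicAuthHeader("guest", "guest-password"));
  }
}
```

The returned string would be passed as conn.setRequestProperty( "Authorization", ... ) in place of the DatatypeConverter call. Specifying the charset explicitly also avoids depending on the platform default encoding.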

Expert Contributor

@Kevin Minder This is awesome! Thank you!!

New Contributor

@Kit Menke

Could you please let me know if you were able to call Knox WebHDFS from Java? I am trying the same thing, but I get the following exception at

Hdfs.ls( session ).dir( "/" ).now().getString()

log4j:WARN No appenders could be found for logger (org.apache.http.client.protocol.RequestAddCookies).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
Exception in thread "main" org.apache.knox.gateway.shell.HadoopException: javax.net.ssl.SSLHandshakeException: sun.security.validator.ValidatorException: PKIX path building failed: sun.security.provider.certpath.SunCertPathBuilderException: unable to find valid certification path to requested target
at org.apache.knox.gateway.shell.AbstractRequest.now(AbstractRequest.java:85)
at webhdfsclient.program.main(program.java:23)
Caused by: javax.net.ssl.SSLHandshakeException: sun.security.validator.ValidatorException: PKIX path building failed: sun.security.provider.certpath.SunCertPathBuilderException: unable to find valid certification path to requested target
at sun.security.ssl.Alerts.getSSLException(Alerts.java:192)
at sun.security.ssl.SSLSocketImpl.fatal(SSLSocketImpl.java:1959)
at sun.security.ssl.Handshaker.fatalSE(Handshaker.java:302)
at sun.security.ssl.Handshaker.fatalSE(Handshaker.java:296)
at sun.security.ssl.ClientHandshaker.serverCertificate(ClientHandshaker.java:1514)
at sun.security.ssl.ClientHandshaker.processMessage(ClientHandshaker.java:216)
at sun.security.ssl.Handshaker.processLoop(Handshaker.java:1026)
at sun.security.ssl.Handshaker.process_record(Handshaker.java:961)
at sun.security.ssl.SSLSocketImpl.readRecord(SSLSocketImpl.java:1072)
at sun.security.ssl.SSLSocketImpl.performInitialHandshake(SSLSocketImpl.java:1385)
at sun.security.ssl.SSLSocketImpl.startHandshake(SSLSocketImpl.java:1413)
at sun.security.ssl.SSLSocketImpl.startHandshake(SSLSocketImpl.java:1397)
at org.apache.http.conn.ssl.SSLConnectionSocketFactory.createLayeredSocket(SSLConnectionSocketFactory.java:396)
at org.apache.http.conn.ssl.SSLConnectionSocketFactory.connectSocket(SSLConnectionSocketFactory.java:355)
at org.apache.http.impl.conn.DefaultHttpClientConnectionOperator.connect(DefaultHttpClientConnectionOperator.java:142)
at org.apache.http.impl.conn.PoolingHttpClientConnectionManager.connect(PoolingHttpClientConnectionManager.java:373)
at org.apache.http.impl.execchain.MainClientExec.establishRoute(MainClientExec.java:381)
at org.apache.http.impl.execchain.MainClientExec.execute(MainClientExec.java:237)
at org.apache.http.impl.execchain.ProtocolExec.execute(ProtocolExec.java:185)
at org.apache.http.impl.execchain.RetryExec.execute(RetryExec.java:89)
at org.apache.http.impl.execchain.RedirectExec.execute(RedirectExec.java:111)
at org.apache.http.impl.client.InternalHttpClient.doExecute(InternalHttpClient.java:185)
at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:72)
at org.apache.knox.gateway.shell.Hadoop.executeNow(Hadoop.java:256)
at org.apache.knox.gateway.shell.AbstractRequest.execute(AbstractRequest.java:50)
at org.apache.knox.gateway.shell.hdfs.Ls$Request.access$200(Ls.java:31)
at org.apache.knox.gateway.shell.hdfs.Ls$Request$1.call(Ls.java:51)
at org.apache.knox.gateway.shell.hdfs.Ls$Request$1.call(Ls.java:45)
at org.apache.knox.gateway.shell.AbstractRequest.now(AbstractRequest.java:83)
... 1 more
Caused by: sun.security.validator.ValidatorException: PKIX path building failed: sun.security.provider.certpath.SunCertPathBuilderException: unable to find valid certification path to requested target
at sun.security.validator.PKIXValidator.doBuild(PKIXValidator.java:397)
at sun.security.validator.PKIXValidator.engineValidate(PKIXValidator.java:302)
at sun.security.validator.Validator.validate(Validator.java:260)
at sun.security.ssl.X509TrustManagerImpl.validate(X509TrustManagerImpl.java:324)
at sun.security.ssl.X509TrustManagerImpl.checkTrusted(X509TrustManagerImpl.java:229)
at sun.security.ssl.X509TrustManagerImpl.checkServerTrusted(X509TrustManagerImpl.java:124)
at sun.security.ssl.ClientHandshaker.serverCertificate(ClientHandshaker.java:1496)
... 25 more
Caused by: sun.security.provider.certpath.SunCertPathBuilderException: unable to find valid certification path to requested target
at sun.security.provider.certpath.SunCertPathBuilder.build(SunCertPathBuilder.java:141)
at sun.security.provider.certpath.SunCertPathBuilder.engineBuild(SunCertPathBuilder.java:126)
at java.security.cert.CertPathBuilder.build(CertPathBuilder.java:280)
at sun.security.validator.PKIXValidator.doBuild(PKIXValidator.java:392)
... 31 more