Support Questions
Find answers, ask questions, and share your expertise
Announcements
Alert: Welcome to the Unified Cloudera Community. Former HCC members be sure to read and learn how to activate your account here.

distcp with s3 timed out - cdh 5.7

distcp with s3 timed out - cdh 5.7

New Contributor

Hello everyone,

Just wondering is there any known issue with distcp with s3? We are trying to distcp some data from HDFS to S3 and we are getting the follow error:

 

Error: org.apache.http.conn.ConnectTimeoutException: Connect to testabcd.s3.amazonaws.com:443 timed out
at org.apache.http.conn.ssl.SSLSocketFactory.connectSocket(SSLSocketFactory.java:416)
at org.apache.http.impl.conn.DefaultClientConnectionOperator.openConnection(DefaultClientConnectionOperator.java:180)
at org.apache.http.impl.conn.AbstractPoolEntry.open(AbstractPoolEntry.java:151)
at org.apache.http.impl.conn.AbstractPooledConnAdapter.open(AbstractPooledConnAdapter.java:125)
at org.apache.http.impl.client.DefaultRequestDirector.tryConnect(DefaultRequestDirector.java:643)
at org.apache.http.impl.client.DefaultRequestDirector.execute(DefaultRequestDirector.java:479)
at org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:906)
at org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:805)
at org.jets3t.service.impl.rest.httpclient.RestStorageService.performRequest(RestStorageService.java:334)
at org.jets3t.service.impl.rest.httpclient.RestStorageService.performRequest(RestStorageService.java:281)
at org.jets3t.service.impl.rest.httpclient.RestStorageService.performRestHead(RestStorageService.java:942)
at org.jets3t.service.impl.rest.httpclient.RestStorageService.getObjectImpl(RestStorageService.java:2148)
at org.jets3t.service.impl.rest.httpclient.RestStorageService.getObjectDetailsImpl(RestStorageService.java:2075)
at org.jets3t.service.StorageService.getObjectDetails(StorageService.java:1093)
at org.jets3t.service.StorageService.getObjectDetails(StorageService.java:548)
at org.apache.hadoop.fs.s3native.Jets3tNativeFileSystemStore.retrieveMetadata(Jets3tNativeFileSystemStore.java:174)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:256)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:104)
at org.apache.hadoop.fs.s3native.$Proxy14.retrieveMetadata(Unknown Source)
at org.apache.hadoop.fs.s3native.NativeS3FileSystem.getFileStatus(NativeS3FileSystem.java:472)
at org.apache.hadoop.fs.FileSystem.exists(FileSystem.java:1412)
at org.apache.hadoop.tools.mapred.CopyMapper.setup(CopyMapper.java:114)
at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:142)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:787)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:164)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1693)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)

 

It's weird that we do get file already exists error if the folder is exist, so that means it's able to make connection but when it comes to copy, it fails. Not only distcp, we also tried to dump data on S3 through distcp, same error.

 

Is there any particular setting we need to enable it? We tried with HDP in the same AWS VPC and it worked. So may be we are missing some config here.

 

Any helps would be higly appreciated.

 

Thanks.

2 REPLIES 2

Re: distcp with s3 timed out - cdh 5.7

Rising Star

I am using Spark 1.6.0 in CDH 5.7.0, and I am having all kinds of issues using the AWS libraries that come with it. I would like to know the answer you get too.

 

Cheers,

Ben

Re: distcp with s3 timed out - cdh 5.7

Master Guru
Please try using S3A (s3a://) instead of S3/S3N (s3:// or s3n://) going forward in CDH as its powered by Amazon's own S3 Java SDK and supports more current abilities of S3 than the other older implementations. It is designed to replace the older ones.

Usage is straight-forward, with similar properties. Documentation is at http://www.cloudera.com/documentation/enterprise/latest/topics/spark_s3.html
Don't have an account?
Coming from Hortonworks? Activate your account here