
How to use s3a with HDP

Super Collaborator

I'm trying to use distcp to copy data to an S3 bucket, and experiencing nothing but pain.

I've tried something like this:

sudo -u hdfs hadoop distcp -Dhadoop.root.logger="DEBUG,console" -Dmapreduce.job.maxtaskfailures.per.tracker=1 -bandwidth 10 -i -log /user/hdfs/s3_staging/logging/distcp.log hdfs:///apps/hive/warehouse/my_db/my_table s3n://my_bucket/my_path

But I encounter this error:

http://stackoverflow.com/questions/37868404/distcp-from-hadoop-to-s3-fails-with-no-space-available-i...

From what I've read, I might have more luck trying s3a instead of s3n, but when I try the same command above using "s3a" in the URL, I get this error:

"No FileSystem for scheme: S3a"

Can someone please give me some insight into getting this working with either filesystem?
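
For reference, the s3a attempt is literally the same command as above with only the scheme changed:

sudo -u hdfs hadoop distcp -Dhadoop.root.logger="DEBUG,console" -Dmapreduce.job.maxtaskfailures.per.tracker=1 -bandwidth 10 -i -log /user/hdfs/s3_staging/logging/distcp.log hdfs:///apps/hive/warehouse/my_db/my_table s3a://my_bucket/my_path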


Re: How to use s3a with HDP

Contributor

s3n is pretty much deprecated; please use "s3a". Which version of HDP are you using? Check that you have the relevant s3a libraries (aws-java-sdk-s3*.jar) in your hadoop install, and add "-Dfs.s3a.impl=org.apache.hadoop.fs.s3a.S3AFileSystem" to the distcp command.
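
For example, a minimal sketch reusing the source and destination from the original post (bucket and paths are placeholders; adjust to your environment):

sudo -u hdfs hadoop distcp -Dfs.s3a.impl=org.apache.hadoop.fs.s3a.S3AFileSystem hdfs:///apps/hive/warehouse/my_db/my_table s3a://my_bucket/my_path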

Re: How to use s3a with HDP

Super Collaborator

Thanks @Rajesh Balamohan

I see that I only had aws-java-sdk-s3*.jar under /usr/hdp/current/zeppelin/lib/lib, so I copied it to /usr/hdp/current/hadoop/lib and /usr/hdp/current/hadoop-mapreduce/lib, but when I try to run with the -Dfs.s3a.impl argument, I get the error below.

I have the proper AWS credentials in my config and I don't have credential-related issues if I try an s3n: URL, so I think this is really an issue with finding the right jars.

Do I need to add that jar to a path somewhere?

Any ideas?

16/11/11 06:25:41 ERROR tools.DistCp: Invalid arguments:
com.amazonaws.AmazonClientException: Unable to load AWS credentials from any provider in the chain
        at com.amazonaws.auth.AWSCredentialsProviderChain.getCredentials(AWSCredentialsProviderChain.java:117)
        at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:3521)
        at com.amazonaws.services.s3.AmazonS3Client.headBucket(AmazonS3Client.java:1031)
        at com.amazonaws.services.s3.AmazonS3Client.doesBucketExist(AmazonS3Client.java:994)
        at org.apache.hadoop.fs.s3a.S3AFileSystem.initialize(S3AFileSystem.java:228)
        at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2653)
        at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:92)
        at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2687)
        at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2669)
        at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:371)
        at org.apache.hadoop.fs.Path.getFileSystem(Path.java:295)
        at org.apache.hadoop.tools.DistCp.setTargetPathExists(DistCp.java:216)
        at org.apache.hadoop.tools.DistCp.run(DistCp.java:116)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)
        at org.apache.hadoop.tools.DistCp.main(DistCp.java:430)
Invalid arguments: Unable to load AWS credentials from any provider in the chain
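
(For anyone hitting the same error: s3n and s3a read their credentials from different configuration properties, so keys that work for s3n:// URLs are not automatically picked up by s3a://. Roughly, the per-scheme properties are the following; the values are placeholders:

fs.s3n.awsAccessKeyId=<access key used by s3n://>
fs.s3n.awsSecretAccessKey=<secret key used by s3n://>
fs.s3a.access.key=<access key used by s3a://>
fs.s3a.secret.key=<secret key used by s3a://>

That turned out to be the root cause - see the accepted solution below.)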

Re: How to use s3a with HDP (Accepted Solution)

Super Collaborator

I figured it out - I needed to add fs.s3a.access.key and fs.s3a.secret.key values to my HDFS config in Ambari.

I already had fs.s3.awsAccessKeyId and fs.s3.awsSecretAccessKey, but apparently those only apply to s3:// URLs.

So I had to do the following to get distcp to work on HDP 2.4.2:

1. Add aws-java-sdk-s3-1.10.62.jar to hadoop/lib on the node running the command.

2. Add hadoop/lib* to the classpath for MapReduce and YARN.

3. Add the fs.s3a.access.key and fs.s3a.secret.key properties to the HDFS config in Ambari (sketched below).
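
A rough sketch of step 3 (values are placeholders; in my case the properties went into the HDFS configs in Ambari, i.e. custom core-site):

fs.s3a.access.key=<your AWS access key id>
fs.s3a.secret.key=<your AWS secret access key>

And for step 2, assuming the same /usr/hdp/current/hadoop/lib path mentioned earlier, the classpath change is roughly a matter of appending that directory to the existing MapReduce and YARN classpath properties:

mapreduce.application.classpath=<existing value>:/usr/hdp/current/hadoop/lib/*
yarn.application.classpath=<existing value>,/usr/hdp/current/hadoop/lib/*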

Re: How to use s3a with HDP

Super Collaborator

Oh - I also needed this in the HDFS configs:

fs.s3a.impl=org.apache.hadoop.fs.s3a.S3AFileSystem
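
With the keys and fs.s3a.impl set in core-site, the distcp itself shouldn't need any extra -D flags; something like this (same paths as above) should be enough:

sudo -u hdfs hadoop distcp hdfs:///apps/hive/warehouse/my_db/my_table s3a://my_bucket/my_path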

Re: How to use s3a with HDP