
How to use s3a with HDP

Super Collaborator

I'm trying to use distcp to copy data to an S3 bucket, and experiencing nothing but pain.

I've tried something like this:

sudo -u hdfs hadoop distcp -Dhadoop.root.logger="DEBUG,console" -Dmapreduce.job.maxtaskfailures.per.tracker=1 -bandwidth 10 -i -log /user/hdfs/s3_staging/logging/distcp.log hdfs:///apps/hive/warehouse/my_db/my_table s3n://my_bucket/my_path

But I encounter the error described here:

http://stackoverflow.com/questions/37868404/distcp-from-hadoop-to-s3-fails-with-no-space-available-i...

From what I've read, I might have more luck trying s3a instead of s3n, but when I try the same command above using "s3a" in the URL, I get this error:

"No FileSystem for scheme: S3a"

Can someone please give me some insight into getting this working with either file system?

1 ACCEPTED SOLUTION

Super Collaborator

I figured it out - I needed to add fs.s3a.access.key and fs.s3a.secret.key values to my HDFS config in Ambari.

I already had fs.s3.awsAccessKeyId and fs.s3.awsSecretKeyId, but those are just for s3:// urls, apparently.

So I had to do the following to get distcp to work on HDP 2.4.2:

1. Add aws-java-sdk-s3-1.10.62.jar to hadoop/lib on the node running the command.

2. Add hadoop/lib* to the classpath for MapReduce and YARN.

3. Add the fs.s3a.access.key and fs.s3a.secret.key properties to the HDFS config in Ambari.
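
For reference, a rough sketch of what the credential step amounts to outside of Ambari (Ambari writes these into the cluster's core-site.xml); the placeholder key values are obviously yours to fill in:

<property>
  <name>fs.s3a.access.key</name>
  <value>YOUR_AWS_ACCESS_KEY</value>
</property>
<property>
  <name>fs.s3a.secret.key</name>
  <value>YOUR_AWS_SECRET_KEY</value>
</property>

For the classpath step, on a stock HDP layout this typically means appending something like /usr/hdp/current/hadoop-client/lib/* to mapreduce.application.classpath and yarn.application.classpath; the exact path and property names are assumptions based on a default install, not something confirmed in this thread.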



Rising Star

s3n is pretty much deprecated; please use "s3a". Which version of HDP are you using? Check that you have the relevant s3a libraries (aws-java-sdk-s3*.jar) in your Hadoop lib directories, and add "-Dfs.s3a.impl=org.apache.hadoop.fs.s3a.S3AFileSystem" to your distcp command.
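
For example, the command from the question would look roughly like this with the s3a scheme and that property (same bucket and path placeholders as above):

sudo -u hdfs hadoop distcp -Dfs.s3a.impl=org.apache.hadoop.fs.s3a.S3AFileSystem -bandwidth 10 -i hdfs:///apps/hive/warehouse/my_db/my_table s3a://my_bucket/my_path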

Super Collaborator

Thanks @Rajesh Balamohan

I see that I only had aws-java-sdk-s3*.jar under /usr/hdp/current/zeppelin/lib/lib, so I copied it to /usr/hdp/current/hadoop/lib and /usr/hdp/current/hadoop-mapreduce/lib, but when I try to run with the -Dfs.s3a.impl argument, I get the error below.

I have the proper AWS credentials in my config and I don't have credential-related issues if I try an s3n: URL, so I think this is really an issue with finding the right jars.

Do I need to add that jar to a path somewhere?

Any ideas?

16/11/11 06:25:41 ERROR tools.DistCp: Invalid arguments:
com.amazonaws.AmazonClientException: Unable to load AWS credentials from any provider in the chain
        at com.amazonaws.auth.AWSCredentialsProviderChain.getCredentials(AWSCredentialsProviderChain.java:117)
        at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:3521)
        at com.amazonaws.services.s3.AmazonS3Client.headBucket(AmazonS3Client.java:1031)
        at com.amazonaws.services.s3.AmazonS3Client.doesBucketExist(AmazonS3Client.java:994)
        at org.apache.hadoop.fs.s3a.S3AFileSystem.initialize(S3AFileSystem.java:228)
        at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2653)
        at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:92)
        at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2687)
        at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2669)
        at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:371)
        at org.apache.hadoop.fs.Path.getFileSystem(Path.java:295)
        at org.apache.hadoop.tools.DistCp.setTargetPathExists(DistCp.java:216)
        at org.apache.hadoop.tools.DistCp.run(DistCp.java:116)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)
        at org.apache.hadoop.tools.DistCp.main(DistCp.java:430)
Invalid arguments: Unable to load AWS credentials from any provider in the chain
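
(Not something suggested in the thread, just a standard way to narrow this down: the s3a connector reads fs.s3a.access.key and fs.s3a.secret.key rather than the old fs.s3.awsAccessKeyId properties, so passing the keys directly on the command line can rule the credential chain in or out:

sudo -u hdfs hadoop distcp -Dfs.s3a.impl=org.apache.hadoop.fs.s3a.S3AFileSystem -Dfs.s3a.access.key=YOUR_AWS_ACCESS_KEY -Dfs.s3a.secret.key=YOUR_AWS_SECRET_KEY hdfs:///apps/hive/warehouse/my_db/my_table s3a://my_bucket/my_path

If that command works, the jars are fine and the problem is purely how the credentials are wired into the config.)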


Super Collaborator

Oh. Also need this in HDFS configs:

fs.s3a.impl=org.apache.hadoop.fs.s3a.S3AFileSystem
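
In XML terms (e.g. as a custom core-site property added through Ambari), that is:

<property>
  <name>fs.s3a.impl</name>
  <value>org.apache.hadoop.fs.s3a.S3AFileSystem</value>
</property>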
