hadoop distcp not working

Hi,

We are trying to run hadoop distcp from the command line to copy files from an Amazon S3 bucket to an HDFS path using the s3a scheme. We are hitting the following error:

16/03/27 13:02:39 ERROR tools.DistCp: Exception encountered
com.amazonaws.AmazonClientException: Unable to load AWS credentials from any provider in the chain
    at com.amazonaws.auth.AWSCredentialsProviderChain.getCredentials(AWSCredentialsProviderChain.java:117)
    at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:3521)
    at com.amazonaws.services.s3.AmazonS3Client.headBucket(AmazonS3Client.java:1031)
    at com.amazonaws.services.s3.AmazonS3Client.doesBucketExist(AmazonS3Client.java:994)
    at org.apache.hadoop.fs.s3a.S3AFileSystem.initialize(S3AFileSystem.java:297)
    at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2653)
    at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:92)
    at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2687)
    at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2669)
    at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:371)
    at org.apache.hadoop.fs.Path.getFileSystem(Path.java:295)
    at org.apache.hadoop.tools.GlobbedCopyListing.doBuildListing(GlobbedCopyListing.java:76)
    at org.apache.hadoop.tools.CopyListing.buildListing(CopyListing.java:84)
    at org.apache.hadoop.tools.DistCp.createInputFileListing(DistCp.java:382)
    at org.apache.hadoop.tools.DistCp.createAndSubmitJob(DistCp.java:181)
    at org.apache.hadoop.tools.DistCp.execute(DistCp.java:153)
    at org.apache.hadoop.tools.DistCp.run(DistCp.java:126)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)
    at org.apache.hadoop.tools.DistCp.main(DistCp.java:430)

The credentials are passed on the command line using the -Dfs.s3a.awsAccessKeyId and -Dfs.s3a.awsSecretAccessKey options.

Is there any limitation in using the s3a scheme? The s3n scheme seems to work fine.

This is on Hadoop 2.7.1 and Hortonworks 2.3.

Regards,

Shanmuga

2 REPLIES

Re: hadoop distcp not working

Can you try adding a space between -D and the fs properties?

Re: hadoop distcp not working

Hello @Shanmuga Sundaram.

As per the Apache Hadoop documentation, the correct configuration properties are fs.s3a.access.key and fs.s3a.secret.key:

<property>
  <name>fs.s3a.access.key</name>
  <description>AWS access key ID. Omit for Role-based authentication.</description>
</property>

<property>
  <name>fs.s3a.secret.key</name>
  <description>AWS secret key. Omit for Role-based authentication.</description>
</property>
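
For a command-line run, the same two property names could be passed as -D options. The following is only a minimal sketch: the helper name distcp_cmd, the bucket, the paths, and the key values are hypothetical placeholders, and the function just prints the arguments (one per line) for review rather than launching the job.

```shell
#!/usr/bin/env bash
# Sketch: assemble a distcp invocation that passes S3A credentials using
# the property names fs.s3a.access.key and fs.s3a.secret.key.
# distcp_cmd, the bucket, the paths, and the keys are hypothetical.

distcp_cmd() {
  local access_key="$1" secret_key="$2" src="$3" dest="$4"
  # The -D options are placed before the source and target paths.
  printf '%s\n' hadoop distcp \
    "-Dfs.s3a.access.key=${access_key}" \
    "-Dfs.s3a.secret.key=${secret_key}" \
    "${src}" "${dest}"
}

# Dry run: print one argument per line so the command can be checked.
distcp_cmd "YOUR_ACCESS_KEY" "YOUR_SECRET_KEY" \
  "s3a://example-bucket/input/" "hdfs:///tmp/distcp-target/"
```

Once the printed arguments look right, they can be typed as a single hadoop distcp command on a node where the Hadoop client is configured.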