
Using AWS roles with the DistCp command from Cloudera HDFS to Amazon S3


Our shop uses AWS IAM roles, not secret access keys. So how do I properly provide my credentials for a DistCp operation?


My role credentials look like this:

aws_iam_role=arn:aws:iam::<acct #>:role/rolename


My source files are at:

/tmp/some_data_event_file

and my destination is

How do I use this role within the context of moving files from Cloudera HDFS to S3? We are on:

Hadoop 2.6.0-cdh5.10.0


My starting point is this command:

hadoop distcp -Dfs.s3a.access.key=myAccessKey -Dfs.s3a.secret.key=mySecretKey hdfs:///user/hdfs/mydata s3a://myBucket/mydata_backup


What should I use in lieu of:

-Dfs.s3a.access.key=myAccessKey -Dfs.s3a.secret.key=mySecretKey


In other words, if my model looks like this:

hadoop distcp -Dfs.s3a.access.key=myAccessKey -Dfs.s3a.secret.key=mySecretKey hdfs:///tmp/some_data_event_files/* s3a://bucket/path1/path2/


What do I replace -Dfs.s3a.access.key=myAccessKey -Dfs.s3a.secret.key=mySecretKey with in order to use the role instead of individual account keys?



Expert Contributor

DistCp uses the S3 API.


Per AWS [1], to interact with the S3 API in an authenticated fashion you must have a signature value generated from an access key. Based on what I'm seeing here, your starting-point command is the only way to do this copy using DistCp.


If having a permanent access key is a problem, AWS provides a way of generating temporary keys. [2]
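As a sketch of that temporary-key route: you can have STS assume the role, then hand the short-lived credentials to DistCp. Note this is an assumption-heavy example — it relies on the AWS CLI being installed and on the S3A connector supporting `fs.s3a.session.token` and `TemporaryAWSCredentialsProvider`, which arrived in Hadoop 2.8, so verify whether your CDH 5.10 (Hadoop 2.6) build has them backported. The role ARN, bucket, and paths below are placeholders:

```shell
# Sketch only: assumes AWS CLI is configured and the Hadoop build
# supports TemporaryAWSCredentialsProvider (Hadoop 2.8+; check CDH 5.10).
ROLE_ARN="arn:aws:iam::123456789012:role/rolename"   # placeholder account/role

# Ask STS for short-lived credentials for the role.
CREDS=$(aws sts assume-role \
          --role-arn "$ROLE_ARN" \
          --role-session-name distcp-backup \
          --query 'Credentials.[AccessKeyId,SecretAccessKey,SessionToken]' \
          --output text)
read -r ACCESS_KEY SECRET_KEY SESSION_TOKEN <<< "$CREDS"

# Pass the temporary credentials (including the session token) to DistCp.
hadoop distcp \
  -Dfs.s3a.aws.credentials.provider=org.apache.hadoop.fs.s3a.TemporaryAWSCredentialsProvider \
  -Dfs.s3a.access.key="$ACCESS_KEY" \
  -Dfs.s3a.secret.key="$SECRET_KEY" \
  -Dfs.s3a.session.token="$SESSION_TOKEN" \
  hdfs:///tmp/some_data_event_files s3a://bucket/path1/path2/
```

If the cluster nodes themselves run on EC2 with the role attached as an instance profile, newer S3A versions can also pick up those credentials automatically without any keys on the command line.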




