
Using AWS roles with distCP command Cloudera HDFS to Amazon S3


Explorer

Our shop uses AWS roles, not secret keys. How do I properly provide my credentials for a distcp operation?

 

My role credentials look like this:

aws_iam_role=arn:aws:iam::<acct #>:role/rolename

 

My source files are at:

/tmp/some_data_event_file  and my destination is 

s3://bucket/path1/path2/

 

How do I use this within the context of moving files from Cloudera HDFS to S3?  We are on 

Hadoop 2.6.0-cdh5.10.0

 

My starting point is this command:

hadoop distcp -Dfs.s3a.access.key=myAccessKey -Dfs.s3a.secret.key=mySecretKey hdfs:///user/hdfs/mydata s3a://myBucket/mydata_backup

 

What should I use in lieu of:

-Dfs.s3a.access.key=myAccessKey -Dfs.s3a.secret.key=mySecretKey

 

In other words, if my command looks like this:

hadoop distcp -Dfs.s3a.access.key=myAccessKey -Dfs.s3a.secret.key=mySecretKey hdfs:///tmp/some_data_event_files/* s3a://bucket/path1/path2/

 

What do I replace -Dfs.s3a.access.key=myAccessKey -Dfs.s3a.secret.key=mySecretKey with to use the role instead of an individual account key?

 


Re: Using AWS roles with distCP command Cloudera HDFS to Amazon S3

Expert Contributor

DistCp uses the S3 API.

 

Per AWS [1], to interact with the S3 API in an authenticated fashion, you must have a signature value that is generated from an access key. Based on what I'm seeing here, your starting-point command is the only way to do this copy using distcp.

 

If having a permanent access key is a problem, AWS provides a way of generating a temporary key [2].
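As a rough illustration of the temporary-key route, the sketch below assumes the role with the AWS CLI and passes the resulting short-lived credentials to distcp. Several things here are assumptions and not something confirmed in this thread: that the AWS CLI is installed on the node, that the caller is permitted to assume the role, and that the cluster's S3A connector understands fs.s3a.session.token and TemporaryAWSCredentialsProvider. Those two settings are documented for Apache Hadoop's S3A connector from 2.8 onward; whether the Hadoop 2.6.0-cdh5.10.0 build includes them needs to be checked. The function name copy_with_role and the session name distcp-session are made up for the example.

```shell
#!/usr/bin/env bash
# Sketch only: assume an IAM role through STS, then run distcp with the
# temporary credentials. copy_with_role is a hypothetical helper name.
# Assumption: the S3A connector on the cluster supports session tokens.

copy_with_role() {
  local role_arn="$1" src="$2" dest="$3"
  local creds access_key secret_key session_token

  # Ask STS for temporary credentials; --query/--output text yield one
  # tab-separated line: AccessKeyId, SecretAccessKey, SessionToken.
  creds="$(aws sts assume-role \
    --role-arn "$role_arn" \
    --role-session-name distcp-session \
    --query 'Credentials.[AccessKeyId,SecretAccessKey,SessionToken]' \
    --output text)" || return 1

  read -r access_key secret_key session_token <<< "$creds"
  if [ -z "$session_token" ]; then
    echo "could not parse temporary credentials" >&2
    return 1
  fi

  # Temporary credentials need all three values plus the matching provider.
  hadoop distcp \
    -Dfs.s3a.aws.credentials.provider=org.apache.hadoop.fs.s3a.TemporaryAWSCredentialsProvider \
    -Dfs.s3a.access.key="$access_key" \
    -Dfs.s3a.secret.key="$secret_key" \
    -Dfs.s3a.session.token="$session_token" \
    "$src" "$dest"
}
```

With the paths from the question, the call would look something like copy_with_role "arn:aws:iam::<acct #>:role/rolename" /tmp/some_data_event_file s3a://bucket/path1/path2/. Note that values passed with -D are visible in the process list, and that the temporary credentials expire, so a long copy may outlive them.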

 

[1] http://docs.aws.amazon.com/AmazonS3/latest/dev/MakingRequests.html

[2] http://docs.aws.amazon.com/IAM/latest/UserGuide/id_credentials_temp.html