Explorer
Posts: 8
Registered: ‎03-25-2017

Using AWS roles with distCP command Cloudera HDFS to Amazon S3

Our shop uses AWS roles, not secret keys. How do I properly provide my credentials for a distCP operation?

 

My role credentials look like this:

aws_iam_role=arn:aws:iam::<acct #>:role/rolename

 

My source files are at:

/tmp/some_data_event_file

and my destination is:

s3://bucket/path1/path2/

 

How do I use this within the context of moving files from Cloudera HDFS to S3? We are on Hadoop 2.6.0-cdh5.10.0.

 

My starting point is this command:

hadoop distcp -Dfs.s3a.access.key=myAccessKey -Dfs.s3a.secret.key=mySecretKey hdfs:///user/hdfs/mydata s3a://myBucket/mydata_backup

 

What should I use in lieu of:

-Dfs.s3a.access.key=myAccessKey -Dfs.s3a.secret.key=mySecretKey

 

In other words, if my command looks like this:

hadoop distcp -Dfs.s3a.access.key=myAccessKey -Dfs.s3a.secret.key=mySecretKey hdfs:///tmp/some_data_event_files/* s3a://bucket/path1/path2/

 

What do I replace -Dfs.s3a.access.key=myAccessKey -Dfs.s3a.secret.key=mySecretKey with to use the role instead of an individual account key?

 

Expert Contributor
Posts: 101
Registered: ‎01-24-2014

Re: Using AWS roles with distCP command Cloudera HDFS to Amazon S3

Distcp is using the S3 API.

 

Per AWS [1], to interact with the S3 API in an authenticated fashion you must send a signature value that is generated from an access key. Based on what I'm seeing here, your starting point command is the only way to do this copy using distcp.

 

If having a permanent access key is a problem, AWS provides a way of generating temporary credentials [2].

 

[1] http://docs.aws.amazon.com/AmazonS3/latest/dev/MakingRequests.html

[2] http://docs.aws.amazon.com/IAM/latest/UserGuide/id_credentials_temp.html
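
As a rough sketch of the temporary-key route in [2], assuming the AWS CLI is installed and the caller is allowed to assume the role: the script below asks STS for short-lived credentials and builds a distcp command around them. The property names fs.s3a.session.token and fs.s3a.aws.credentials.provider (with TemporaryAWSCredentialsProvider) exist in Apache Hadoop 2.8 and later; whether the S3A connector shipped in CDH 5.10 honors them is an assumption you would need to verify. The function names and the session name distcp-session are made up for illustration.

```shell
#!/usr/bin/env bash
# Sketch only. Assumption: the cluster's S3A connector supports session tokens
# (true for Apache Hadoop 2.8+; not confirmed here for CDH 5.10).

# Ask STS for temporary credentials for a role.
# Prints three tab-separated fields: access key, secret key, session token.
# $1 = role ARN, e.g. arn:aws:iam::<acct #>:role/rolename
fetch_role_credentials() {
  aws sts assume-role \
    --role-arn "$1" \
    --role-session-name distcp-session \
    --query 'Credentials.[AccessKeyId,SecretAccessKey,SessionToken]' \
    --output text
}

# Print (do not run) a distcp command that uses temporary credentials.
# $1 = access key, $2 = secret key, $3 = session token, $4 = source, $5 = destination
build_distcp_cmd() {
  printf 'hadoop distcp'
  printf ' -Dfs.s3a.aws.credentials.provider=%s' \
    'org.apache.hadoop.fs.s3a.TemporaryAWSCredentialsProvider'
  printf ' -Dfs.s3a.access.key=%s' "$1"
  printf ' -Dfs.s3a.secret.key=%s' "$2"
  printf ' -Dfs.s3a.session.token=%s' "$3"
  printf ' %s %s\n' "$4" "$5"
}

# Possible usage (needs real AWS access, so it is left commented out):
#   read -r AK SK TOKEN < <(fetch_role_credentials 'arn:aws:iam::<acct #>:role/rolename')
#   build_distcp_cmd "$AK" "$SK" "$TOKEN" hdfs:///tmp/some_data_event_file s3a://bucket/path1/path2/
```

The command is printed rather than executed so it can be reviewed first. Note that values passed with -D are visible in the process list, and the temporary credentials expire, so a long copy may need a longer session duration on the role.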

 
