03-28-2017 08:29 AM
Our shop uses AWS IAM roles, not secret keys. So how do I properly provide my credentials for a DistCp operation?
My role credentials look like this:
My source files are at /tmp/some_data_event_file and my destination is
How do I use this within the context of moving files from Cloudera HDFS to S3? We are on
My starting point is this command:
hadoop distcp -Dfs.s3a.access.key=myAccessKey -Dfs.s3a.secret.key=mySecretKey hdfs:///user/hdfs/mydata s3a://myBucket/mydata_backup
What should I use in lieu of the access and secret key properties? In other words, if my model looks like this:
hadoop distcp -Dfs.s3a.access.key=myAccessKey -Dfs.s3a.secret.key=mySecretKey hdfs:///tmp/some_data_event_files/* s3a://bucket/path1/path2/
what do I replace -Dfs.s3a.access.key=myAccessKey -Dfs.s3a.secret.key=mySecretKey with in order to use the role instead of individual account keys?
03-31-2017 08:47 AM
DistCp uses the S3 API.
Per AWS, to interact with the S3 API in an authenticated fashion, you must have a signature value that is generated from an access key. Your starting-point command is the only way to do this copy using DistCp, based on what I'm seeing here.
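If the concern is exposing the keys on the command line (rather than holding keys at all), one option is to store them in a Hadoop credential provider and point DistCp at the store. A minimal sketch, assuming your Hadoop release includes the credential provider API; the jceks path and bucket paths below are placeholders:

# Store the S3A keys in an encrypted credential store on HDFS
hadoop credential create fs.s3a.access.key -value myAccessKey -provider jceks://hdfs/user/hdfs/s3.jceks
hadoop credential create fs.s3a.secret.key -value mySecretKey -provider jceks://hdfs/user/hdfs/s3.jceks

# Reference the store instead of putting keys on the command line
hadoop distcp -Dhadoop.security.credential.provider.path=jceks://hdfs/user/hdfs/s3.jceks hdfs:///tmp/some_data_event_files s3a://bucket/path1/path2/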
If having a permanent access key is a problem, AWS provides a way of generating a temporary key.
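For example, you can request short-lived credentials from AWS STS and pass them to S3A's temporary-credential support. A sketch, assuming the AWS CLI is installed and your Hadoop version supports fs.s3a.session.token and org.apache.hadoop.fs.s3a.TemporaryAWSCredentialsProvider (Hadoop 2.8+, so check what your CDH release ships); the TEMP_* values are placeholders filled in from the STS response:

# Returns an AccessKeyId, SecretAccessKey, and SessionToken valid for one hour
aws sts get-session-token --duration-seconds 3600

# Run DistCp with the temporary credentials
hadoop distcp \
  -Dfs.s3a.aws.credentials.provider=org.apache.hadoop.fs.s3a.TemporaryAWSCredentialsProvider \
  -Dfs.s3a.access.key=TEMP_ACCESS_KEY \
  -Dfs.s3a.secret.key=TEMP_SECRET_KEY \
  -Dfs.s3a.session.token=TEMP_SESSION_TOKEN \
  hdfs:///tmp/some_data_event_files s3a://bucket/path1/path2/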