Support Questions


HDFS Backup to AWS S3 without Keys error

Rising Star

Dear All,

We have been using hadoop distcp in a crontab script to back up HDFS data to AWS S3, passing the AWS keys on the distcp command line. The backup also works without the AWS keys, but we sometimes get timeout errors, so it is not reliable.

Is it mandatory to use the AWS keys with the hadoop distcp command? If not, why do I get timeout/socket errors when running without them? I tested manually a few times with the same result.

Command:

With Keys

hadoop distcp -Dfs.s3a.server-side-encryption-algorithm=AES256 -Dfs.s3a.access.key=${AWS_ACCESS_KEY_ID} -Dfs.s3a.secret.key=${AWS_SECRET_ACCESS_KEY} -update hdfs://<HDFS dir>/ s3a://${BUCKET_NAME}/

Without Keys

hadoop distcp -Dfs.s3a.server-side-encryption-algorithm=AES256 -update hdfs://<HDFS dir>/ s3a://${BUCKET_NAME}/

Below is the error we get when running without AWS keys.

""dfs.sh_20160630_010001:com.amazonaws.AmazonClientException: Unable to upload part: Status Code: 400, AWS Service: Amazon S3, AWS Request ID: 3C1FD2E8F503F052, AWS Error Code: RequestTimeout, AWS Error Message: Your socket connection to the server was not read from or written to within the timeout period. Idle connections will be closed."
1 ACCEPTED SOLUTION

Rising Star

Hi All,

At the AWS end we needed to grant the appropriate permissions to the role for S3 backup. It is now working.

Thank you for your valuable comments.
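For reference, "appropriate permissions" here usually means the instance role's IAM policy allows listing the bucket and reading/writing objects. A minimal sketch (the bucket name is a placeholder, not from this thread):

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": ["s3:ListBucket", "s3:GetBucketLocation"],
      "Resource": "arn:aws:s3:::my-backup-bucket"
    },
    {
      "Effect": "Allow",
      "Action": ["s3:GetObject", "s3:PutObject", "s3:DeleteObject"],
      "Resource": "arn:aws:s3:::my-backup-bucket/*"
    }
  ]
}
```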


3 REPLIES


@Muthukumar S: You need to either add the AWS keys to the hadoop command or set them permanently in core-site.xml.

Are you able to run hadoop fs -ls s3a://${BUCKET_NAME}/ (feel free to add keys accordingly)? This will help isolate whether it is an authentication issue or a connectivity issue.

Rising Star

@Sandeep Nemuri

My requirement is below: I want to omit the keys and use role-based authentication. The AWS instance now has the role assigned, but hadoop distcp does not work when I run the command without keys.

<property>
  <name>fs.s3a.access.key</name>
  <description>AWS access key ID. Omit for Role-based authentication.</description>
</property>

<property>
  <name>fs.s3a.secret.key</name>
  <description>AWS secret key. Omit for Role-based authentication.</description>
</property>
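Omitting the keys makes s3a fall back to its default credential chain; it can also be pointed at the instance-profile credentials explicitly in core-site.xml. A sketch, assuming a Hadoop build that bundles the AWS SDK class com.amazonaws.auth.InstanceProfileCredentialsProvider:

```xml
<!-- Sketch: use EC2 instance-profile (role) credentials for s3a -->
<property>
  <name>fs.s3a.aws.credentials.provider</name>
  <value>com.amazonaws.auth.InstanceProfileCredentialsProvider</value>
</property>
```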
