03-20-2018 01:34 AM
I am trying to copy data from HDFS to S3 using the distcp command. The command works for individual files.
So, hadoop distcp /user/username/file.txt s3a://xxxxx works fine
But when I try to copy the entire directory structure, it fails to create the directory with the error: Error: java.io.IOException: mkdir failed for s3a://bucket****/ (Error Code: 403 Forbidden; Request ID: 447400E9C5995ED9; S3 Extended Request ID: T0hsw+XaBMrkMUhDcJBKGIRtSF58dKedZdCH2qC32v9uVkwR94SGiI7Xxe8lqaFaDyjwS3oCpkg=)
Even a simple mkdir command against S3 fails with the same 403 Forbidden error, so I am not sure what the root cause is.
I am able to copy files but not able to create directories.
05-03-2018 09:20 PM
You may not have all the permissions you need on the bucket. Make sure you have "s3:ListBucket" on resource "arn:aws:s3:::<bucket-name>", as well as "s3:PutObject", "s3:GetObject", "s3:DeleteObject", "s3:PutObjectAcl" on resource "arn:aws:s3:::<bucket-name>/*"
You probably also want to add the permissions for creating and deleting multipart uploads. I'd add privileges for the operations mentioned in the multipart upload docs as well.
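As a concrete illustration, an IAM policy granting the permissions above might look like the following. This is a sketch only: `<bucket-name>` is a placeholder, and the exact multipart-upload actions you need may vary, so check them against the AWS docs for your workload.

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "s3:ListBucket",
        "s3:ListBucketMultipartUploads"
      ],
      "Resource": "arn:aws:s3:::<bucket-name>"
    },
    {
      "Effect": "Allow",
      "Action": [
        "s3:PutObject",
        "s3:GetObject",
        "s3:DeleteObject",
        "s3:PutObjectAcl",
        "s3:AbortMultipartUpload",
        "s3:ListMultipartUploadParts"
      ],
      "Resource": "arn:aws:s3:::<bucket-name>/*"
    }
  ]
}
```

Note that bucket-level actions (the ListBucket* ones) go on the bucket ARN, while object-level actions go on the `/*` resource; mixing them up is a common cause of 403 errors like the one above.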
02-04-2019 09:34 PM
How did you copy HDFS files to S3? Using Hadoop distcp?
Can you please tell me what configuration you did, other than passing the access and secret keys in core-site.xml?
Also, if you know, can you please tell me how to sync directories from HDFS to S3, like below:
hdfs://home/test/abc.txt ----> s3://bucket/test/abc.txt
02-05-2019 10:32 AM
You may want to take a look at the doc here, which provides some distcp examples:
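For a directory sync like the one asked about above, distcp's -update flag is the usual approach; a sketch, with placeholder paths and bucket name:

```shell
# Sync hdfs:///home/test into s3a://bucket/test, copying only files
# that are missing or different at the destination (placeholders).
# -skipcrccheck avoids checksum comparison failures between HDFS and S3.
hadoop distcp -update -skipcrccheck \
    hdfs:///home/test \
    s3a://bucket/test
```

With this layout, a file such as hdfs:///home/test/abc.txt lands at s3a://bucket/test/abc.txt, matching the mapping in your example (note that Hadoop uses the s3a:// scheme rather than s3://).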
Thanks and hope it helps,