I am trying to copy data from HDFS to S3 using the distcp command. The command works for individual files.
For example, hadoop distcp /user/username/file.txt s3a://xxxxx works fine.
But when I try to copy an entire directory structure, it fails to create the directory and gives this error: Error: java.io.IOException: mkdir failed for s3a://bucket****/ Error Code: 403 Forbidden; Request ID: 447400E9C5995ED9), S3 Extended Request ID: T0hsw+XaBMrkMUhDcJBKGIRtSF58dKedZdCH2qC32v9uVkwR94SGiI7Xxe8lqaFaDyjwS3oCpkg=
Even a simple mkdir command against S3 fails with the same 403 Forbidden error.
So I am not sure what the root cause is.
In short, I am able to copy files but not able to create directories.
You may not have all the permissions you need on the bucket. Make sure you have "s3:ListBucket" on resource "arn:aws:s3:::<bucket-name>", as well as "s3:PutObject", "s3:GetObject", "s3:DeleteObject", "s3:PutObjectAcl" on resource "arn:aws:s3:::<bucket-name>/*".
You probably also want to add the permissions for creating and deleting multipart uploads. I'd add privileges for the operations mentioned in the multipart upload docs as well.
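As a rough sketch of what such a policy could look like, the snippet below writes an IAM policy document to a local file. The file name s3-distcp-policy.json is just an illustration, <bucket-name> is a placeholder you need to replace, and the three multipart actions are my reading of "the operations mentioned in the multipart upload docs", so check them against your own setup before attaching the policy.

```shell
# Write a candidate IAM policy to a local file (file name is hypothetical).
# <bucket-name> is a placeholder: replace it with your real bucket name.
cat > s3-distcp-policy.json <<'EOF'
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": ["s3:ListBucket", "s3:ListBucketMultipartUploads"],
      "Resource": "arn:aws:s3:::<bucket-name>"
    },
    {
      "Effect": "Allow",
      "Action": [
        "s3:PutObject",
        "s3:GetObject",
        "s3:DeleteObject",
        "s3:PutObjectAcl",
        "s3:AbortMultipartUpload",
        "s3:ListMultipartUploadParts"
      ],
      "Resource": "arn:aws:s3:::<bucket-name>/*"
    }
  ]
}
EOF
```

The bucket-level actions go on the bucket ARN and the object-level actions go on the /* ARN; mixing the two up is a common reason for a 403 even when the policy looks complete.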
How did you copy the HDFS files to S3? Using hadoop distcp?
Can you please tell me what configuration you did, other than passing the access and secret keys in core-site.xml?
Also, if you know, can you please tell me how to sync directories from HDFS to S3, like below:
hdfs://home/test/abc.txt ----> s3://bucket/test/abc.txt
You may want to take a look at the doc here, which provides some distcp examples.
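For the directory sync asked about above, something along these lines might work. This is only a sketch: the paths and bucket name are made up, it assumes the s3a credentials are already set in core-site.xml, and it uses hdfs:/// (three slashes) so that the default namenode is used. By default it only prints the command; set DRY_RUN=0 to actually run it.

```shell
# Hypothetical paths: adjust the source directory and the bucket name.
SRC="hdfs:///home/test"   # triple slash: fall back to the default namenode
DST="s3a://bucket/test"

# -update copies only files that are missing or different at the destination.
# With -update, the contents of SRC are copied into DST, so
# /home/test/abc.txt should end up as s3a://bucket/test/abc.txt.
CMD="hadoop distcp -update ${SRC} ${DST}"

# Print the command first; it is executed only when DRY_RUN=0 is set.
echo "${CMD}"
if [ "${DRY_RUN:-1}" = "0" ]; then
  ${CMD}
fi
```

Note that distcp has to create the directory markers on the S3 side, so this will hit the same 403 as in the original question unless the bucket permissions are sorted out first.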
Thanks, and hope it helps.