Access Amazon S3 bucket from HDFS
Labels: Apache Hadoop
Created 07-15-2016 08:52 PM
I am trying to connect to an Amazon S3 bucket from HDFS using this command:
$ hadoop fs -ls s3n://<ACCESSKEYID>:<SecretAccessKey>@<bucket-name>/tpt_files/
-ls: Invalid hostname in URI s3n://<ACCESSKEYID>:<SecretAccessKey>@<bucket-name>/tpt_files
Usage: hadoop fs [generic options] -ls [-d] [-h] [-R] [<path> ...]
My SecretAccessKey includes “/”. Could that be the cause of this behavior?
At the same time, I have the AWS CLI installed on this server, and I can access my bucket with it without any issues (AccessKeyId and SecretAccessKey are configured in .aws/credentials):
aws s3 ls s3://<bucket-name>/tpt_files/
Is there any way to access an Amazon S3 bucket with a Hadoop command without specifying the keys in core-site.xml? I'd prefer to specify the keys on the command line.
Any suggestions would be very helpful.
Created 07-16-2016 01:20 AM
Hi All,
I eventually found a way to specify the keys on the command line:
hadoop fs -Dfs.s3a.access.key=<AccessKeyId> -Dfs.s3a.secret.key=<SecretAccessKey> -Dfs.s3a.proxy.host=<proxy_host> -Dfs.s3a.proxy.port=<proxy_port> -ls s3a://<my_bucket>/
Thanks to Constantin and kvarakantham for their responses.
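A minimal sketch of a variation on the command above: it builds the two -D options from environment variables, so the keys are not typed into the command itself. The helper name s3a_opts and the use of AWS_ACCESS_KEY_ID / AWS_SECRET_ACCESS_KEY as the source of the keys are assumptions for illustration, not something from this thread. Note that the expanded values can still show up in the process list while the command runs.

```shell
#!/bin/sh
# s3a_opts (hypothetical helper): print the S3A key options, one per line,
# taken from environment variables that are assumed to be exported already.
s3a_opts() {
  if [ -z "${AWS_ACCESS_KEY_ID:-}" ] || [ -z "${AWS_SECRET_ACCESS_KEY:-}" ]; then
    echo "AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY must be set" >&2
    return 1
  fi
  printf '%s\n' \
    "-Dfs.s3a.access.key=${AWS_ACCESS_KEY_ID}" \
    "-Dfs.s3a.secret.key=${AWS_SECRET_ACCESS_KEY}"
}

# Usage (left unquoted on purpose so each option becomes its own argument;
# this assumes the keys contain no whitespace or glob characters):
#   hadoop fs $(s3a_opts) -ls s3a://<my_bucket>/
```

The proxy options from the original command can be appended in the same way if your network needs them.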
Created 07-15-2016 10:43 PM
You are dealing with this issue: https://issues.apache.org/jira/browse/HADOOP-3733 (a secret key containing "/" breaks S3 URIs that embed the credentials).
As a workaround, you can first set the fs.s3.awsAccessKeyId and fs.s3.awsSecretAccessKey properties; the URI would then be s3://mybucket/dest.
Putting keys on the command line is not very secure anyway.
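One way to set the properties first without editing the cluster-wide core-site.xml, sketched under assumptions: write them to a private file and pass that file with the -conf generic option. The helper name write_s3_conf and the file location are hypothetical. The property names used here are the s3n variants (fs.s3n.awsAccessKeyId and fs.s3n.awsSecretAccessKey), because the question uses an s3n:// URI.

```shell
#!/bin/sh
# write_s3_conf FILE ACCESS_KEY SECRET_KEY (hypothetical helper)
# Writes a small Hadoop configuration file readable only by its owner.
# Runs in a subshell so the umask change does not leak into the caller.
# Assumes the keys contain no XML special characters such as & or <.
write_s3_conf() (
  conf_file=$1
  access_key=$2
  secret_key=$3
  umask 077
  cat > "$conf_file" <<EOF
<?xml version="1.0"?>
<configuration>
  <property>
    <name>fs.s3n.awsAccessKeyId</name>
    <value>${access_key}</value>
  </property>
  <property>
    <name>fs.s3n.awsSecretAccessKey</name>
    <value>${secret_key}</value>
  </property>
</configuration>
EOF
  # Tighten permissions even if the file already existed with looser ones.
  chmod 600 "$conf_file"
)

# Usage:
#   write_s3_conf "$HOME/s3n-site.xml" '<ACCESSKEYID>' '<SecretAccessKey>'
#   hadoop fs -conf "$HOME/s3n-site.xml" -ls s3n://<bucket-name>/tpt_files/
```

With the keys in the file, the URI no longer carries credentials, so a "/" in the secret key never reaches the URI parser.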
Created 07-16-2016 12:58 AM
Thank you, Constantin.
Yes, putting the keys on the command line is not very secure; I agree with you. But if I set the AccessKeyId and SecretAccessKey in core-site.xml, then all Hadoop users will be able to access the Amazon S3 bucket from Hadoop. I am trying to avoid that scenario.
I have been experimenting with putting the keys on the command line, but so far without success.
I am not sure what causes the error; the syntax seems OK (I am now trying s3a instead of s3n).
Created 04-05-2018 05:54 AM
Step 1: Add these two properties to core-site.xml:
<property>
<name>fs.s3a.access.key</name>
<value>your aws IAM user access key</value>
</property>
<property>
<name>fs.s3a.secret.key</name>
<value>your aws IAM user secret key</value>
</property>
Step 2: Add the S3 endpoint property to core-site.xml. Before adding it, check which region the bucket is in.
For example, my bucket is in the Mumbai region: https://s3.ap-south-1.amazonaws.com/bucketname/foldername/filename.csv
<property>
<name>fs.s3a.endpoint</name>
<value>s3.ap-south-1.amazonaws.com</value>
</property>
Note: without the endpoint you get a 400 Bad Request error:
WARN s3a.S3AFileSystem: Client: Amazon S3 error 400: 400 Bad Request; Bad Request
com.amazonaws.services.s3.model.AmazonS3Exception: Bad Request (Service: Amazon S3; Status Code: 400; Error Code: 400 Bad Request;
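The endpoint value follows the pattern s3.<region>.amazonaws.com for regions such as ap-south-1. A small sketch that builds it from a region name; s3a_endpoint_for_region is a hypothetical helper, and the pattern is assumed to apply to your region (some partitions, such as the China regions, use a different domain).

```shell
#!/bin/sh
# s3a_endpoint_for_region REGION (hypothetical helper)
# Prints the regional S3 endpoint host name for the given region code.
s3a_endpoint_for_region() {
  if [ -z "${1:-}" ]; then
    echo "usage: s3a_endpoint_for_region <region>" >&2
    return 1
  fi
  printf 's3.%s.amazonaws.com\n' "$1"
}

# If the AWS CLI is configured, the bucket's region can be looked up with:
#   aws s3api get-bucket-location --bucket <bucketname>
#
# The endpoint can also be passed for a single command instead of core-site.xml:
#   hadoop fs -Dfs.s3a.endpoint="$(s3a_endpoint_for_region ap-south-1)" -ls s3a://<bucketname>/
```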
Step 3: Add the hadoop.security.credential.provider.path property to core-site.xml. For this, store the access key and secret key in a credential file on an HDFS path (this uses the Hadoop credential API to store the AWS secrets).
Example: run these commands (as the hdfs user):
I. hdfs dfs -chown s3_access:hdfs /user/s3_access
II. hadoop credential create fs.s3a.access.key -value aws-IAM-user_accesskey -provider jceks://hdfs@10.22.121.0:8020/user/s3_access/s3.jceks
III. hadoop credential create fs.s3a.secret.key -value aws-IAM-user_secretkey -provider jceks://hdfs@10.22.121.0:8020/user/s3_access/s3.jceks
IV. hadoop credential list -provider jceks://hdfs@10.22.121.0:8020/user/s3_access/s3.jceks
You will get output like this:
Listing aliases for CredentialProvider:
jceks://hdfs@13.229.32.224:8020/user/s3_access/s3.jceks
fs.s3a.secret.key
fs.s3a.access.key
You have now created a credential store for the AWS secrets on Hadoop. Next, set the owner and permissions of the keystore file:
hdfs dfs -chown s3_access:hdfs /user/s3_access/s3.jceks
hdfs dfs -chmod 666 /user/s3_access/s3.jceks
<property>
<name>hadoop.security.credential.provider.path</name>
<value>jceks://hdfs@10.22.121.0:8020/user/s3_access/s3.jceks</value>
</property>
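If the goal is to avoid giving every Hadoop user access, the provider path does not have to live in core-site.xml; it can also be passed per command with -D. A sketch under that assumption: jceks_uri is a hypothetical helper, and the host, port, and path are the example values from this post. Leaving out -value makes hadoop credential create prompt for the secret instead of reading it from the command line, and permissions tighter than 666 on the keystore would probably suit that goal better.

```shell
#!/bin/sh
# jceks_uri HOST PORT PATH (hypothetical helper)
# Prints a jceks:// provider URI for a keystore file stored on HDFS.
jceks_uri() {
  printf 'jceks://hdfs@%s:%s%s\n' "$1" "$2" "$3"
}

# Example values taken from this post.
PROVIDER=$(jceks_uri 10.22.121.0 8020 /user/s3_access/s3.jceks)
echo "$PROVIDER"
# prints jceks://hdfs@10.22.121.0:8020/user/s3_access/s3.jceks

# Store the keys without putting them on the command line (prompts for each value):
#   hadoop credential create fs.s3a.access.key -provider "$PROVIDER"
#   hadoop credential create fs.s3a.secret.key -provider "$PROVIDER"
#
# Use the keystore for one command only:
#   hadoop fs -Dhadoop.security.credential.provider.path="$PROVIDER" -ls s3a://yourbucketname/
```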
Step 4: Restart ambari-server:
ambari-server restart
Then verify access to the bucket and copy a file to HDFS:
hadoop fs -ls s3a://yourbucketname/folder/file.csv
hadoop distcp s3a://yourbucketname/foldername/filename.csv hdfs://10.22.121.0:8020/<your HDFS folder>
Follow this link: