Support Questions
Find answers, ask questions, and share your expertise

access amazon S3 bucket from hdfs

Solved


Explorer

I am trying to connect to an Amazon S3 bucket from HDFS using this command:

$ hadoop fs -ls s3n://<ACCESSKEYID>:<SecretAccessKey>@<bucket-name>/tpt_files/

-ls: Invalid hostname in URI s3n://<ACCESSKEYID>:<SecretAccessKey>@<bucket-name>/tpt_files

Usage: hadoop fs [generic options] -ls [-d] [-h] [-R] [<path> ...]

My SecretAccessKey includes "/". Could that be the cause of this behavior?

At the same time, I have the AWS CLI installed on this server and can access the bucket without any issues using the following command (AccessKeyId and SecretAccessKey are configured in .aws/credentials):

aws s3 ls s3://<bucket-name>/tpt_files/
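(If the "/" in the secret key is the problem, one commonly suggested workaround is to percent-encode the key before embedding it in the URI, though this does not always work and embedding keys in URIs is deprecated — see HADOOP-3733. A quick sketch with a made-up key value:)

```shell
# Sketch: percent-encode a hypothetical secret key containing '/' and '+'
# so it can be embedded in an s3n:// URI ('/' becomes %2F, '+' becomes %2B)
python3 -c "import urllib.parse; print(urllib.parse.quote('abc/def+ghi', safe=''))"
# prints: abc%2Fdef%2Bghi
```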

Is there any way to access an Amazon S3 bucket using a Hadoop command without specifying the keys in core-site.xml? I'd prefer to specify the keys on the command line.

Any suggestions will be very helpful.

1 ACCEPTED SOLUTION


Re: access amazon S3 bucket from hdfs

Explorer

Hi All,

Eventually I found a way to specify the keys on the command line:

hadoop fs -Dfs.s3a.access.key=<AccessKeyId> -Dfs.s3a.secret.key=<SecretAccessKey> -Dfs.s3a.proxy.host=<proxy_host> -Dfs.s3a.proxy.port=<proxy_port> -ls s3a://<my_bucket>/

Thanks to Constantin and kvarakantham for their responses.



4 REPLIES

Re: access amazon S3 bucket from hdfs

@Leonid Zavadskiy

You are dealing with this issue: https://issues.apache.org/jira/browse/HADOOP-3733

As a workaround, you can first set the fs.s3.awsAccessKeyId and fs.s3.awsSecretAccessKey properties; the URI would then be s3://mybucket/dest.
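For reference, that workaround would look like this in core-site.xml (a sketch with placeholder values; these are the legacy property names used by the old s3/s3n connectors):

```xml
<property>
  <name>fs.s3.awsAccessKeyId</name>
  <value>YOUR_ACCESS_KEY_ID</value>
</property>
<property>
  <name>fs.s3.awsSecretAccessKey</name>
  <value>YOUR_SECRET_ACCESS_KEY</value>
</property>
```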

Putting things on the command line is not very secure anyway.

Re: access amazon S3 bucket from hdfs

Explorer

Thank you, Constantin.

Yes, I agree that putting keys on the command line is not very secure. But if I set the AccessKeyId and SecretAccessKey in core-site.xml, then all Hadoop users will be able to access the Amazon S3 bucket from Hadoop. I am trying to avoid that scenario.

I am experimenting with putting the keys on the command line, but have not been successful with it yet...

I'm not sure what causes the error; the syntax seems OK (I am now trying s3a instead of s3n).


Re: access amazon S3 bucket from hdfs

Step 1: Add these two properties to the core-site.xml file.

<property>
  <name>fs.s3a.access.key</name>
  <value>your AWS IAM user access key</value>
</property>

<property>
  <name>fs.s3a.secret.key</name>
  <value>your AWS IAM user secret key</value>
</property>

Step 2: Add the S3 bucket endpoint property to core-site.xml. Before adding it, check the S3 bucket's region.

For example, my bucket is in the Mumbai region: https://s3.ap-south-1.amazonaws.com/bucketname/foldername/filename.csv

<property>
  <name>fs.s3a.endpoint</name>
  <value>s3.ap-south-1.amazonaws.com</value>
</property>

Note: otherwise you get a 400 Bad Request error: WARN s3a.S3AFileSystem: Client: Amazon S3 error 400: 400 Bad Request; com.amazonaws.services.s3.model.AmazonS3Exception: Bad Request (Service: Amazon S3; Status Code: 400; Error Code: 400 Bad Request)

Step 3: Add the hadoop.security.credential.provider.path property to core-site.xml. For this, you can store the access key and secret key in a credential file on an HDFS path (using the Hadoop credential API to store AWS secrets).

Example (run these commands as the hdfs user):

I: hdfs dfs -chown s3_acces:hdfs /user/s3_access

II: hadoop credential create fs.s3a.access.key -value <aws-IAM-user-access-key> -provider jceks://hdfs@10.22.121.0:8020/user/s3_access/s3.jceks

III: hadoop credential create fs.s3a.secret.key -value <aws-IAM-user-secret-key> -provider jceks://hdfs@10.22.121.0:8020/user/s3_access/s3.jceks

IV: hadoop credential list -provider jceks://hdfs@10.22.121.0:8020/user/s3_access/s3.jceks

You will get output like this:

Listing aliases for CredentialProvider:

jceks://hdfs@13.229.32.224:8020/user/s3_access/s3.jceks

fs.s3a.secret.key

fs.s3a.access.key

Finally, you have created the stored AWS secrets credential on Hadoop:

hdfs dfs -chown s3_acces:hdfs /user/s3_access/s3.jceks

hdfs dfs -chmod 666 /user/s3_access/s3.jceks

<property>
  <name>hadoop.security.credential.provider.path</name>
  <value>jceks://hdfs@10.22.121.0:8020/user/s3_access/s3.jceks</value>
</property>
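Alternatively, instead of configuring the provider path globally in core-site.xml, you can pass it per command, so only users with read access to the .jceks file can reach the bucket (a sketch using the example host, port, and paths from this thread):

```shell
# Sketch: point a single hadoop command at the credential store instead of
# setting hadoop.security.credential.provider.path globally
hadoop fs \
  -Dhadoop.security.credential.provider.path=jceks://hdfs@10.22.121.0:8020/user/s3_access/s3.jceks \
  -ls s3a://yourbucketname/
```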

Step 4: Restart the Ambari server:

ambari-server restart

hadoop fs -ls s3a://yourbucketname/folder/file.csv

hadoop distcp s3a://yourbucketname/foldername/filename.csv hdfs://10.22.121.0:8020/<your-hdfs-folder>

Follow this link:

https://docs.hortonworks.com/HDPDocuments/HDP2/HDP2.6.2/bk_cloud-data-access/content/s3-config-props...