
access amazon S3 bucket from hdfs


New Contributor

I am trying to connect to an Amazon S3 bucket from HDFS using this command:

$ hadoop fs -ls s3n://<ACCESSKEYID>:<SecretAccessKey>@<bucket-name>/tpt_files/

-ls: Invalid hostname in URI s3n://<ACCESSKEYID>:<SecretAccessKey>@<bucket-name>/tpt_files

Usage: hadoop fs [generic options] -ls [-d] [-h] [-R] [<path> ...]

My SecretAccessKey includes a "/". Could that be the cause of this behavior?

At the same time, I have the AWS CLI installed on this server and I can access the bucket without any issues using the following command (AccessKeyId and SecretAccessKey are configured in .aws/credentials):

aws s3 ls s3://<bucket-name>/tpt_files/

Is there any way to access an Amazon S3 bucket using a Hadoop command without specifying the keys in core-site.xml? I'd prefer to specify the keys on the command line.

Any suggestions will be very helpful.


4 REPLIES

Re: access amazon S3 bucket from hdfs

@Leonid Zavadskiy

You are dealing with this issue: https://issues.apache.org/jira/browse/HADOOP-3733

As a workaround, you can first set the fs.s3.awsAccessKeyId and fs.s3.awsSecretAccessKey properties; the URI would then be s3://mybucket/dest.
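For reference, a minimal sketch of what those core-site.xml entries might look like (property names as above; the values are placeholders for your own credentials):

<property>
  <name>fs.s3.awsAccessKeyId</name>
  <value>YOUR_ACCESS_KEY_ID</value>
</property>
<property>
  <name>fs.s3.awsSecretAccessKey</name>
  <value>YOUR_SECRET_ACCESS_KEY</value>
</property>

With these set, the bucket should be reachable with hadoop fs -ls s3://mybucket/dest (assuming the legacy s3 connector is available in your Hadoop build).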

Putting things on the command line is not very secure anyway.

Re: access amazon S3 bucket from hdfs

New Contributor

Thank you Constantin,

Yes, putting keys on the command line is not very secure; I agree with you. But if I set the AccessKeyId and SecretAccessKey in core-site.xml, then all Hadoop users will be able to access the Amazon S3 bucket from Hadoop. I am trying to avoid this scenario.

I am experimenting with putting the keys on the command line, but have not been successful with it yet...

I am not sure what causes the error; the syntax seems OK (now I am trying s3a instead of s3n).

Re: access amazon S3 bucket from hdfs

New Contributor

Hi All,

Eventually I found a way to specify the keys on the command line:

hadoop fs -Dfs.s3a.access.key=<AccessKeyId> -Dfs.s3a.secret.key=<SecretAccessKey> -Dfs.s3a.proxy.host=<proxy_host> -Dfs.s3a.proxy.port=<proxy_port> -ls s3a://<my_bucket>/
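For what it's worth, the same -D pattern should also work with other tools that accept generic options, for example distcp (a sketch; the source and destination paths are placeholders):

hadoop distcp -Dfs.s3a.access.key=<AccessKeyId> -Dfs.s3a.secret.key=<SecretAccessKey> s3a://<my_bucket>/tpt_files/ hdfs:///tmp/tpt_files/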

Thanks to Constantin and kvarakantham for their responses.



Re: access amazon S3 bucket from hdfs

New Contributor

Step 1: Add these two properties to the core-site.xml file.

<property>
  <name>fs.s3a.access.key</name>
  <value>your AWS IAM user access key</value>
</property>

<property>
  <name>fs.s3a.secret.key</name>
  <value>your AWS IAM user secret key</value>
</property>

Step 2: Add the S3 bucket endpoint property to core-site.xml. Before you add it, check the S3 bucket's region.

For example, my bucket is in the Mumbai region: https://s3.ap-south-1.amazonaws.com/bucketname/foldername/filename.csv
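One way to check the bucket's region beforehand (assuming the AWS CLI is configured on the node, as in the original post) is:

aws s3api get-bucket-location --bucket bucketname

The returned LocationConstraint (for example ap-south-1) maps to the matching endpoint, s3.ap-south-1.amazonaws.com.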

<property>
  <name>fs.s3a.endpoint</name>
  <value>s3.ap-south-1.amazonaws.com</value>
</property>

Note: otherwise you get a 400 Bad Request error, e.g.: WARN s3a.S3AFileSystem: Client: Amazon S3 error 400: Bad Request; com.amazonaws.services.s3.model.AmazonS3Exception: Bad Request (Service: Amazon S3; Status Code: 400; Error Code: 400 Bad Request)

Step 3: Add the hadoop.security.credential.provider.path property to core-site.xml. For this you can store the access key and secret key in a credential file on an HDFS path (using the Hadoop credential API to store the AWS secrets).

For example, run these commands as the hdfs user:

I: hdfs dfs -chown s3_access:hdfs /user/s3_access

II: hadoop credential create fs.s3a.access.key -value aws-IAM-user_accesskey -provider jceks://hdfs@10.22.121.0:8020/user/s3_access/s3.jceks

III: hadoop credential create fs.s3a.secret.key -value aws-IAM-user_secretkey -provider jceks://hdfs@10.22.121.0:8020/user/s3_access/s3.jceks

IV: hadoop credential list -provider jceks://hdfs@10.22.121.0:8020/user/s3_access/s3.jceks

You will get output like the following:

Listing aliases for CredentialProvider:

jceks://hdfs@13.229.32.224:8020/user/s3_access/s3.jceks

fs.s3a.secret.key

fs.s3a.access.key

Finally, you have created the credential store for the AWS secrets on Hadoop:

hdfs dfs -chown s3_access:hdfs /user/s3_access/s3.jceks

hdfs dfs -chmod 666 /user/s3_access/s3.jceks

<property>
  <name>hadoop.security.credential.provider.path</name>
  <value>jceks://hdfs@10.22.121.0:8020/user/s3_access/s3.jceks</value>
</property>
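If you would rather not put the provider path in core-site.xml at all (as the original poster wanted), it can also be passed per command as a generic option, for example (a sketch using the same jceks path as above):

hadoop fs -Dhadoop.security.credential.provider.path=jceks://hdfs@10.22.121.0:8020/user/s3_access/s3.jceks -ls s3a://yourbucketname/folder/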

Step 4: Restart the Ambari server:

ambari-server restart

hadoop fs -ls s3a://yourbucketname/folder/file.csv

hadoop distcp s3a://yourbucketname/foldername/filename.csv hdfs://10.22.121.0:8020/your/hdfs/folder

Follow this link:

https://docs.hortonworks.com/HDPDocuments/HDP2/HDP2.6.2/bk_cloud-data-access/content/s3-config-props...