Archives of Support Questions (Read Only)

This is an archived board for historical reference. Information and links may no longer be available or relevant.

access amazon S3 bucket from hdfs

New Member

I am trying to connect to an Amazon S3 bucket from HDFS using this command:

$ hadoop fs -ls s3n://<ACCESSKEYID>:<SecretAccessKey>@<bucket-name>/tpt_files/

-ls: Invalid hostname in URI s3n://<ACCESSKEYID>:<SecretAccessKey>@<bucket-name>/tpt_files

Usage: hadoop fs [generic options] -ls [-d] [-h] [-R] [<path> ...]

My SecretAccessKey includes a “/”. Could that be the cause of this behavior?
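A “/” in the secret key does break the `s3n://key:secret@bucket` form, because the URI parser treats the slash as a path separator. A commonly suggested workaround is to URL-encode the secret before embedding it; a minimal sketch (the key below is made up, not a real credential):

```python
from urllib.parse import quote

# Hypothetical secret key containing a slash (not a real credential)
secret = "ab/cdEFGhij"

# Percent-encode everything, including "/" -> "%2F",
# so the URI parser no longer sees a path separator
encoded = quote(secret, safe="")
print(encoded)  # ab%2FcdEFGhij
```

Even with encoding, embedding secrets in the URI leaks keys into shell history and logs, so the `-D` property or credential-provider approaches discussed below are generally preferred.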

At the same time, I have the AWS CLI installed on this server and I can access the bucket without any issues using this command (AccessKeyId and SecretAccessKey are configured in .aws/credentials):

aws s3 ls s3://<bucket-name>/tpt_files/

Is there any way to access an Amazon S3 bucket using a Hadoop command without specifying the keys in core-site.xml? I'd prefer to specify the keys on the command line.

Any suggestions will be very helpful.

1 ACCEPTED SOLUTION

New Member

Hi All,

Eventually I've found a way to specify the keys on the command line:

hadoop fs -Dfs.s3a.access.key=<AccessKeyId> -Dfs.s3a.secret.key=<SecretAccessKey> -Dfs.s3a.proxy.host=<proxy_host> -Dfs.s3a.proxy.port=<proxy_port> -ls s3a://<my_bucket>/

Thanks to Constantin and kvarakantham for their responses.



4 REPLIES

Super Guru

@Leonid Zavadskiy

You are dealing with this issue: https://issues.apache.org/jira/browse/HADOOP-3733

As a workaround, you can first set the fs.s3.awsAccessKeyId and fs.s3.awsSecretAccessKey properties; the URI would then be s3://mybucket/dest.

Putting things on the command line is not very secure anyway.
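For reference, the workaround above would look roughly like this in core-site.xml (property names as used by the older s3/s3n connectors; the values are placeholders, not real keys):

```xml
<property>
  <name>fs.s3.awsAccessKeyId</name>
  <value>YOUR_ACCESS_KEY_ID</value>
</property>
<property>
  <name>fs.s3.awsSecretAccessKey</name>
  <value>YOUR_SECRET_ACCESS_KEY</value>
</property>
```

Note that these properties share the secrecy problem the original poster wants to avoid: anyone who can read core-site.xml can read the keys.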

New Member

Thank you Constantin,

Yes, putting keys on the command line is not very secure; I agree with you. But if I set AccessKeyId and SecretAccessKey in core-site.xml, then all Hadoop users will be able to access the Amazon S3 bucket from Hadoop. I am trying to avoid this scenario.

I am experimenting with putting the keys on the command line, but have not been successful with it yet...

Not sure what causes the error; the syntax seems OK (I am now trying s3a instead of s3n).

New Member

Hi All,

Eventually I've found a way to specify the keys on the command line:

hadoop fs -Dfs.s3a.access.key=<AccessKeyId> -Dfs.s3a.secret.key=<SecretAccessKey> -Dfs.s3a.proxy.host=<proxy_host> -Dfs.s3a.proxy.port=<proxy_port> -ls s3a://<my_bucket>/

Thanks to Constantin and kvarakantham for their responses.



Step 1: add these two properties to the core-site.xml file.

<property>
  <name>fs.s3a.access.key</name>
  <value>your AWS IAM user access key</value>
</property>
<property>
  <name>fs.s3a.secret.key</name>
  <value>your AWS IAM user secret key</value>
</property>

Step 2: add the S3 bucket endpoint property to core-site.xml. Before adding it, check the S3 bucket's region.

For example, my bucket is in the Mumbai region: https://s3.ap-south-1.amazonaws.com/bucketname/foldername/filename.csv

<property>
  <name>fs.s3a.endpoint</name>
  <value>s3.ap-south-1.amazonaws.com</value>
</property>

Note: otherwise you get a 400 Bad Request error: WARN s3a.S3AFileSystem: Amazon S3 error 400: 400 Bad Request; com.amazonaws.services.s3.model.AmazonS3Exception: Bad Request (Service: Amazon S3; Status Code: 400; Error Code: 400 Bad Request)

Step 3: add the hadoop.security.credential.provider.path property to core-site.xml. For this, you can store fs.s3a.access.key and fs.s3a.secret.key in a credential file on an HDFS path (using the Hadoop credential API to store AWS secrets).

For example, run these commands as the hdfs user:

I: hdfs dfs -chown s3_access:hdfs /user/s3_access

II: hadoop credential create fs.s3a.access.key -value <aws-IAM-user_accesskey> -provider jceks://hdfs@10.22.121.0:8020/user/s3_access/s3.jceks

III: hadoop credential create fs.s3a.secret.key -value <aws-IAM-user_secretkey> -provider jceks://hdfs@10.22.121.0:8020/user/s3_access/s3.jceks

IV: hadoop credential list -provider jceks://hdfs@10.22.121.0:8020/user/s3_access/s3.jceks

you will get output as below:

Listing aliases for CredentialProvider: jceks://hdfs@13.229.32.224:8020/user/s3_access/s3.jceks

fs.s3a.secret.key

fs.s3a.access.key

Finally, you have created and stored the AWS secrets credential on Hadoop:

hdfs dfs -chown s3_access:hdfs /user/s3_access/s3.jceks

hdfs dfs -chmod 666 /user/s3_access/s3.jceks

<property>
  <name>hadoop.security.credential.provider.path</name>
  <value>jceks://hdfs@10.22.121.0:8020/user/s3_access/s3.jceks</value>
</property>
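Instead of (or in addition to) setting this property in core-site.xml, the jceks store can also be passed per-command, which answers the original question of keeping keys out of core-site.xml entirely. A sketch, reusing the same placeholder namenode address and bucket name:

```shell
hadoop fs \
  -Dhadoop.security.credential.provider.path=jceks://hdfs@10.22.121.0:8020/user/s3_access/s3.jceks \
  -ls s3a://yourbucketname/
```

With this approach, access can be restricted by controlling HDFS permissions on the s3.jceks file rather than exposing keys to every Hadoop user.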

Step 4: restart the Ambari server:

ambari-server restart

hadoop fs -ls s3a://yourbucketname/folder/file.csv

hadoop distcp s3a://yourbucketname/foldername/filename.csv hdfs://10.22.121.0:8020/<your_hdfs_folder>

Follow this link:

https://docs.hortonworks.com/HDPDocuments/HDP2/HDP2.6.2/bk_cloud-data-access/content/s3-config-props...