Reply
Explorer
Posts: 20
Registered: ‎04-14-2015

Unable to access S3 bucket using S3A, HDFS CLI and Instance Profile

I'm trying to access an S3 bucket using the HDFS CLI, like below:

 hdfs dfs -ls s3a://[BUCKET_NAME]/

but I'm getting the error:

-ls: Fatal internal error
com.cloudera.com.amazonaws.AmazonClientException: Unable to load AWS credentials from any provider in the chain


On the gateway node where I'm running the command, I don't have an AWS instance profile attached, but I do have one attached on all datanodes and namenodes. Running this command from one of the datanodes or namenodes works successfully. Is there a way I can run this command using only instance profiles (no stored access keys or credentials), with the profiles attached only to the datanodes and namenodes? The reason I'm doing this is that I don't want to allow direct S3 access from the gateway node.


Posts: 642
Topics: 3
Kudos: 121
Solutions: 67
Registered: ‎08-16-2016

Re: Unable to access S3 bucket using S3A, HDFS CLI and Instance Profile

The command will use the instance profile of the node it is launched from.

So if you want access from the gateway node without attaching an instance profile to every node, you need to specify the keys in the S3 URI.
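To illustrate the URI form: S3A accepts credentials embedded in the URI authority, though this is discouraged since the keys end up in shell history and logs. A sketch with placeholder bucket and key names (a secret key containing special characters would need to be URL-encoded):

```shell
# Placeholder credentials and bucket name -- substitute your own.
# Embedding keys in the URI exposes them in history/logs; prefer -D
# properties or a credential provider where possible.
hdfs dfs -ls s3a://ACCESS_KEY:SECRET_KEY@my-bucket/
```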
Cloudera Employee
Posts: 16
Registered: ‎10-07-2015

Re: Unable to access S3 bucket using S3A, HDFS CLI and Instance Profile

You can put the S3 credentials in the S3 URI, or you can pass them as parameters on the command line, which is what I prefer, e.g.:


hadoop fs -Dfs.s3a.access.key="" -Dfs.s3a.secret.key="" -ls s3a://bucket-name/

It's also worth knowing that properties passed on the command line like this override any matching settings defined in the cluster config, such as core-site.xml.
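For reference, the cluster-wide equivalents of those `-D` properties live in core-site.xml. A sketch with placeholder values (the property names are the standard S3A ones; the values here are assumptions you'd replace):

```
<property>
  <name>fs.s3a.access.key</name>
  <value>YOUR_ACCESS_KEY</value>
</property>
<property>
  <name>fs.s3a.secret.key</name>
  <value>YOUR_SECRET_KEY</value>
</property>
```

Settings passed with `-D` take precedence over these for that single invocation only.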
