I'm trying to access an S3 bucket using the HDFS utilities, like below:
hdfs dfs -ls s3a://[BUCKET_NAME]/
but I'm getting the error:
-ls: Fatal internal error com.cloudera.com.amazonaws.AmazonClientException: Unable to load AWS credentials from any provider in the chain
On the gateway node where I'm running the command, I don't have an AWS instance profile attached, but I do have one attached on all of the datanodes and namenodes, and running this command from one of those nodes works successfully. Is there a way to run this command using instance profiles only (no stored access keys or credentials), with the profiles attached only to the datanodes and namenodes? The reason I'm doing this is that I don't want to allow direct S3 access from the gateway node.
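For context (a sketch based on the Hadoop S3A documentation, not something stated in the post): by default the S3A connector walks a chain of credential providers, and on EC2 the final fallback is the instance profile, which is why the command succeeds on nodes with a profile and fails on the gateway with "Unable to load AWS credentials from any provider in the chain". If you want S3A to use instance profiles only, the chain can be pinned in core-site.xml; the class name below is the stock AWS SDK one and may need your distribution's shaded package prefix (the error message above suggests a `com.cloudera.com.amazonaws...` relocation on CDH):

```xml
<!-- core-site.xml: restrict S3A to instance-profile credentials only -->
<!-- (property name per the Hadoop S3A docs; provider class may be shaded on CDH) -->
<property>
  <name>fs.s3a.aws.credentials.provider</name>
  <value>com.amazonaws.auth.InstanceProfileCredentialsProvider</value>
</property>
```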
You can put the S3 credentials in the s3a URI (though embedding secrets in URIs is deprecated in recent Hadoop versions), or you can pass them as parameters on the command line, which is what I prefer, e.g.:
hadoop fs -Dfs.s3a.access.key="" -Dfs.s3a.secret.key="" -ls s3a://bucket-name/
It's also worth knowing that if you run the command as shown above, the -D options override any equivalent settings defined in the cluster configuration, such as core-site.xml.
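For completeness, the cluster-wide equivalents that those per-command -D flags override would live in core-site.xml. The property names below are the standard S3A ones; the values are placeholders, not real keys:

```xml
<!-- core-site.xml: cluster-wide S3A credentials (placeholder values) -->
<property>
  <name>fs.s3a.access.key</name>
  <value>YOUR_ACCESS_KEY</value>
</property>
<property>
  <name>fs.s3a.secret.key</name>
  <value>YOUR_SECRET_KEY</value>
</property>
```

Note that storing keys this way conflicts with the original goal of avoiding stored credentials; the provider-chain approach via instance profiles avoids that.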