When running a DistCp job from HDFS to Amazon S3, credentials are required to authenticate to the S3 bucket. Embedding them in the S3A URI would leak the secret values into application logs. Storing them in core-site.xml is also not ideal, because any user with hdfs CLI access could then reach the S3 bucket to which those AWS credentials are tied.
The Hadoop Credential API can be used to manage access to S3 in a more fine-grained way.
The first step is to create a local JCEKS keystore file in which to store the AWS Access Key ID and Secret Access Key values:
hadoop credential create fs.s3a.access.key -provider localjceks://file/path/to/aws.jceks
<enter Access Key value at prompt>
hadoop credential create fs.s3a.secret.key -provider localjceks://file/path/to/aws.jceks
<enter Secret Key value at prompt>
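To confirm that both aliases were stored, the keystore can be listed with the credential tool (the file path here mirrors the placeholder used above):

```shell
# List the credential aliases held in the local JCEKS keystore;
# this prints the alias names only, never the secret values
hadoop credential list -provider localjceks://file/path/to/aws.jceks
```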
We'll then copy this JCEKS file to HDFS, with permissions restricted so that only the users who should have S3 access can read it.
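A sketch of that copy step and the DistCp invocation that consumes the keystore; the HDFS directory, the etl user/group, and the bucket name are illustrative assumptions, not values from the original:

```shell
# Copy the keystore into HDFS and lock down its permissions
# (paths, user, and group are placeholders)
hdfs dfs -mkdir -p /user/etl/creds
hdfs dfs -put /path/to/aws.jceks /user/etl/creds/aws.jceks
hdfs dfs -chown etl:etl /user/etl/creds/aws.jceks
hdfs dfs -chmod 640 /user/etl/creds/aws.jceks

# Point DistCp at the keystore via hadoop.security.credential.provider.path;
# the jceks://hdfs scheme tells Hadoop to read the keystore from HDFS,
# so the S3A connector resolves fs.s3a.access.key and fs.s3a.secret.key
# from it without the secrets appearing in the URI or logs
hadoop distcp \
  -Dhadoop.security.credential.provider.path=jceks://hdfs/user/etl/creds/aws.jceks \
  hdfs:///data/source \
  s3a://example-bucket/dest
```

Because access is now governed by HDFS file permissions on the keystore rather than a cluster-wide core-site.xml entry, only the intended users can authenticate to the bucket.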