Expert Contributor
Posts: 61
Registered: ‎02-03-2016

Setup Keystore for AWS Keys

Can someone get me started with setting up and using a Keystore (jceks file) in HDFS for use with Spark jobs in coordination with Hadoop Credentials Provider? I would like to store all my Amazon Access and Secret Keys there for retrieval. I see that in CDH 5.9 there is support for this now, but I don't see any way that it can be configured using Cloudera Manager. If anyone can get me pointed in the right direction, I would truly appreciate it.


In addition, if you know further details about integrating with Amazon Security Token Service (also new in CDH 5.9), please let me know.




New Contributor
Posts: 1
Registered: ‎12-08-2018

Re: Setup Keystore for AWS Keys

[ Edited ]
You can use one of the following methods described below to set up AWS credentials.
  • Set up AWS Credentials Using the Hadoop Credential Provider - Cloudera recommends you use this method to set up AWS access because it provides system-wide AWS access to a single predefined bucket, without exposing the secret key in a configuration file or having to specify it at runtime.
    1. Create the Hadoop credential provider file with the necessary access and secret keys:
      hadoop credential create fs.s3a.access.key -provider jceks://hdfs/<path_to_hdfs_file> -value <aws_access_id>

      For example:

      hadoop credential create fs.s3a.access.key -provider jceks://hdfs/user/root/awskeyfile.jceks -value AKI***********************
    2. Add the AWS secret key to the .jceks credential file.
      hadoop credential create fs.s3a.secret.key -provider jceks://hdfs/<path_to_hdfs_file> -value <aws_secret_key>

      For example:

      hadoop credential create fs.s3a.secret.key -provider jceks://hdfs/user/root/awskeyfile.jceks -value +pla**************************************************
    3. AWS access for users can be set up in two ways. You can either provide a global credential provider file that will allow all Spark users to submit S3 jobs, or have each user submit their own credentials every time they submit a job.
      • For Per-User Access - Provide the path to your specific credential store on the command line when submitting a Spark job. This means you do not need to modify the global settings for core-site.xml. Each user submitting a job can provide their own credentials at runtime as follows:
        spark-submit --conf ...
      • For System-Wide Access - Point to the Hadoop credential file created in the previous step using the Cloudera Manager Server:
        1. Login to the Cloudera Manager server.
        2. On the main page under Cluster, click on HDFS. Then click on Configuration. In the search box, enter core-site.
        3. Click on the + sign next to Cluster-wide Advanced Configuration Snippet (Safety Valve) for core-site.xml. For Name, put and for Value put jceks://hdfs/path_to_hdfs_file. For example, jceks://hdfs/user/root/awskeyfile.jceks.
        4. Click Save Changes and deploy the client configuration to all nodes of the cluster.

          After the services restart, you can use AWS filesystem with credentials supplied automatically through a secure mechanism.

    4. (Optional) Configure Oozie to Run Spark S3 Jobs - Set to the path of the .jceks file in Oozie's workflow.xml file under the Spark Action's spark-opts section. This allows Spark to load AWS credentials from the .jceks file in HDFS.
      <action name="sparkS3job">
      You can use the Oozie notation ${wf:user()} in the path to let Oozie use different AWS credentials for each user. For example:
  • (Not Recommended) Specify the credentials at run time. For example: