
Setup Keystore for AWS Keys

Rising Star

Can someone get me started with setting up and using a Keystore (jceks file) in HDFS for use with Spark jobs in coordination with Hadoop Credentials Provider? I would like to store all my Amazon Access and Secret Keys there for retrieval. I see that in CDH 5.9 there is support for this now, but I don't see any way that it can be configured using Cloudera Manager. If anyone can get me pointed in the right direction, I would truly appreciate it.


In addition, if you know further details about integrating with Amazon Security Token Service (also new in CDH 5.9), please let me know.





New Contributor
You can set up AWS credentials using one of the following methods.
  • Set up AWS Credentials Using the Hadoop Credential Provider - Cloudera recommends you use this method to set up AWS access because it provides system-wide AWS access to a single predefined bucket, without exposing the secret key in a configuration file or having to specify it at runtime.
    1. Create the Hadoop credential provider file and add the AWS access key:
      hadoop credential create fs.s3a.access.key -provider jceks://hdfs/<path_to_hdfs_file> -value <aws_access_id>

      For example:

      hadoop credential create fs.s3a.access.key -provider jceks://hdfs/user/root/awskeyfile.jceks -value AKI***********************
    2. Add the AWS secret key to the .jceks credential file.
      hadoop credential create fs.s3a.secret.key -provider jceks://hdfs/<path_to_hdfs_file> -value <aws_secret_key>

      For example:

      hadoop credential create fs.s3a.secret.key -provider jceks://hdfs/user/root/awskeyfile.jceks -value +pla**************************************************
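      Before moving on, you can confirm both entries landed in the keystore with hadoop credential list, the companion to hadoop credential create. A minimal sketch, using the example path from the steps above (the command must be run on a node with HDFS access, so this sketch only composes and prints it):

      ```shell
      # Compose the verification command; the provider path is the example
      # file created in steps 1 and 2.
      PROVIDER="jceks://hdfs/user/root/awskeyfile.jceks"
      LIST_CMD="hadoop credential list -provider ${PROVIDER}"
      echo "${LIST_CMD}"
      # On a cluster, running this command should list the aliases
      # fs.s3a.access.key and fs.s3a.secret.key.
      ```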
    3. AWS access for users can be set up in two ways. You can either provide a global credential provider file that will allow all Spark users to submit S3 jobs, or have each user submit their own credentials every time they submit a job.
      • For Per-User Access - Provide the path to your specific credential store on the command line when submitting a Spark job. This means you do not need to modify the global settings for core-site.xml. Each user submitting a job can provide their own credentials at runtime as follows:
        spark-submit --conf ...
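        A fuller sketch of a per-user submission, assuming a per-user credential file and a hypothetical job script. The spark.hadoop.* prefix is Spark's standard mechanism for passing a property through to the Hadoop configuration:

        ```shell
        # Per-user credential file path and job script are hypothetical;
        # substitute your own. The spark.hadoop. prefix forwards the
        # property to Hadoop's configuration at runtime.
        JCEKS="jceks://hdfs/user/alice/awskeyfile.jceks"
        SUBMIT_CMD="spark-submit --conf spark.hadoop.hadoop.security.credential.provider.path=${JCEKS} my_s3_job.py s3a://my-bucket/input"
        echo "${SUBMIT_CMD}"
        ```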
      • For System-Wide Access - Point to the Hadoop credential file created in the previous step using the Cloudera Manager Server:
        1. Log in to the Cloudera Manager server.
        2. On the main page under Cluster, click on HDFS. Then click on Configuration. In the search box, enter core-site.
        3. Click on the + sign next to Cluster-wide Advanced Configuration Snippet (Safety Valve) for core-site.xml. For Name, enter hadoop.security.credential.provider.path; for Value, enter jceks://hdfs/path_to_hdfs_file. For example, jceks://hdfs/user/root/awskeyfile.jceks.
        4. Click Save Changes and deploy the client configuration to all nodes of the cluster.

          After the services restart, you can use the S3A filesystem (s3a:// URIs), with credentials supplied automatically through this secure mechanism.
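          Once deployed, the safety-valve entry amounts to the following property in core-site.xml (a sketch; the property name is the standard Hadoop credential-provider setting, and the path is the example file from above):

          ```xml
          <property>
            <name>hadoop.security.credential.provider.path</name>
            <value>jceks://hdfs/user/root/awskeyfile.jceks</value>
          </property>
          ```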

    4. (Optional) Configure Oozie to Run Spark S3 Jobs - Set spark.hadoop.hadoop.security.credential.provider.path to the path of the .jceks file in Oozie's workflow.xml, under the Spark action's spark-opts section. This allows Spark to load the AWS credentials from the .jceks file in HDFS.
      You can use the Oozie notation ${wf:user()} in the path so that Oozie resolves a different credential file, and therefore different AWS credentials, for each user.
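      A sketch of what such a Spark action could look like in workflow.xml. The action name, application details, and file layout here are hypothetical; only the spark-opts line carries the credential-provider setting:

      ```xml
      <!-- Hypothetical Spark action; ${wf:user()} resolves to the submitting
           user, so each user's job reads a credential file under their own
           HDFS home directory. -->
      <action name="sparkS3job">
        <spark xmlns="uri:oozie:spark-action:0.1">
          <spark-opts>--conf spark.hadoop.hadoop.security.credential.provider.path=jceks://hdfs/user/${wf:user()}/awskeyfile.jceks</spark-opts>
        </spark>
        <ok to="end"/>
        <error to="fail"/>
      </action>
      ```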
  • (Not Recommended) Specify the credentials at run time on the spark-submit command line, which exposes them in process listings and shell history.
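    A sketch of this runtime approach, with placeholder key values and a hypothetical job script; the fs.s3a.access.key and fs.s3a.secret.key properties are the same S3A settings stored in the keystore above, passed through Spark's spark.hadoop.* prefix instead:

    ```shell
    # Not recommended: keys passed this way appear in process listings
    # and shell history. Key values and job file are placeholders.
    RUNTIME_CMD="spark-submit --conf spark.hadoop.fs.s3a.access.key=MY_ACCESS_KEY_ID --conf spark.hadoop.fs.s3a.secret.key=MY_SECRET_ACCESS_KEY my_s3_job.py"
    echo "${RUNTIME_CMD}"
    ```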