Setup Keystore for AWS Keys

Can someone get me started with setting up and using a keystore (a .jceks file) in HDFS for Spark jobs, together with the Hadoop Credential Provider? I would like to store all my Amazon access and secret keys there for retrieval. I see that CDH 5.9 now supports this, but I don't see any way to configure it using Cloudera Manager. If anyone can point me in the right direction, I would truly appreciate it.

In addition, if you know further details about integrating with the Amazon Security Token Service (also new in CDH 5.9), please let me know.

Thanks,

Ben

1 REPLY

You can use one of the following methods to set up AWS credentials.
  • Set up AWS Credentials Using the Hadoop Credential Provider - Cloudera recommends you use this method to set up AWS access because it provides system-wide AWS access to a single predefined bucket, without exposing the secret key in a configuration file or having to specify it at runtime.
    1. Create the Hadoop credential provider file and add the AWS access key to it:
      hadoop credential create fs.s3a.access.key -provider jceks://hdfs/<path_to_hdfs_file> -value <aws_access_id>

      For example:

      hadoop credential create fs.s3a.access.key -provider jceks://hdfs/user/root/awskeyfile.jceks -value AKI***********************
    2. Add the AWS secret key to the .jceks credential file.
      hadoop credential create fs.s3a.secret.key -provider jceks://hdfs/<path_to_hdfs_file> -value <aws_secret_key>

      For example:

      hadoop credential create fs.s3a.secret.key -provider jceks://hdfs/user/root/awskeyfile.jceks -value +pla**************************************************
    3. AWS access for users can be set up in two ways. You can either provide a global credential provider file that will allow all Spark users to submit S3 jobs, or have each user submit their own credentials every time they submit a job.
      • For Per-User Access - Provide the path to your specific credential store on the command line when submitting a Spark job. This means you do not need to modify the global settings for core-site.xml. Each user submitting a job can provide their own credentials at runtime as follows:
        spark-submit --conf spark.hadoop.hadoop.security.credential.provider.path=PATH_TO_JCEKS_FILE ...
      • For System-Wide Access - Point to the Hadoop credential file created in the previous step using the Cloudera Manager Server:
        1. Log in to the Cloudera Manager server.
        2. On the main page under Cluster, click on HDFS. Then click on Configuration. In the search box, enter core-site.
        3. Click on the + sign next to Cluster-wide Advanced Configuration Snippet (Safety Valve) for core-site.xml. For Name, put hadoop.security.credential.provider.path (the spark.hadoop. prefix is only used when the property is passed through Spark, not in core-site.xml) and for Value put jceks://hdfs/<path_to_hdfs_file>. For example, jceks://hdfs/user/root/awskeyfile.jceks.
        4. Click Save Changes and deploy the client configuration to all nodes of the cluster.

          After the services restart, you can use the AWS filesystem with credentials supplied automatically through a secure mechanism.

    4. (Optional) Configure Oozie to Run Spark S3 Jobs - Set spark.hadoop.hadoop.security.credential.provider.path to the path of the .jceks file in Oozie's workflow.xml file under the Spark Action's spark-opts section. This allows Spark to load AWS credentials from the .jceks file in HDFS.
      <action name="sparkS3job">
          <spark>
              ....
              <spark-opts>--conf spark.hadoop.hadoop.security.credential.provider.path=PATH_TO_JCEKS_FILE</spark-opts>
              ....
          </spark>
      </action>
      You can use the Oozie notation ${wf:user()} in the path to let Oozie use different AWS credentials for each user. For example:
      --conf spark.hadoop.hadoop.security.credential.provider.path=jceks://hdfs/user/${wf:user()}/aws.jceks
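
Steps 1 and 2 can be combined into one small script. This is only a sketch, not a Cloudera-provided tool: the run helper and the DRY_RUN and STORE_PATH variables are hypothetical names, and the default path simply reuses the example path from step 1. It relies on hadoop credential create prompting for the value when -value is omitted, and on hadoop credential list to show which aliases the store holds.

```shell
#!/bin/sh
# Sketch: store both S3A keys in one JCEKS file, then list the aliases.
# STORE_PATH is a hypothetical variable; the default reuses the example path.
STORE_PATH="${STORE_PATH:-/user/root/awskeyfile.jceks}"
PROVIDER="jceks://hdfs${STORE_PATH}"

# Print each command by default (DRY_RUN=1); set DRY_RUN=0 to execute it.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Without -value, hadoop credential create prompts for the secret,
# which keeps it out of the shell history and the process list.
run hadoop credential create fs.s3a.access.key -provider "$PROVIDER"
run hadoop credential create fs.s3a.secret.key -provider "$PROVIDER"

# Confirm that both aliases are present in the store.
run hadoop credential list -provider "$PROVIDER"
```

Run it once as is to review the commands, then again with DRY_RUN=0 on a host that has the Hadoop client configured.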
  • (Not Recommended) Specify the credentials at run time. For example:
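
A minimal sketch of the run-time approach, assuming the keys are passed as Spark properties with the spark.hadoop. prefix, which Spark copies into the Hadoop configuration. The helper name s3a_runtime_conf and the use of the AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY environment variables are assumptions made for illustration:

```shell
#!/bin/sh
# Hypothetical helper: builds the --conf flags that hand the S3A keys to Spark.
# $1 = access key, $2 = secret key. Never hard-code real keys in a script.
s3a_runtime_conf() {
  printf '%s %s\n' \
    "--conf spark.hadoop.fs.s3a.access.key=$1" \
    "--conf spark.hadoop.fs.s3a.secret.key=$2"
}

# Usage (keys taken from the environment, an assumption of this sketch):
#   spark-submit $(s3a_runtime_conf "$AWS_ACCESS_KEY_ID" "$AWS_SECRET_ACCESS_KEY") ...
```

Keys passed this way can end up in the process list and the shell history, which is one reason the credential provider file is the preferred method.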
 