06-02-2017 06:24 PM | 7 Kudos
Overview
To control access, Azure uses Azure Active Directory (Azure AD), a multi-tenant cloud-based directory and identity management service. To learn more, refer to https://docs.microsoft.com/en-us/azure/active-directory/active-directory-whatis.
In short, to configure authentication with ADLS using the client credential, you must register a new application with Active Directory service and then give your application access to your ADL account. After you've performed these steps, you can configure your core-site.xml.
Note for Cloudbreak users: When you create a cluster with Cloudbreak, you can configure authentication with ADLS on the "Add File System" page of the create cluster wizard, and then you must perform one additional step as described in the Cloudbreak documentation. If you do this, you do not need to perform the steps below. If you have already created a cluster with Cloudbreak but did not configure ADLS on the "Add File System" page of the create cluster wizard, follow the steps below.
Prerequisites
1. To use ADLS storage, you must have a subscription for Data Lake Storage.
2. To access ADLS data in HDP, you must have an HDP version that supports it. I am using HDP 2.6.1, which supports connecting to ADLS using the ADL connector.
Step 1: Register an application
1. Log in to the Azure Portal at https://portal.azure.com/.
2. Navigate to your Active Directory and then select App Registrations.
3. Create a new web application by clicking +New application registration.
4. Specify an application name, type (Web app/API), and sign-on URLs.
Remember the application name: you will later add it to your ADLS account as an authorized user.
5. Once the application is created, navigate to the application's settings and find Keys.
6. Create a key by entering a key description, selecting a key duration, and then clicking Save. Make sure to copy and save the key value; you won't be able to retrieve it after you leave the page.
7. Write down the properties that you will need to authenticate: the application ID, the key, and the token endpoint.
Step 2: Assign permissions to your application
1. Log in to the Azure Portal.
2. If you don't have an ADL account, create one.
3. Navigate to your ADL account and then select Access Control (IAM).
4. Click +Add to add role-based permissions.
5. Under Role, select "Owner". Under Select, select your application. This grants the "Owner" role for this ADL account to your application.
Note: If you are not able to assign the "Owner" role, you can instead set fine-grained RWX ACL permissions for your application, allowing it access to the files and folders of your ADLS account, as documented here.
Note: If you are using a corporate Azure account, you may be unable to perform the role assignment step. In this case, ask your Azure admin to perform this step for you.
Step 3: Configure core-site.xml
1. Add the following four properties to your core-site.xml. While "fs.adl.oauth2.access.token.provider.type" must be set to "ClientCredential", you can obtain the remaining three values from step 7 above.
<property>
  <name>fs.adl.oauth2.access.token.provider.type</name>
  <value>ClientCredential</value>
</property>
<property>
  <name>fs.adl.oauth2.client.id</name>
  <value>APPLICATION-ID</value>
</property>
<property>
  <name>fs.adl.oauth2.credential</name>
  <value>KEY</value>
</property>
<property>
  <name>fs.adl.oauth2.refresh.url</name>
  <value>TOKEN-ENDPOINT</value>
</property>
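For context, the connector uses these three values to obtain an OAuth2 token via the client-credentials grant against the token endpoint. A minimal sketch of the kind of request it makes (illustrative only, not the connector's actual code; the placeholder values and the resource audience are assumptions):

```python
from urllib.parse import urlencode

def build_token_request(refresh_url, client_id, credential):
    """Build the OAuth2 client-credentials token request body (sketch)."""
    payload = {
        "grant_type": "client_credentials",
        "client_id": client_id,        # fs.adl.oauth2.client.id
        "client_secret": credential,   # fs.adl.oauth2.credential
        # ADLS resource audience -- an assumption for this sketch:
        "resource": "https://datalake.azure.net/",
    }
    return refresh_url, urlencode(payload)

# Placeholder values -- substitute the ones you wrote down in step 7.
url, body = build_token_request(
    "https://login.microsoftonline.com/TENANT-ID/oauth2/token",  # TOKEN-ENDPOINT
    "APPLICATION-ID",
    "KEY",
)
```

POSTing that body to the token endpoint returns a bearer token; the connector handles this (and token refresh) for you once the four properties are set.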
2. (Optional) It is recommended that you protect your credentials with credential providers. For instructions, refer to https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.6.1/bk_cloud-data-access/content/adls-protecting-credentials.html.
Step 4: Validate access to ADLS
To make sure that the authentication works, try accessing data. To test access, SSH to any cluster node, switch to the hdfs user by using sudo su hdfs and then try accessing your data. The URL structure is:
adl://<data_lake_store_name>.azuredatalakestore.net/dir/file
For example, to access "testfile" located in a directory called "testdir", stored in a data lake store called "mytest", the URL is:
adl://mytest.azuredatalakestore.net/testdir/testfile
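As a quick illustration, the URL structure can be assembled programmatically (a sketch; the store and path names are the example values above):

```python
def adl_url(store_name, path):
    """Build an adl:// URL for a file or directory in a Data Lake Store."""
    return "adl://{}.azuredatalakestore.net/{}".format(store_name, path.lstrip("/"))

print(adl_url("mytest", "testdir/testfile"))
# adl://mytest.azuredatalakestore.net/testdir/testfile
```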
The following FileSystem shell commands demonstrate access to a data lake store named mytest:
hadoop fs -ls adl://mytest.azuredatalakestore.net/
hadoop fs -mkdir adl://mytest.azuredatalakestore.net/testDir
hadoop fs -put testFile adl://mytest.azuredatalakestore.net/testDir/testFile
hadoop fs -cat adl://mytest.azuredatalakestore.net/testDir/testFile
test file content
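If access fails, one quick check is that all four properties actually made it into core-site.xml. A small sketch that parses the file content and verifies the keys (the inline XML here stands in for your real core-site.xml, whose location varies by install, commonly /etc/hadoop/conf/core-site.xml):

```python
import xml.etree.ElementTree as ET

REQUIRED = {
    "fs.adl.oauth2.access.token.provider.type",
    "fs.adl.oauth2.client.id",
    "fs.adl.oauth2.credential",
    "fs.adl.oauth2.refresh.url",
}

def adls_props(core_site_xml):
    """Return the ADLS-related name->value pairs found in a core-site.xml string."""
    root = ET.fromstring(core_site_xml)
    props = {p.findtext("name"): p.findtext("value") for p in root.iter("property")}
    return {k: v for k, v in props.items() if k in REQUIRED}

# Stand-in for the contents of your core-site.xml:
sample = """<configuration>
  <property><name>fs.adl.oauth2.access.token.provider.type</name><value>ClientCredential</value></property>
  <property><name>fs.adl.oauth2.client.id</name><value>APPLICATION-ID</value></property>
  <property><name>fs.adl.oauth2.credential</name><value>KEY</value></property>
  <property><name>fs.adl.oauth2.refresh.url</name><value>TOKEN-ENDPOINT</value></property>
</configuration>"""

missing = REQUIRED - set(adls_props(sample))
print("missing:", sorted(missing) or "none")
```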
Learn more
For more information about working with ADLS, refer to Getting Started with ADLS in Hortonworks documentation.
06-02-2017 04:25 PM | 1 Kudo
We are excited to introduce the new Cloud Data Access guide for HDP 2.6.1. The goal of this guide is to provide the information and steps required for configuring, using, securing, tuning performance, and troubleshooting access to cloud storage services using the HDP cloud storage connectors available for Amazon Web Services (Amazon S3) and Microsoft Azure (ADLS, WASB). To learn about the architecture of the cloud connectors, refer to Introducing the Cloud Storage Connectors. To get started with your chosen cloud storage service, refer to:
Getting Started with Amazon S3
Getting Started with ADLS
Getting Started with WASB
Once you have configured authentication with the chosen cloud storage service, you can start working with the data. To get started, refer to:
Accessing Cloud Data with Hive
Accessing Cloud Data with Spark
Copying Cloud Data with Hadoop
If you have comments, suggestions, corrections, or updates regarding our documentation, let us know on HCC. Help us continue to improve our documentation! Thanks!
Hortonworks Technical Documentation Team
06-01-2017 09:42 PM
@jeff Can you answer this? By the way, you get better visibility by posting a question as a separate thread rather than commenting below an article.
06-01-2017 05:37 PM
@Sameer Bhatnagar They are not currently included. We only include the following services.
05-31-2017 10:07 PM
Updated for HDCloud for AWS version 1.14.4. Check it out!
05-31-2017 09:24 PM
Updated for the latest release 1.14.4. No major changes though. Check it out!
05-31-2017 09:19 PM
Updated for the latest HDCloud version 1.14.4. No major changes, just updated screenshots and links. Check it out!
04-20-2017 06:42 PM
Hi @Namit Maheshwari Setting fs.defaultFS permanently to s3a is not recommended.
04-05-2017 05:25 PM | 5 Kudos
HDCloud for AWS general availability version 1.14.1 is now available, including six new HDP 2.6 and Ambari 2.5 cluster configurations and new cloud controller features. If you are new to HDCloud, you can get started using this tutorial (updated for 1.14.1). Official HDCloud for AWS documentation is available here.
HDP 2.6 and Ambari 2.5
The following HDP 2.6 configurations are now available. For the list of all available HDP 2.5 and HDP 2.6 configurations, refer to the Cluster Configurations documentation.
Resource Tagging
When creating a cluster, you can optionally add custom tags that will be displayed on the CloudFormation stack and on EC2 instances, allowing you to keep track of the resources that cloud controller creates on your behalf. For more information, refer to the Resource Tagging documentation.
Node Auto Repair
The cloud controller monitors clusters by checking for the Ambari Agent heartbeat on all cluster nodes. If the Ambari Agent heartbeat is lost on a node, a failure is reported for that node. Once the failure is reported, it is fixed automatically (if auto repair is enabled), or options are available for you to fix the failure manually (if auto repair is disabled). You can configure auto repair settings for each cluster when you create it. For more information, refer to the Node Auto Repair documentation.
Auto Scaling
Auto Scaling provides the ability to increase or decrease the number of nodes in a cluster according to the auto scaling policies that you define. After you create an auto scaling policy, cloud controller will execute the policy when the conditions that you specify are met. You can create an auto scaling policy when creating a cluster, or you can manage auto scaling settings and policies on a running cluster. For more information, refer to the Auto Scaling documentation.
Protected Gateway
HDCloud now configures a protected gateway on the cluster master node. This gateway is designed to provide access to various cluster resources from a single network port.
Shared Druid Metastore (Technical Preview)
When creating an HDP 2.6 cluster based on the BI configuration, you can have a Druid metastore database created with the cluster, or you can use an external Druid metastore backed by Amazon RDS. Using an external Amazon RDS database for the Druid metastore allows you to preserve the Druid metastore metadata and reuse it between clusters. For more information, refer to the Managing Shared Metastores documentation.
These features are available via the cloud controller UI or CLI.