06-03-2017
08:14 PM
Pretty informative and useful. Thanks @Dominika Bialek for writing this. Keep it up!
06-02-2017
06:24 PM
7 Kudos
Overview
To control access, Azure uses Azure Active Directory (Azure AD), a multi-tenant cloud-based directory and identity management service. To learn more, refer to https://docs.microsoft.com/en-us/azure/active-directory/active-directory-whatis.
In short, to configure authentication with ADLS using the client credential, you must register a new application with Active Directory service and then give your application access to your ADL account. After you've performed these steps, you can configure your core-site.xml.
Note for Cloudbreak users: When you create a cluster with Cloudbreak, you can configure authentication with ADLS on the "Add File System" page of the create cluster wizard and then you must perform an additional step as described in Cloudbreak documentation. If you do this, you do not need to perform the steps below. If you have already created a cluster with Cloudbreak but did not perform ADLS configuration on the "Add File System" page of the create cluster wizard, follow the steps below.
Prerequisites
1. To use ADLS storage, you must have a subscription for Data Lake Storage.
2. To access ADLS data in HDP, you must have an HDP version that supports it. I am using HDP 2.6.1, which supports connecting to ADLS using the ADL connector.
Step 1: Register an application
1. Log in to the Azure Portal at https://portal.azure.com/.
2. Navigate to your Active Directory and then select App Registrations.
3. Create a new web application by clicking on +New application registration.
4. Specify an application name, type (Web app/API), and sign-on URLs. Remember the application name: you will later add it to your ADLS account as an authorized user.
5. Once an application is created, navigate to the application configuration and find the Keys in the application's settings.
6. Create a key by entering a key description, selecting a key duration, and then clicking Save. Make sure to copy and save the key value. You won't be able to retrieve it after you leave the page.
7. Write down the properties that you will need to authenticate: the application ID, the key, and the token endpoint (the three values used in core-site.xml in Step 3).
Step 2: Assign permissions to your application
1. Log in to the Azure Portal.
2. If you don't have an ADL account, create one.
3. Navigate to your ADL account and then select Access Control (IAM).
4. Click on +Add to add role-based permissions.
5. Under Role, select "Owner". Under Select, select your application. This will grant the "Owner" role for this ADL account to your application.
Note: If you are not able to assign the "Owner" role, you can set fine-grained RWX ACL permissions for your application, allowing it access to the files and folders of your ADLS account, as documented here.
Note: If using a corporate Azure account, you may be unable to perform the role assignment step. In this case, contact your Azure admin to perform this step for you.
Step 3: Configure core-site.xml
1. Add the following four properties to your core-site.xml.
While "fs.adl.oauth2.access.token.provider.type" must be set to "ClientCredential", you can obtain the remaining three parameters from step 7 of Step 1 above.
<property>
  <name>fs.adl.oauth2.access.token.provider.type</name>
  <value>ClientCredential</value>
</property>
<property>
  <name>fs.adl.oauth2.client.id</name>
  <value>APPLICATION-ID</value>
</property>
<property>
  <name>fs.adl.oauth2.credential</name>
  <value>KEY</value>
</property>
<property>
  <name>fs.adl.oauth2.refresh.url</name>
  <value>TOKEN-ENDPOINT</value>
</property>
2. (Optional) It's recommended that you protect your credentials with credential providers. For instructions, refer to https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.6.1/bk_cloud-data-access/content/adls-protecting-credentials.html.
Step 4: Validate access to ADLS
To make sure that the authentication works, try accessing data. To test access, SSH to any cluster node, switch to the hdfs user by using sudo su hdfs and then try accessing your data. The URL structure is:
adl://<data_lake_store_name>.azuredatalakestore.net/dir/file
For example, to access "testfile" located in a directory called "testdir", stored in a data lake store called "mytest", the URL is:
adl://mytest.azuredatalakestore.net/testdir/testfile
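If you script against several stores or paths, the URL structure above can be captured in a small helper. This is only an illustrative sketch: the function name adl_url is hypothetical and not part of any Hadoop or Azure library; it just assembles the string in the documented form.

```python
def adl_url(store_name: str, path: str = "") -> str:
    """Build an adl:// URL for a path inside a data lake store.

    Hypothetical helper for illustration; it only formats a string
    following the adl://<data_lake_store_name>.azuredatalakestore.net/dir/file
    structure described in the article.
    """
    if not store_name or "/" in store_name:
        raise ValueError("store_name must be a bare data lake store name")
    # Drop any leading slash so the result never contains '//' after the host.
    return f"adl://{store_name}.azuredatalakestore.net/{path.lstrip('/')}"


# The example from the article: "testfile" in "testdir" in the "mytest" store.
print(adl_url("mytest", "testdir/testfile"))
```

The resulting string can be passed to hadoop fs commands in place of a hand-typed URL.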
The following FileSystem shell commands demonstrate access to a data lake store named mytest:
hadoop fs -ls adl://mytest.azuredatalakestore.net/
hadoop fs -mkdir adl://mytest.azuredatalakestore.net/testDir
hadoop fs -put testFile adl://mytest.azuredatalakestore.net/testDir/testFile
hadoop fs -cat adl://mytest.azuredatalakestore.net/testDir/testFile
test file content
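If these commands fail with an authentication error, one thing worth checking is whether all four properties from Step 3 actually made it into core-site.xml. The following is a minimal sketch of such a check, assuming you have the file contents as a string; the function name missing_adl_properties and the sample configuration are hypothetical. Note that if you protected your credentials with a credential provider, some of these values may be intentionally absent from the file.

```python
import xml.etree.ElementTree as ET

# The four property names listed in Step 3 of the article.
REQUIRED = (
    "fs.adl.oauth2.access.token.provider.type",
    "fs.adl.oauth2.client.id",
    "fs.adl.oauth2.credential",
    "fs.adl.oauth2.refresh.url",
)


def missing_adl_properties(core_site_xml: str) -> list:
    """Return the required ADLS properties that are absent or empty.

    Hypothetical helper: parses core-site.xml content given as a string.
    """
    root = ET.fromstring(core_site_xml)
    present = {}
    for prop in root.iter("property"):
        name = (prop.findtext("name") or "").strip()
        value = (prop.findtext("value") or "").strip()
        present[name] = value
    return [name for name in REQUIRED if not present.get(name)]


# Made-up sample in which the token endpoint was forgotten.
sample = """<configuration>
  <property>
    <name>fs.adl.oauth2.access.token.provider.type</name>
    <value>ClientCredential</value>
  </property>
  <property>
    <name>fs.adl.oauth2.client.id</name>
    <value>APPLICATION-ID</value>
  </property>
  <property>
    <name>fs.adl.oauth2.credential</name>
    <value>KEY</value>
  </property>
</configuration>"""

print(missing_adl_properties(sample))
```

To check a real cluster, you would read the contents of your core-site.xml into a string and pass it to the function; the file's location depends on your installation.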
Learn more
For more information about working with ADLS, refer to Getting Started with ADLS in Hortonworks documentation.
06-02-2017
04:25 PM
1 Kudo
We are excited to introduce the new Cloud Data Access guide for HDP 2.6.1. The goal of this guide is to provide information and steps required for configuring, using, securing, tuning performance, and troubleshooting access to the cloud storage services using HDP cloud storage connectors available for Amazon Web Services (Amazon S3) and Microsoft Azure (ADLS, WASB).
To learn about the architecture of the cloud connectors, refer to Introducing the Cloud Storage Connectors.
To get started with your chosen cloud storage service, refer to:
Getting Started with Amazon S3
Getting Started with ADLS
Getting Started with WASB
Once you have configured authentication with the chosen cloud storage service, you can start working with the data. To get started, refer to:
Accessing Cloud Data with Hive
Accessing Cloud Data with Spark
Copying Cloud Data with Hadoop
If you have comments or suggestions, corrections or updates regarding our documentation, let us know on HCC. Help us continue to improve our documentation!
Thanks!
Hortonworks Technical Documentation Team
04-05-2017
05:25 PM
5 Kudos
HDCloud for AWS general availability version 1.14.1 is now available, including six new HDP 2.6 and Ambari 2.5 cluster configurations and new cloud controller features. If you are new to HDCloud, you can get started using this tutorial (updated for 1.14.1). Official HDCloud for AWS documentation is available here.
HDP 2.6 and Ambari 2.5
The following HDP 2.6 configurations are now available:
For the list of all available HDP 2.5 and HDP 2.6 configurations, refer to Cluster Configurations documentation.
Resource Tagging
When creating a cluster, you can optionally add custom tags that will be displayed on the CloudFormation stack and on EC2 instances, allowing you to keep track of the resources that cloud controller creates on your behalf. For more information, refer to Resource Tagging documentation.
Node Auto Repair
The cloud controller monitors clusters by checking for Ambari Agent heartbeat on all cluster nodes. If the Ambari Agent heartbeat is lost on a node, a failure is reported for that node. Once the failure is reported, it is fixed automatically (if auto repair is enabled), or options are available for you to fix the failure manually (if auto repair is disabled). You can configure auto repair settings for each cluster when you create it. For more information, refer to Node Auto Repair documentation.
Auto Scaling
Auto Scaling provides the ability to increase or decrease the number of nodes in a cluster according to the auto scaling policies that you define. After you create an auto scaling policy, cloud controller will execute the policy when the conditions that you specified are met. You can create an auto scaling policy when creating a cluster, or you can manage the auto scaling settings and policies once the cluster is running. For more information, refer to Auto Scaling documentation.
Protected Gateway
HDCloud now configures a protected gateway on the cluster master node. This gateway is designed to provide access to various cluster resources from a single network port.
Shared Druid Metastore (Technical Preview)
When creating an HDP 2.6 cluster based on the BI configuration, you have an option to have a Druid metastore database created with the cluster, or you can use an external Druid metastore that is backed by Amazon RDS. Using an external Amazon RDS database for a Druid metastore allows you to preserve the Druid metastore metadata and reuse it between clusters. For more information, refer to Managing Shared Metastores documentation.
The features are available via cloud controller UI or CLI.
01-20-2017
08:01 PM
2 Kudos
We just updated Hortonworks Data Cloud for AWS to Technical Preview #1.12. The release is packed with goodies such as:
Support for deploying compute nodes with spot pricing.
Support for executing node recipes: custom scripts that can be run pre- or post-cluster deployment for customizing the cluster and installing additional software.
Support for HDP 2.6 (Technical Preview) that can launch two new cluster configurations for Spark 2.1 and Druid.
To create an HDP 2.6 cluster, launch the cloud controller and when creating a cluster choose HDP Version: HDP 2.6 (Technical Preview) and then choose one of the available cluster types.
For more details, refer to the Release Notes: http://hortonworks.github.io/hdp-aws/releasenotes/.
To get started with HDCloud for AWS, visit http://hortonworks.github.io/hdp-aws/.
To get started with Spark 2.1, see Vinay's blog at http://hortonworks.com/blog/try-apache-spark-2-1-zeppelin-hortonworks-data-cloud/.
Have fun!
01-10-2017
05:35 PM
2 Kudos
After creating a cluster on HDCloud for AWS, you may notice that certain ports are not opened by default, so you may need to manually open these ports by editing the inbound access on the security group. In this tutorial, I will show you how to open YARN Resource Manager UI (8088) and Hive UI (10502) ports by manually editing the inbound access on the master node security group. Let’s get started!
1. On AWS, from the Services menu, select EC2 to navigate to the EC2 console.
2. In the left pane, in the INSTANCES section, click on Instances. Note: If you can’t see your instances, check the top right corner to make sure that you are in the correct region.
3. Identify the instance corresponding to your master node. The name of the instance should be <your-cluster-name>-1-master. Next, select that instance. This will allow you to see the Description tab, which includes the link to the security group configuration.
4. Click on the security group URL to open the Security Group section.
5. Select the Inbound tab.
6. Check if 8088 and 10502 are found in the Port Range column. If not, add them by clicking the Edit button, then Add Rule, and add a new Custom TCP Rule for port 8088 with source “0.0.0.0/0”. Next, do the same for port 10502. Save changes by hitting the Save button.
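If you prefer to script the rule changes from the last step rather than click through the console, the two inbound rules can be described as plain data. The sketch below is an assumption-laden illustration, not part of the original tutorial: the function name ingress_permissions is hypothetical, and the dictionaries follow the IpPermissions shape accepted by the boto3 EC2 client's authorize_security_group_ingress call. The call itself is shown only in a comment because it needs boto3, AWS credentials, and your real security group ID.

```python
def ingress_permissions(ports, cidr="0.0.0.0/0"):
    """Describe one inbound TCP rule per port.

    Hypothetical helper; the default source matches the tutorial
    and can be narrowed to a smaller CIDR range if preferred.
    """
    return [
        {
            "IpProtocol": "tcp",
            "FromPort": port,
            "ToPort": port,
            "IpRanges": [{"CidrIp": cidr}],
        }
        for port in ports
    ]


# YARN Resource Manager UI (8088) and Hive UI (10502), as in the tutorial.
rules = ingress_permissions([8088, 10502])
for rule in rules:
    print(rule)

# With boto3 installed and AWS credentials configured, the rules could be
# applied roughly like this (the group ID below is a placeholder):
#   import boto3
#   boto3.client("ec2").authorize_security_group_ingress(
#       GroupId="sg-PLACEHOLDER", IpPermissions=rules)
```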
06-15-2017
07:27 PM
Updated for the latest HDCloud version 1.16. No major changes, just updated screenshots and links. Check it out!
04-03-2017
06:56 PM
Updated for HDCloud 1.14.1. Check it out!
05-31-2017
09:24 PM
Updated for the latest release 1.14.4. No major changes though. Check it out!
06-15-2017
07:20 PM
Updated for the latest HDCloud version 1.16. Check it out!