Member since: 02-07-2019 | Posts: 1792 | Kudos Received: 1 | Solutions: 0
09-27-2020
11:14 PM
In this video, we'll review how to access data in S3 from the command line of a Data Hub cluster host using IDBroker. Some components in CDP work out of the box with IDBroker. However, most command-line tools like the Hadoop file system commands require a couple of additional steps to access data in S3. We'll demonstrate retrieving a keytab file for a workload user and using it to kinit on the Data Hub cluster host, enabling data access via IDBroker.
Open the video on YouTube here
Many command-line tools in CDP Public Cloud Data Hub clusters require a Kerberos ticket granting ticket (TGT) for a workload user in order to obtain a short-term access token for S3 or ADLS Gen 2 via IDBroker.
This video demonstrates the following steps:
Granting a data access role to a workload user
Retrieving a keytab file for the workload user
Copying the keytab file to a host in the data hub cluster
Using the keytab file to kinit
Confirming the TGT using klist
Accessing data in S3 via IDBroker
It mentions, but does not demonstrate, retrieving a keytab file via the cdp command-line tool. Instructions for doing so are available in CDP documentation.
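As a rough sketch of steps 4 through 6 (the principal, keytab path, realm, and bucket names below are placeholders, not values from the video):

# Obtain a TGT for the workload user from the copied keytab (names are hypothetical).
kinit -kt /tmp/wuser.keytab wuser@EXAMPLE.CLOUDERA.SITE
# Confirm the ticket-granting ticket is in the cache.
klist
# List data in S3; behind the scenes, IDBroker exchanges the TGT for short-term S3 credentials.
hadoop fs -ls s3a://my-bucket/data/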
08-20-2020
01:47 AM
This video covers Livy's features and operational flow, and includes a basic demo.
Open the video on YouTube here
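As a quick taste of the operational flow shown in the demo, here is a minimal sketch against Livy's standard REST endpoints (the host, port 8998, and session id are assumptions for your environment):

# Start an interactive Spark session (livy-host is a placeholder).
curl -s -X POST -H 'Content-Type: application/json' -d '{"kind": "spark"}' http://livy-host:8998/sessions
# Once the session is idle, submit a statement to it (session id 0 assumed).
curl -s -X POST -H 'Content-Type: application/json' -d '{"code": "1 + 1"}' http://livy-host:8998/sessions/0/statements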
07-30-2020
01:33 AM
This video covers the default Zeppelin UI and includes a basic navigation demo.
References:
Cloudera product documentation: https://docs.cloudera.com/runtime/7.1...
Cloudera tutorials: https://www.cloudera.com/tutorials/ge...
Apache Zeppelin website: http://zeppelin.apache.org
Cloudera Community: https://community.cloudera.com/
07-30-2020
01:30 AM
1 Kudo
This video covers Zeppelin's backend operations and gives an overview of impersonation concepts in Zeppelin.
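As a minimal sketch of how impersonation is typically wired up for the shell interpreter (the paths and sudo rules are assumptions for your environment):

# In zeppelin-env.sh: launch interpreter processes as the logged-in user.
export ZEPPELIN_IMPERSONATE_CMD='sudo -H -u ${ZEPPELIN_IMPERSONATE_USER} bash -c'
# Then, in the interpreter settings UI, set the interpreter to instantiate
# "Per User" in "isolated" mode and enable the "User Impersonate" checkbox.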
References:
Cloudera product documentation: https://docs.cloudera.com/runtime/7.1...
Cloudera tutorials: https://www.cloudera.com/tutorials/ge...
Apache Zeppelin website: http://zeppelin.apache.org
Cloudera Community: https://community.cloudera.com/
07-30-2020
01:26 AM
This video gives a high-level overview of Zeppelin's architecture and operational flow.
References:
Cloudera product documentation: https://docs.cloudera.com/runtime/7.1...
Cloudera tutorials: https://www.cloudera.com/tutorials/ge...
Apache Zeppelin website: http://zeppelin.apache.org
Cloudera Community: https://community.cloudera.com/
05-22-2020
09:45 AM
We have created a new Support Video based on this topic:
How to mask Hive columns using Atlas tags and Ranger?
05-22-2020
09:20 AM
1 Kudo
Masking of Hive columns can be achieved using Hive resource-based policies and masking policies for databases, tables, and columns. Dynamic masking, however, can be achieved using Atlas tags (called classifications from HDP 3.x onward), which empower users to regulate the visibility of sensitive data by leveraging Atlas tag-based policies in Ranger.
Prerequisites for this tutorial include a healthy HDP cluster with existing tables/databases in Hive, Atlas configured with Hive and Ranger, and Audit to Solr enabled for Ranger.
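Dynamic masking has two halves: tag the column in Atlas, then attach a Ranger tag-based masking policy to that tag. Below is a minimal sketch of the Atlas half using its v2 REST API (the host, credentials, tag name, and column GUID are placeholders):

# Create a classification (tag) named PII.
curl -u admin:admin -H 'Content-Type: application/json' -X POST http://atlas-host:21000/api/atlas/v2/types/typedefs -d '{"classificationDefs":[{"name":"PII","superTypes":[],"attributeDefs":[]}]}'
# Associate the PII tag with a Hive column entity by its GUID.
curl -u admin:admin -H 'Content-Type: application/json' -X POST http://atlas-host:21000/api/atlas/v2/entity/guid/<column-guid>/classifications -d '[{"typeName":"PII"}]'

The masking rule itself is then defined in Ranger's tag-based service against the PII tag, as shown in the video.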
Open the video on YouTube here
This video is based on the original article How to Mask Columns in Hive with Atlas and Ranger.
Other references: Providing Authorization with Apache Ranger
04-21-2020
04:32 AM
This video describes how to register an HDP cluster in DataPlane:
Open the video on YouTube here
DataPlane is a portfolio of data solutions that supports the management and discovery of data (whether at rest or in motion) and enables an enterprise hybrid data strategy (from the data center to the cloud).
DataPlane is composed of a core platform ("DP Platform" or "Platform") and an extensible set of apps ("DP Apps") that are installed on the platform. Depending on the app you plan to use, you may be required to install an agent into a cluster to support that app, as well as meet other cluster requirements.
The following are documents for reference:
Configure Knox Gateway for DataPlane
Configure Knox SSO for DataPlane
04-02-2020
03:38 AM
1 Kudo
This is a short video tutorial on configuring cross-realm trust between two secure (Kerberized) clusters with different realm names. Cluster 1 (c1232) has the realm name SUPPORTLAB.CLOUDERA.COM and Cluster 2 (c4232) has the realm name COELAB.CLOUDERA.COM. The video explains the steps to set up a cross-realm trust in order to perform distcp operations.
Open the video on YouTube here
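As a minimal sketch of the usual recipe (the realm names come from the video; passwords, encryption types, and file locations are assumptions):

# On both KDCs, create matching cross-realm krbtgt principals (same password and kvno on each side).
kadmin.local -q "addprinc krbtgt/COELAB.CLOUDERA.COM@SUPPORTLAB.CLOUDERA.COM"
kadmin.local -q "addprinc krbtgt/SUPPORTLAB.CLOUDERA.COM@COELAB.CLOUDERA.COM"
# In /etc/krb5.conf on hosts of both clusters, map each domain to its realm:
# [domain_realm]
#   .supportlab.cloudera.com = SUPPORTLAB.CLOUDERA.COM
#   .coelab.cloudera.com = COELAB.CLOUDERA.COM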
12-10-2019
08:22 AM
This video describes how to use CA-signed certificates for the Ambari Metrics System deployed in distributed mode with multiple Metrics Collectors.
Open the video on YouTube here
Ambari Metrics System (AMS) HA
Ambari Metrics System is an Ambari-native, pluggable, and scalable system for collecting and querying Hadoop metrics. It includes Grafana, a powerful, fully open-source dashboard builder with wide community adoption. The Metrics Collector is the REST API component that receives the metrics payload as JSON over HTTP from the Sinks and Monitors. The metrics are written into the HBase storage layer, which is dedicated storage for metric data managed as part of AMS, separate from the cluster HBase. The HBase schema is defined using Phoenix, and all read/write operations from AMS are Phoenix JDBC API calls. The Sink implementations are native to AMS and are placed in the classpath of the supported Hadoop ecosystem services by Ambari. The Monitors are lightweight Python daemons for system counters that use psutil native libraries for data collection. AMS can scale horizontally by adding Collector nodes, which effectively adds HBase RegionServers to handle the increased read/write load. The Ambari stack advisor is used to advise on AMS configurations proportional to the number of Sinks and Monitors, and thereby the cluster size.
For this article, the CA has provided a pair of PKCS#12 certificate bundles called amc01.p12 and amc02.p12. Since the same CA issued both certificates, you can extract the CA certificates (root + intermediates) from either one. This configuration assumes the following locations:
/var/tmp/certificates/AMS. The path where the PKCS#12 bundles will be copied.
/var/tmp/certificates/AMS/TRUSTSTORE. The path where the truststore for all nodes will be created.
/var/tmp/certificates/AMS/KEYSTORE/{AMC01,AMC02}. The paths where the keystores for the collectors will be created.
/usr/jdk64/jdk1.8.0_112. The path of the installed Java version.
c3132-node1, c3132-node2, c3132-node3, c3132-node4. HDP cluster nodes.
c3132-node1. Ambari Server.
c3132-node2, c3132-node3. Cluster nodes configured as Ambari Metrics Collectors.
/labs/AMS/truststore.jks. The path of the truststore on all nodes.
/labs/AMS/keystore.jks. The path of the keystore on each Ambari Metrics Collector.
SSL Setup Logical Steps
Basically, for each Metrics Collector, add the PKCS#12 bundle to its keystore as a PrivateKeyEntry whose alias is the Metrics Collector FQDN, and add the root CA and intermediate certificates to a truststore as trustedCertEntry entries.
Every time Ambari starts the service, it tries to export the root CA and intermediate certificates from the truststore located on all nodes: first it converts the truststore from JKS format to PKCS12 format, then it exports all the CA certificates from the truststore into its configuration directory, creating a file called ca.pem. You can see messages like the following on the Ambari Operations status page:
Execute['ambari-sudo.sh /usr/jdk64/jdk1.8.0_112/bin/keytool -importkeystore -srckeystore
/labs/AMS/truststore.jks -destkeystore /tmp/tmp0_1xE1/truststore.p12
-srcalias c3132-node3.user.local -deststoretype PKCS12 -srcstorepass hadoop1234
-deststorepass hadoop1234'] {}
Execute['ambari-sudo.sh /usr/jdk64/jdk1.8.0_112/bin/keytool -importkeystore -srckeystore
/labs/AMS/truststore.jks -destkeystore /tmp/tmp0_1xE1/truststore.p12 -srcalias
c3132-node2.user.local -deststoretype PKCS12 -srcstorepass hadoop1234 -deststorepass hadoop1234'] {}
Execute['ambari-sudo.sh openssl pkcs12 -in /tmp/tmpI3YmtL/truststore.p12 -out
/etc/ambari-metrics-monitor/conf/ca.pem -cacerts -nokeys -passin pass:hadoop1234'] {}
Follow these steps to complete the setup. For this procedure, the node c3132-node2.user.local will hold the active Ambari Metrics Collector.
Since you received two certificate bundles from the same Certificate Authority, extract the CA certificates from one of the PKCS#12 bundles:
cd /var/tmp/certificates/AMS && ls -l
openssl pkcs12 -in c3132-node2.user.local.p12 -out rootca.crt -cacerts -nokeys -passin pass:hadoop1234
Create the truststore and add the CA Certificate.
/usr/jdk64/jdk1.8.0_112/bin/keytool -keystore TRUSTSTORE/truststore.jks -alias caroot -import -file rootca.crt -storepass hadoop1234
/usr/jdk64/jdk1.8.0_112/bin/keytool -list -keystore TRUSTSTORE/truststore.jks
Add the PrivateKeyEntry for each Ambari Metrics Collector to the truststore, using the FQDN as the alias:
/usr/jdk64/jdk1.8.0_112/bin/keytool -importkeystore -srckeystore c3132-node2.user.local.p12 -alias c3132-node2.user.local -destkeystore TRUSTSTORE/truststore.jks -srcstoretype pkcs12 -deststoretype jks
/usr/jdk64/jdk1.8.0_112/bin/keytool -importkeystore -srckeystore c3132-node3.user.local.p12 -alias c3132-node3.user.local -destkeystore TRUSTSTORE/truststore.jks -srcstoretype pkcs12 -deststoretype jks
/usr/jdk64/jdk1.8.0_112/bin/keytool -list -keystore TRUSTSTORE/truststore.jks
Create the keystore for the first Ambari Metrics Collector, adding the root CA as a TrustedCertEntry and the server certificate as a PrivateKeyEntry:
/usr/jdk64/jdk1.8.0_112/bin/keytool -keystore KEYSTORE/AMC01/keystore.jks -alias caroot -import -file rootca.crt -storepass hadoop1234
/usr/jdk64/jdk1.8.0_112/bin/keytool -importkeystore -srckeystore c3132-node2.user.local.p12 -alias c3132-node2.user.local -destkeystore KEYSTORE/AMC01/keystore.jks -srcstoretype pkcs12 -deststoretype jks
Create the keystore for the second Ambari Metrics Collector, adding the root CA as a TrustedCertEntry and the server certificate as a PrivateKeyEntry:
/usr/jdk64/jdk1.8.0_112/bin/keytool -keystore KEYSTORE/AMC02/keystore.jks -alias caroot -import -file rootca.crt -storepass hadoop1234
/usr/jdk64/jdk1.8.0_112/bin/keytool -importkeystore -srckeystore c3132-node3.user.local.p12 -alias c3132-node3.user.local -destkeystore KEYSTORE/AMC02/keystore.jks -srcstoretype pkcs12 -deststoretype jks
Copy the truststore to all nodes (including the Ambari server), and copy the keystore to each Ambari Metrics Collector:
for i in c3132-node1 c3132-node2 c3132-node3 c3132-node4
do
ssh root@${i} "mkdir -p /labs/AMS"
scp /var/tmp/certificates/AMS/TRUSTSTORE/truststore.jks root@${i}:/labs/AMS/
if [[ ${i} == "c3132-node2" ]] ; then
scp /var/tmp/certificates/AMS/KEYSTORE/AMC01/keystore.jks root@${i}:/labs/AMS/
elif [[ ${i} == "c3132-node3" ]] ; then
scp /var/tmp/certificates/AMS/KEYSTORE/AMC02/keystore.jks root@${i}:/labs/AMS/
else
echo
fi
done
From Ambari, configure the SSL properties (SSL Server/Client) to reference the Keystore and Truststore.
AmbariUI / Services / Ambari Metrics / Configs /
ams-site
timeline.metrics.service.http.policy=HTTPS_ONLY
ams-ssl-server
ssl.server.keystore.keypassword=hadoop1234
ssl.server.keystore.location=/labs/AMS/keystore.jks
ssl.server.keystore.password=hadoop1234
ssl.server.keystore.type=jks
ssl.server.truststore.location=/labs/AMS/truststore.jks
ssl.server.truststore.password=hadoop1234
ssl.server.truststore.reload.interval=10000
ssl.server.truststore.type=jks
ams-ssl-client
ssl.client.truststore.location=/labs/AMS/truststore.jks
ssl.client.truststore.password=hadoop1234
ssl.client.truststore.type=jks
AmbariUI -> Restart All Required
Configure the Ambari server to use HTTPS instead of HTTP in all requests to the AMS Collector:
ssh root@c3132-node1
echo "server.timeline.metrics.https.enabled=true" >> /etc/ambari-server/conf/ambari.properties
ambari-server setup-security
Using python /usr/bin/python
Security setup options...
===========================================================================
Choose one of the following options:
[1] Enable HTTPS for Ambari server.
[2] Encrypt passwords stored in ambari.properties file.
[3] Setup Ambari kerberos JAAS configuration.
[4] Setup truststore.
[5] Import certificate to truststore.
===========================================================================
Enter choice, (1-5): 4
Do you want to configure a truststore [y/n] (y)? y
TrustStore type [jks/jceks/pkcs12] (jks):
Path to TrustStore file :/labs/AMS/truststore.jks
Password for TrustStore:
Re-enter password:
Ambari Server 'setup-security' completed successfully.
ambari-server restart
From one of the Ambari Metrics Monitors, validate the HTTPS communication:
ssh root@c3132-node4
tail -f /var/log/ambari-metrics-monitor/ambari-metrics-monitor.log
The following messages reflect HTTPS communication to the active Metrics Collector:
2018-12-12 02:27:11,835 [INFO] emitter.py:210 - Calculated collector shard based on hostname : c3132-node2.user.local
2018-12-12 02:27:11,835 [INFO] security.py:52 - SSL Connect being called..
connecting to https://c3132-node2.user.local:6188/
2018-12-12 02:27:11,855 [INFO] security.py:43 - SSL connection established.
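As an extra check (a hedged one-liner; the node and port come from the log above, and the CA file path from the earlier extraction step), you can confirm the collector presents the CA-signed certificate:

# Verify the TLS handshake and print the served certificate's subject and issuer.
openssl s_client -connect c3132-node2.user.local:6188 -CAfile /var/tmp/certificates/AMS/rootca.crt </dev/null 2>/dev/null | openssl x509 -noout -subject -issuer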