Member since 02-07-2019 · 1792 Posts · 1 Kudos Received · 0 Solutions
04-02-2020 03:38 AM · 1 Kudo
This is a short video tutorial to configure cross-realm trust between two secure (Kerberized) clusters with different realm names. Cluster 1 (c1232) has the realm name SUPPORTLAB.CLOUDERA.COM and Cluster 2 (c4232) has the realm name COELAB.CLOUDERA.COM. This video explains the steps to set up a cross-realm trust in order to perform a distcp operation.
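At a high level, the trust described in the video comes down to creating matching cross-realm krbtgt principals in both KDCs and then running distcp across the clusters. A hedged sketch follows, using the realm names from this article; the node hostnames and password placeholder are illustrative, not from the video:

```shell
# On BOTH KDCs, create the cross-realm krbtgt principals with identical
# passwords and enctypes (password placeholder below is illustrative):
#   kadmin.local -q "addprinc -pw <SamePassword> krbtgt/COELAB.CLOUDERA.COM@SUPPORTLAB.CLOUDERA.COM"
#   kadmin.local -q "addprinc -pw <SamePassword> krbtgt/SUPPORTLAB.CLOUDERA.COM@COELAB.CLOUDERA.COM"
#
# Make both realms resolvable in /etc/krb5.conf on the client hosts
# ([realms] and [domain_realm]; add [capaths] if the trust path is indirect).
# Then, with a valid ticket, copy between the clusters (hostnames hypothetical):
hadoop distcp hdfs://c1232-node1:8020/tmp/source hdfs://c4232-node1:8020/tmp/destination
```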
Open the video on YouTube here
12-10-2019 08:22 AM
This video describes how to use CA-signed certificates for the Ambari Metrics System deployed in distributed mode with multiple Metrics Collectors.
Open YouTube video here
Ambari Metric System (AMS) HA
Ambari Metrics System is an Ambari-native, pluggable, and scalable system for collecting and querying Hadoop metrics. It includes Grafana, a powerful, fully open-source dashboard builder with wide community adoption. The Metrics Collector is the REST API component that receives the metrics payload as JSON over HTTP from the Sinks and Monitors. The metrics are written into the HBase storage layer, which is dedicated storage for metric data managed as part of AMS, separate from the cluster HBase. The HBase schema is defined using Phoenix, and all read/write operations from AMS are Phoenix JDBC API calls. The Sink implementations are native to AMS and are placed by Ambari in the classpath of the supported Hadoop ecosystem services. The Monitors are lightweight Python daemons for system counters that use psutil native libraries for data collection. AMS can scale horizontally by adding Collector nodes, which effectively adds HBase RegionServers to handle the increased read/write load. The Ambari stack advisor is used to advise on AMS configurations proportional to the number of Sinks and Monitors, and thereby the cluster size.
For this article, the CA has provided two PKCS#12 certificate bundles, amc01.p12 and amc02.p12. Since the same CA issued both certificates, you can extract the CA certificates (root + intermediates) from either one. This configuration assumes the following locations:
/var/tmp/certificates/AMS. The path where the PKCS#12 bundles will be copied.
/var/tmp/certificates/AMS/TRUSTSTORE. The path where the truststore for all nodes will be created.
/var/tmp/certificates/AMS/KEYSTORE/{AMC01,AMC02}. The paths where the keystores for the collectors will be created.
/usr/jdk64/jdk1.8.0_112. The path of the installed Java version.
c3132-node1, c3132-node2, c3132-node3, c3132-node4. The HDP cluster nodes.
c3132-node1. The Ambari server.
c3132-node2, c3132-node3. The cluster nodes configured as Ambari Metrics Collectors.
/labs/AMS/truststore.jks. The path of the truststore on all nodes.
/labs/AMS/keystore.jks. The path of the keystore on each Ambari Metrics Collector.
SSL Setup Logical Steps
Basically, for each Metrics Collector, add the PKCS#12 bundle to a keystore under an alias matching the Metrics Collector FQDN as a PrivateKeyEntry, and add the root CA and intermediate certificates to a truststore as trustedCertEntry entries.
Every time Ambari starts the service, it tries to export the root CA and intermediate certificates from the truststore located on all nodes. First, it converts the truststore from JKS format to PKCS12 format, then it exports all the CA certificates from the truststore into its configuration directory, creating a file called ca.pem. You can see the following messages on the Ambari Operations Status page.
Execute['ambari-sudo.sh /usr/jdk64/jdk1.8.0_112/bin/keytool -importkeystore -srckeystore
/labs/AMS/truststore.jks -destkeystore /tmp/tmp0_1xE1/truststore.p12
-srcalias c3132-node3.user.local -deststoretype PKCS12 -srcstorepass hadoop1234
-deststorepass hadoop1234'] {}
Execute['ambari-sudo.sh /usr/jdk64/jdk1.8.0_112/bin/keytool -importkeystore -srckeystore
/labs/AMS/truststore.jks -destkeystore /tmp/tmp0_1xE1/truststore.p12 -srcalias
c3132-node2.user.local -deststoretype PKCS12 -srcstorepass hadoop1234 -deststorepass hadoop1234'] {}
Execute['ambari-sudo.sh openssl pkcs12 -in /tmp/tmpI3YmtL/truststore.p12 -out
/etc/ambari-metrics-monitor/conf/ca.pem -cacerts -nokeys -passin pass:hadoop1234'] {}
Follow these steps to complete the previous setup. For this procedure, the node c3132-node2.user.local holds the active Ambari Metrics Collector.
Since you received two certificate bundles from the same Certificate Authority, extract the CA certificates from one of the PKCS#12 bundles:
cd /var/tmp/certificates/AMS && ls -l
openssl pkcs12 -in c3132-node2.user.local.p12 -out rootca.crt -cacerts -nokeys -passin pass:hadoop1234
Create the truststore and add the CA Certificate.
/usr/jdk64/jdk1.8.0_112/bin/keytool -keystore TRUSTSTORE/truststore.jks -alias caroot
-import -file rootca.crt -storepass hadoop1234
/usr/jdk64/jdk1.8.0_112/bin/keytool -list -keystore TRUSTSTORE/truststore.jks
Add the PrivateKeyEntry for all the Ambari Metrics Collectors to the truststore, using the FQDN as the alias:
/usr/jdk64/jdk1.8.0_112/bin/keytool -importkeystore -srckeystore c3132-node2.user.local.p12
-alias c3132-node2.user.local -destkeystore TRUSTSTORE/truststore.jks -srcstoretype pkcs12
-deststoretype jks
/usr/jdk64/jdk1.8.0_112/bin/keytool -importkeystore -srckeystore c3132-node3.user.local.p12
-alias c3132-node3.user.local -destkeystore TRUSTSTORE/truststore.jks -srcstoretype pkcs12
-deststoretype jks
/usr/jdk64/jdk1.8.0_112/bin/keytool -list -keystore TRUSTSTORE/truststore.jks
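After the imports, you can sanity-check that the truststore holds the expected entry types in a scriptable way. This is a sketch: the `count_entries` helper is hypothetical, and it assumes the short `keytool -list` output format of `alias, <date>, <entryType>,` per line:

```shell
# Count entries of a given type in `keytool -list` output read from stdin.
count_entries() {
  local entry_type="$1"
  grep -c ", ${entry_type}" -
}

# Hypothetical usage against the truststore built above
# (with two collectors, expect 2 PrivateKeyEntry rows):
# /usr/jdk64/jdk1.8.0_112/bin/keytool -list -keystore TRUSTSTORE/truststore.jks \
#   -storepass hadoop1234 | count_entries PrivateKeyEntry
```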
Create the keystore for the first Ambari Metrics Collector, adding the root CA as a trustedCertEntry and the server certificate as a PrivateKeyEntry:
/usr/jdk64/jdk1.8.0_112/bin/keytool -keystore KEYSTORE/AMC01/keystore.jks -alias caroot
-import -file rootca.crt -storepass hadoop1234
/usr/jdk64/jdk1.8.0_112/bin/keytool -importkeystore -srckeystore
c3132-node2.user.local.p12 -alias c3132-node2.user.local
-destkeystore KEYSTORE/AMC01/keystore.jks -srcstoretype pkcs12 -deststoretype jks
Create the keystore for the second Ambari Metrics Collector, adding the root CA as a trustedCertEntry and the server certificate as a PrivateKeyEntry:
/usr/jdk64/jdk1.8.0_112/bin/keytool -keystore KEYSTORE/AMC02/keystore.jks -alias caroot
-import -file rootca.crt -storepass hadoop1234
/usr/jdk64/jdk1.8.0_112/bin/keytool -importkeystore -srckeystore
c3132-node3.user.local.p12 -alias c3132-node3.user.local -destkeystore
KEYSTORE/AMC02/keystore.jks -srcstoretype pkcs12 -deststoretype jks
Copy the truststore to all nodes, including the Ambari server, and the keystore to each Ambari Metrics Collector:
for i in c3132-node1 c3132-node2 c3132-node3 c3132-node4
do
ssh root@${i} "mkdir -p /labs/AMS"
scp /var/tmp/certificates/AMS/TRUSTSTORE/truststore.jks root@${i}:/labs/AMS/
if [[ ${i} == "c3132-node2" ]] ; then
scp /var/tmp/certificates/AMS/KEYSTORE/AMC01/keystore.jks root@${i}:/labs/AMS/
elif [[ ${i} == "c3132-node3" ]] ; then
scp /var/tmp/certificates/AMS/KEYSTORE/AMC02/keystore.jks root@${i}:/labs/AMS/
else
echo
fi
done
From Ambari, configure the SSL properties (SSL Server/Client) to reference the Keystore and Truststore.
AmbariUI / Services / Ambari Metrics / Configs /
ams-site
timeline.metrics.service.http.policy=HTTPS_ONLY
ams-ssl-server
ssl.server.keystore.keypassword=hadoop1234
ssl.server.keystore.location=/labs/AMS/keystore.jks
ssl.server.keystore.password=hadoop1234
ssl.server.keystore.type=jks
ssl.server.truststore.location=/labs/AMS/truststore.jks
ssl.server.truststore.password=hadoop1234
ssl.server.truststore.reload.interval=10000
ssl.server.truststore.type=jks
ams-ssl-client
ssl.client.truststore.location=/labs/AMS/truststore.jks
ssl.client.truststore.password=hadoop1234
ssl.client.truststore.type=jks
AmbariUI -> Restart All Required
Configure the Ambari server to use HTTPS instead of HTTP in all requests to the AMS Collector:
ssh root@c3132-node1
echo "server.timeline.metrics.https.enabled=true" >> /etc/ambari-server/conf/ambari.properties
ambari-server setup-security
Using python /usr/bin/python
Security setup options...
===========================================================================
Choose one of the following options:
[1] Enable HTTPS for Ambari server.
[2] Encrypt passwords stored in ambari.properties file.
[3] Setup Ambari kerberos JAAS configuration.
[4] Setup truststore.
[5] Import certificate to truststore.
===========================================================================
Enter choice, (1-5): 4
Do you want to configure a truststore [y/n] (y)? y
TrustStore type [jks/jceks/pkcs12] (jks):
Path to TrustStore file :/labs/AMS/truststore.jks
Password for TrustStore:
Re-enter password:
Ambari Server 'setup-security' completed successfully.
ambari-server restart
From one of the Ambari Metrics Monitor hosts, validate the HTTPS communication:
ssh root@c3132-node4
tail -f /var/log/ambari-metrics-monitor/ambari-metrics-monitor.log
The following messages reflect HTTPS communication to the active Metrics Collector:
2018-12-12 02:27:11,835 [INFO] emitter.py:210 - Calculated collector shard based on hostname : c3132-node2.user.local
2018-12-12 02:27:11,835 [INFO] security.py:52 - SSL Connect being called..
connecting to https://c3132-node2.user.local:6188/
2018-12-12 02:27:11,855 [INFO] security.py:43 - SSL connection established.
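Besides the monitor log, you can probe the active collector's HTTPS endpoint directly. A hedged example: it uses the ca.pem bundle that Ambari exports from the truststore (shown earlier in this article) and the standard AMS metadata path; adjust the hostname to your active collector:

```shell
# Spot-check the collector over HTTPS using the CA bundle exported by Ambari:
curl --cacert /etc/ambari-metrics-monitor/conf/ca.pem \
  "https://c3132-node2.user.local:6188/ws/v1/timeline/metrics/metadata"
```

A successful TLS handshake here confirms the truststore/keystore pair is consistent end to end.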
12-10-2019 08:21 AM
This video explains how to configure Spark2 to use HiveWarehouseConnector.
Open the video on YouTube here
To access Hive from Spark2 on HDP3, there are some requirements to meet in order to use the HiveWarehouseConnector. The configuration steps for the HiveWarehouseConnector can be set at the cluster level and/or the job level. This requires collecting base information from the Hive service, to be configured later on Spark2 via Ambari, or provided per application submission as arguments to the Spark2 client from a terminal.
12-10-2019 08:19 AM
On HDP3, the SparkSQL API directly queries Spark2's own catalog namespace. The Spark catalog is independent of the Hive catalog; hence, the HiveWarehouseConnector was developed to allow Spark users to query Hive data through the HiveWarehouseSession API. Hive tables on HDP3 are ACID by default, but Spark2 does not yet operate on ACID tables. To guarantee data integrity, the HiveWarehouseConnector processes queries through the HiveServer2 Interactive (LLAP) service. This is not the case for external tables.
This video will explain how to access Hive from Spark2 on HDP3 along with some architectural changes and the support provided for particular use cases.
Open the video on YouTube here
12-10-2019 08:18 AM
This video describes an easy-to-use Python script to generate data for Hive, based on an input table schema. This data generator for Hive solves the issue of loading data into tables with many columns (such as more than 1500 columns). This automation script supports faster testing of queries and performance analysis. To get the code, see the KB link (for customers only).
Open the video on YouTube here
12-10-2019 08:16 AM
This video describes how Kafka ACLs work in HDP. This method is not supported in CDP 7; please investigate Ranger authorization for ACLs in CDP.
Open the video on YouTube here
Apache Kafka comes with an authorizer implementation that uses ZooKeeper to store all the ACLs. The ACLs have to be set because access to resources is limited to super users when an authorizer is configured. By default, if a resource has no associated ACLs, then no one is allowed to access the resource except super users. The following are the main ACL commands:
Add ACLs:
bin/kafka-acls.sh --authorizer-properties zookeeper.connect=<zkHost>:<zkPort> --add
--allow-principal User:<username> --operation All --topic <topicName> --group=*
In the above command, ACLs are added to allow a principal to have All operations available over the topic specified. The following are the available operations:
Read
Write
Create
Delete
Alter
Describe
ClusterAction
DescribeConfigs
AlterConfigs
IdempotentWrite
All
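Rather than granting All, an ACL can be scoped to just what a client needs. A sketch using the same CLI (the topic and user names here are illustrative; `--operation` may be repeated to grant several operations at once):

```shell
# Allow user 'alice' to read and describe topic 'events' with any consumer group:
bin/kafka-acls.sh --authorizer-properties zookeeper.connect=<zkHost>:<zkPort> --add \
  --allow-principal User:alice --operation Read --operation Describe \
  --topic events --group=*
```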
Using --group=* means that this user is allowed to use any consumer group when running a Kafka consumer. The following is the command to list ACLs:
bin/kafka-acls.sh --authorizer-properties zookeeper.connect=<zkHost>:<zkPort> --list
In the above command, the available ACLs for the Kafka cluster are listed using --list. More details about ACL options are available in the following references:
Authorization and ACLs
ACLs command line interface
12-10-2019 08:14 AM
Many times, it is necessary for an engineer or administrator to manipulate the content of Ambari Infra Solr using the command-line utilities, since they might or might not have access to the GUI.
This video helps to understand the basic manipulation of:
Listing collections and checking the cluster status of Solr Cloud.
Creating new collections.
Deleting existing collections.
To check if the ambari-infra-solr server instance is running on the node, run the following:
# ps -elf | grep -i infra-solr
# netstat -plant | grep -i 8886
If the cluster is Kerberized, check for valid Kerberos tickets:
# klist -A
Obtain a Kerberos ticket, if not present:
# kinit -kt /etc/security/keytabs/ambari-infra-solr.service.keytab
$(klist -kt /etc/security/keytabs/ambari-infra-solr.service.keytab
|sed -n "4p"|cut -d ' ' -f7)
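The `sed`/`cut` pipeline above hard-codes field 7 of the `klist -kt` output, which breaks if the timestamp format changes width. A slightly more robust variant (a sketch: the `principal_from_keytab_listing` helper is hypothetical, and it assumes the standard layout of three header lines followed by `KVNO Timestamp Principal` rows) takes the last field instead:

```shell
# Extract the principal from `klist -kt` output read on stdin:
# skip the 3 header lines, take the last field of the first entry row.
principal_from_keytab_listing() {
  sed -n '4p' | awk '{print $NF}'
}

# Hypothetical usage:
# kinit -kt /etc/security/keytabs/ambari-infra-solr.service.keytab \
#   "$(klist -kt /etc/security/keytabs/ambari-infra-solr.service.keytab | principal_from_keytab_listing)"
```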
List Solr collections:
curl --negotiate -u : "http://$(hostname -f):8886/solr/admin/collections?action=list"
Create a collection:
# curl --negotiate -u : "http://$(hostname -f):8886/solr/admin/collections?action=CREATE&name=<collection_name>&numShards=<number>"
The following are the optional Values:
&maxShardsPerNode=<number>
&replicationFactor=<number>
Delete a collection:
# curl --negotiate -u : "http://$(hostname -f):8886/solr/admin/collections?action=DELETE&name=<collection_name>"
Check status of the Solr Cloud cluster:
# curl --negotiate -u : "http://$(hostname -f):8886/solr/admin/collections?action=clusterstatus&wt=json" | python -m json.tool
Index keys:
solr_host = the host where the Solr instance(s) is running
collection = the name of the collection
shard = the name of the shard
action = CREATE (to add a collection)
action = DELETE (to delete a collection)
action = CLUSTERSTATUS (to get the list of available collections in the Solr Cloud cluster)
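The curl calls above all share the same Collections API URL shape. A small helper (hypothetical, for illustration only; the host, port, and action/name parameters follow the commands in this article) makes that explicit and keeps the quoting in one place:

```shell
# Build a Solr Collections API URL for a given host and action.
# Extra query parameters (e.g. '&name=test&numShards=2') go in the 3rd argument.
solr_collections_url() {
  local host="$1" action="$2" extra="${3:-}"
  echo "http://${host}:8886/solr/admin/collections?action=${action}${extra}"
}

# Hypothetical usage:
# curl --negotiate -u : "$(solr_collections_url "$(hostname -f)" CREATE '&name=test&numShards=2')"
```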
12-10-2019 08:12 AM
This video describes how to upgrade Ambari 2.6.2.2 to Ambari 2.7.3.
Open the video on YouTube here
Apache Ambari 2.7.3 is the latest among the Ambari 2.7.x releases. Ambari 2.7.0, the first release in the 2.7.x series, introduced significant improvements over its predecessor, Ambari 2.6.2. This video will help users upgrade from Ambari 2.6.2.2 to Ambari 2.7.3.
Procedure
I. Prerequisites
Take a backup of the Ambari configuration file:
# mkdir /root/backups
# cp /etc/ambari-server/conf/ambari.properties /root/backups
Turn off Service Auto Restart:
From Ambari UI: Admin > Service Auto Start. Set Auto Start Services to Disabled. Click Save.
Run Service Checks on all Ambari services.
On each of the Ambari services installed on the cluster, run a Service Check as follows:
From Ambari UI: <Service_Name> > Service Actions > Run Service Check
For example: HDFS > Service Actions > Run Service Check.
Start and Stop all of the Ambari services from Ambari UI.
II. Stop Services
If SmartSense is deployed, stop it and turn on Maintenance Mode. From Ambari Web, browse to Services > SmartSense and select Stop from the Service Actions menu. Then, select Turn on Maintenance Mode from the Service Actions menu.
If Ambari Metrics is deployed, stop it and turn on Maintenance Mode. From Ambari Web, browse to Services > Ambari Metrics and select Stop from the Service Actions menu. Then, select Turn on Maintenance Mode from the Service Actions menu.
If Log Search is running in the cluster, stop the service. From Ambari Web, browse to Services > Log Search and select Stop from the Service Actions menu. Then, select Turn on Maintenance Mode from the Service Actions menu.
Stop Ambari server:
# ambari-server stop
Stop Ambari agents:
# ambari-agent stop
Back up the Ambari database:
# mysqldump -u ambari -p ambari > /root/backups/ambari-before-upgrade.sql
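If the upgrade has to be rolled back, the same dump can be restored; a sketch (whether you drop and recreate the schema first depends on your environment):

```shell
# Restore the pre-upgrade Ambari schema from the backup taken above:
mysql -u ambari -p ambari < /root/backups/ambari-before-upgrade.sql
```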
III. Download Ambari 2.7.3 repository
1. Replace the old Ambari repository with the latest one on all hosts in the cluster:
# wget -nv http://public-repo-1.hortonworks.com/ambari/centos7/2.x/updates/2.7.3.0/ambari.repo -O /etc/yum.repos.d/ambari.repo
2. Upgrade Ambari server
# yum clean all
# yum upgrade ambari-server
Note: If HDF components are deployed in the HDP setup, upgrade the HDF Management Pack before upgrading the database schema in step IV. For more details, see Upgrade the HDF Management Pack.
3. Upgrade Ambari agents:
# yum clean all
# yum upgrade ambari-agent
IV. Upgrade Database Schema
On the Ambari server host, upgrade the Ambari database schema:
# ambari-server upgrade
Start Ambari server:
# ambari-server start
Start Ambari agents:
# ambari-agent start
V. Verify Ambari version
From the Ambari UI: Go to Admin > About:
12-10-2019 08:10 AM
This video describes a step-by-step process for getting an HDP 3 cluster up and running on CentOS 7. The video follows the Hortonworks documentation and support matrix recommendations. Public repositories were used for a minimal two-node install on CentOS 7.5.
Services installed on the Ambari node: ambari-server.
Services installed on node1: SmartSense, Ambari Metrics.
Open the video on YouTube here
Get ready
Clean the yum cache: yum clean all
Rebuild the cache: yum makecache
Install utilities: yum install openssl openssh-clients curl unzip gzip tar wget
Double-check the free RAM in the system: free -m
Check the limits configuration: ulimit -n -u
Set limits temporarily: ulimit -n 32768 ; ulimit -u 65536
Set limits permanently: vim /etc/security/limits.conf (root - nofile 32768, root - nproc 65536)
Generate an RSA SSH key: ssh-keygen
Send the public RSA key to node1 and configure it in the authorized keys file: ssh-copy-id 10.200.82.41
Test the passwordless connection: ssh 10.200.82.41
Install the NTP package: yum install ntp -y
Edit the NTP conf file to set the ISO code, as shown in the first column of the Stratum One Time Servers list (http://support.ntp.org/bin/view/Servers/StratumOneTimeServers): vim /etc/ntp.conf
Start the NTP service: systemctl start ntpd
Check if the service is running: systemctl status ntpd
Print the list of time servers the hosts are synchronizing with: ntpq -p
Check the time drift between the hosts and an NTP server: ntpdate -q 0.centos.pool.ntp.org
Set the hostnames on the fly: hostname ambari.local / hostname node1.local
Edit the /etc/hosts file to set the IP-to-name mapping: vim /etc/hosts (10.200.82.40 ambari.local ambari, 10.200.82.41 node1.local node1)
Edit the OS network file on each node to set the permanent hostname: vim /etc/sysconfig/network (NETWORKING=yes, HOSTNAME=ambari.local or HOSTNAME=node1.local)
Run a new shell so the current hostnames show in the prompt: bash
Double-check the output of hostname with and without -f; they should be the same: hostname ; hostname -f
Disable firewalld while installing: systemctl disable firewalld
Stop the firewall service: service firewalld stop
Check if SELinux is currently in enforcing mode: getenforce
Set it to permissive (or disabled): setenforce 0
Check if it switched modes: getenforce
Download the public Apache Ambari repo file: wget -nv http://public-repo-1.hortonworks.com/ambari/centos7/2.x/updates/2.7.1.0/ambari.repo -O /etc/yum.repos.d/ambari.repo
List the currently configured repositories: yum repolist
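The checks above can be wrapped into a small pre-flight script to run on every node before installing. This is a sketch under this article's conventions (the thresholds come from the limits set earlier; the script name and WARN wording are my own):

```shell
#!/bin/bash
# Pre-flight checks before installing Ambari/HDP on a node.
[ "$(hostname)" = "$(hostname -f)" ] \
  || echo "WARN: hostname and hostname -f differ"
[ "$(getenforce 2>/dev/null)" = "Enforcing" ] \
  && echo "WARN: SELinux is still enforcing"
[ "$(ulimit -n)" -ge 32768 ] 2>/dev/null \
  || echo "WARN: open-files limit is below 32768"
```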
Install Ambari Server
Install the ambari-server package: yum install ambari-server
Configure ambari-server: ambari-server setup
Start the service: ambari-server start
Deploy HDP cluster components
Browse to the Ambari Server user interface (UI); the default username and password are both admin: http://ambari.local:8080/
Take a look at the root user's private RSA key file, the one generated before: cat .ssh/id_rsa
12-10-2019 08:03 AM
From Ambari 2.6 onward, for all MYSQL_SERVER components in a blueprint, the mysql-connector-java.jar needs to be manually installed and registered. This video describes how to install and register the MySQL connector to replace the embedded database instance that Ambari Server uses by default.
Open YouTube video here
For certain services, Cloudbreak allows registering an existing RDBMS instance as an external source for a database. After registering the RDBMS with Cloudbreak, it can be used for multiple clusters. However, as this configuration needs to be used by Ambari before its installation, the MySQL Connector needs to be available to connect to the remote MySQL database.
To manually install and register MySQL connector, do the following:
Preparing MySQL Database Server
Install MySQL Server on CentOS Linux 7:
# yum -y localinstall https://dev.mysql.com/get/mysql57-community-release-el7-8.noarch.rpm
# yum -y install mysql-community-server
# systemctl start mysqld.service
Complete the MySQL initial setup. Depending on the MySQL version, use a blank password for the MySQL root user or get the password from mysqld.log:
# grep password /var/log/mysqld.log
# mysql_secure_installation
Create a user for Ambari, grant permissions, and create the initial database:
# mysql -u root -p
CREATE USER 'ambari'@'%' IDENTIFIED BY 'Hadoop1234!';
GRANT ALL PRIVILEGES ON *.* TO 'ambari'@'%';
CREATE USER 'ambari'@'localhost' IDENTIFIED BY 'Hadoop1234!';
GRANT ALL PRIVILEGES ON *.* TO 'ambari'@'localhost';
FLUSH PRIVILEGES;
CREATE DATABASE ambari01;
Configure Cloudbreak to use MySQL External Database
Create a pre-ambari-start recipe to install the mysql-connector-java.jar:
#!/bin/bash
# Provide the JDBC Connector JAR file.
# During cluster creation, Cloudbreak uses the /opt/jdbc-drivers directory for the JAR file.
yum -y localinstall https://dev.mysql.com/get/mysql57-community-release-el7-8.noarch.rpm
yum -y install mysql-connector-java*
if [[ ! -d /opt/jdbc-drivers ]]
then
  mkdir /opt/jdbc-drivers
  cp /usr/share/java/mysql-connector-java.jar /opt/jdbc-drivers/mysql-connector-java.jar
fi
Register the database configuration:
Database: MySQL
MySQL Server: MySQL_DB_IP/FQDN
MySQL User: ambari
MySQL Password: Hadoop1234!
JDBC Connector JAR URL: Empty
JDBC Connection: jdbc:mysql://MySQL_DB_IP/FQDN:Port/ambari01
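Once the connector JAR is in place on the Ambari host, it still has to be registered with Ambari itself; a hedged sketch using the standard `ambari-server setup` JDBC options and the JAR path from the recipe above:

```shell
# Register the MySQL JDBC driver with Ambari Server:
ambari-server setup --jdbc-db=mysql \
  --jdbc-driver=/opt/jdbc-drivers/mysql-connector-java.jar
```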