12-10-2019
08:21 AM
This video explains how to configure Spark2 to use HiveWarehouseConnector.
Open the video on YouTube here
To access Hive from Spark2 on HDP3, some requirements must be met in order to use the HiveWarehouseConnector. The HiveWarehouseConnector configuration can be set at the cluster level, the job level, or both. This requires collecting base information from the Hive service, which is then either configured on Spark2 via Ambari, or supplied per application submission by passing the same configurations as arguments to the Spark2 client from a terminal.
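As a sketch of the job-level approach, a Spark2 session can be launched with the connector configurations passed as arguments. The JDBC URL, metastore URI, ZooKeeper quorum, LLAP application name, and assembly JAR version below are placeholders for the values collected from the Hive service configuration:

```shell
# Job-level submission with HiveWarehouseConnector settings (all bracketed
# values are placeholders for the base information collected from Hive):
spark-shell --jars /usr/hdp/current/hive_warehouse_connector/hive-warehouse-connector-assembly-<version>.jar \
  --conf spark.sql.hive.hiveserver2.jdbc.url="jdbc:hive2://<hs2i-host>:10500/" \
  --conf spark.datasource.hive.warehouse.metastoreUri="thrift://<metastore-host>:9083" \
  --conf spark.datasource.hive.warehouse.load.staging.dir="/tmp" \
  --conf spark.hadoop.hive.llap.daemon.service.hosts="@llap0" \
  --conf spark.hadoop.hive.zookeeper.quorum="<zk1>:2181,<zk2>:2181,<zk3>:2181"
```

The same keys can be set once at the cluster level in the Spark2 configuration via Ambari instead of repeating them per job.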
12-10-2019
08:19 AM
On HDP3, the SparkSQL API directly queries Spark2's own catalog namespace, which is independent of the Hive catalog. Hence, the HiveWarehouseConnector was developed to allow Spark users to query Hive data through the HiveWarehouseSession API. Hive tables on HDP3 are ACID by default, and Spark2 does not yet operate on ACID tables. To guarantee data integrity, the HiveWarehouseConnector processes queries through the HiveServer2 Interactive (LLAP) service. This is not the case for external tables.
This video will explain how to access Hive from Spark2 on HDP3 along with some architectural changes and the support provided for particular use cases.
Open the video on YouTube here
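A minimal pyspark session using the HiveWarehouseSession API might look like the following sketch; the assembly JAR and pyspark_hwc zip paths depend on the installed HDP version, and the database and table names are placeholders:

```shell
# Launch pyspark with the connector on the classpath (paths are version-dependent):
pyspark --jars /usr/hdp/current/hive_warehouse_connector/hive-warehouse-connector-assembly-<version>.jar \
        --py-files /usr/hdp/current/hive_warehouse_connector/pyspark_hwc-<version>.zip
# Then, inside the shell (Python), queries run through HiveServer2 Interactive:
#   from pyspark_llap import HiveWarehouseSession
#   hive = HiveWarehouseSession.session(spark).build()
#   hive.executeQuery("SELECT * FROM <db>.<table>").show()
```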
12-10-2019
08:18 AM
This video describes an easy-to-use Python script that generates data for Hive based on an input table schema. This data generator solves the issue of loading data into tables with many columns (for example, more than 1500). The script supports faster testing of queries and performance analysis. To get the code, see the KB link (for customers only).
Open the video on YouTube here
12-10-2019
08:16 AM
This video describes how Kafka ACLs work in HDP. This method is not supported in CDP 7; investigate Ranger authorization for ACLs in CDP instead.
Open the video on YouTube here
Apache Kafka comes with an authorizer implementation that uses ZooKeeper to store all the ACLs. The ACLs have to be set because, when an authorizer is configured, access to resources is limited to super users. By default, if a resource has no associated ACLs, then no one except super users is allowed to access it. The following are the main ACL commands.
Add ACLs:
bin/kafka-acls.sh --authorizer-properties zookeeper.connect=<zkHost>:<zkPort> --add
--allow-principal User:<username> --operation All --topic <topicName> --group=*
In the above command, ACLs are added to allow a principal to have All operations available over the topic specified. The following are the available operations:
Read
Write
Create
Delete
Alter
Describe
ClusterAction
DescribeConfigs
AlterConfigs
IdempotentWrite
All
Using --group=* means that this user is allowed to use any consumer group when running a Kafka consumer. The following is the command to list ACLs:
bin/kafka-acls.sh --authorizer-properties zookeeper.connect=<zkHost>:<zkPort> --list
In the above command, the ACLs available in the Kafka cluster are listed using --list. More details about the ACL options are available in the following references:
Authorization and ACLs
ACLs command line interface
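ACLs can also be revoked with the same tool. As a hedged sketch, the following removes the ACLs granted by the add command above (same placeholder values):

```shell
# Remove the ACLs previously granted to the principal on the topic:
bin/kafka-acls.sh --authorizer-properties zookeeper.connect=<zkHost>:<zkPort> --remove \
  --allow-principal User:<username> --operation All --topic <topicName>
```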
12-10-2019
08:14 AM
It is often necessary for an engineer or administrator to manipulate the content of Ambari Infra Solr using command-line utilities, whether or not they have access to the GUI.
This video helps to understand the basic manipulation of:
Listing collections and checking cluster status of Solr cloud.
Creating new collections.
Deleting the existing collections.
To check if the ambari-infra-solr server instance is running on the node, run the following:
# ps -elf | grep -i infra-solr
# netstat -plant | grep -i 8886
If the cluster is Kerberized, check for valid Kerberos tickets:
# klist -A
Obtain a Kerberos ticket, if not present:
# kinit -kt /etc/security/keytabs/ambari-infra-solr.service.keytab \
    $(klist -kt /etc/security/keytabs/ambari-infra-solr.service.keytab | sed -n "4p" | cut -d ' ' -f7)
List Solr collections:
# curl --negotiate -u : "http://$(hostname -f):8886/solr/admin/collections?action=list"
Create a collection:
# curl --negotiate -u : "http://$(hostname -f):8886/solr/admin/collections?action=CREATE&name=<collection_name>&numShards=<number>"
The following are optional parameters:
&maxShardsPerNode=<number>
&replicationFactor=<number>
Delete a collection:
# curl --negotiate -u : "http://$(hostname -f):8886/solr/admin/collections?action=DELETE&name=<collection_name>"
Check status of the Solr Cloud cluster:
# curl --negotiate -u : "http://$(hostname -f):8886/solr/admin/collections?action=clusterstatus&wt=json" | python -m json.tool
INDEX Keys:
solr_host = host where the Solr instance(s) is running
collection = name of the collection
shard = name of the shard
action = CREATE (adds a collection)
action = DELETE (deletes a collection)
action = CLUSTERSTATUS (gets the list of available collections in the Solr Cloud cluster)
12-10-2019
08:12 AM
This video describes how to upgrade Ambari 2.6.2.2 to Ambari 2.7.3.
Open the video on YouTube here
Apache Ambari 2.7.3 is the latest among the Ambari 2.7.x releases. Ambari 2.7.0, the first release in the 2.7.x series, introduced significant improvements over its predecessor, Ambari 2.6.2. This video will help users upgrade from Ambari 2.6.2.2 to Ambari 2.7.3.
Procedure
I. Prerequisites
Take a backup of the Ambari configuration file:
# mkdir /root/backups
# cp /etc/ambari-server/conf/ambari.properties /root/backups
Turn off Service Auto Restart:
From Ambari UI: Admin > Service Auto Start. Set Auto Start Services to Disabled. Click Save.
Run Service Checks on all Ambari services.
On each of the Ambari services installed on the cluster, run Service Checks as follows:
From Ambari UI: <Service_Name> > Service Actions > Run Service Check
For example: HDFS > Service Actions > Run Service Check.
Start and Stop all of the Ambari services from Ambari UI.
II. Stop Services
If SmartSense is deployed, stop it and turn on Maintenance Mode. From Ambari Web, browse to Services > SmartSense and select Stop from the Service Actions menu. Then, select Turn on Maintenance Mode from the Service Actions menu.
If Ambari Metrics is deployed, stop it and turn on Maintenance Mode. From Ambari Web, browse to Services > Ambari Metrics and select Stop from the Service Actions menu. Then, select Turn on Maintenance Mode from the Service Actions menu.
If Log Search runs in the cluster, stop it and turn on Maintenance Mode. From Ambari Web, browse to Services > Log Search and select Stop from the Service Actions menu. Then, select Turn on Maintenance Mode from the Service Actions menu.
Stop Ambari server:
# ambari-server stop
Stop Ambari agents:
# ambari-agent stop
Back up the Ambari database:
# mysqldump -u ambari -p ambari > /root/backups/ambari-before-upgrade.sql
III. Download Ambari 2.7.3 repository
1. Replace the old Ambari repository with the latest one on all hosts in the cluster:
# wget -nv http://public-repo-1.hortonworks.com/ambari/centos7/2.x/updates/2.7.3.0/ambari.repo -O /etc/yum.repos.d/ambari.repo
2. Upgrade Ambari server
# yum clean all
# yum upgrade ambari-server
Note: If HDF components are deployed in the HDP setup, upgrade the HDF Management Pack before upgrading the database schema in step IV. For more details, see Upgrade the HDF Management Pack in the HDF documentation.
3. Upgrade Ambari agents:
# yum clean all
# yum upgrade ambari-agent
IV. Upgrade Database Schema
On the Ambari server host, upgrade the Ambari database schema:
# ambari-server upgrade
Start Ambari server:
# ambari-server start
Start Ambari agents:
# ambari-agent start
V. Verify Ambari version
From the Ambari UI: Go to Admin > About:
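As a hedged command-line alternative to the UI check (assuming default admin credentials and port 8080; adjust to your environment):

```shell
# Confirm the upgraded package versions on the Ambari hosts:
rpm -q ambari-server ambari-agent
# Query the Ambari REST API for the server component version:
curl -u admin:admin "http://<ambari-host>:8080/api/v1/services/AMBARI/components/AMBARI_SERVER"
```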
12-10-2019
08:10 AM
This video describes a step-by-step process for getting an HDP 3 cluster up and running on CentOS 7. The video follows the Hortonworks documentation and support matrix recommendations. Public repositories were used for a minimal two-node install on CentOS 7.5.
Services installed on Ambari node: ambari-server
Services installed on node1: SmartSense, Ambari Metrics.
Open the video on YouTube here
Get ready
Clean the yum cache:
# yum clean all
Rebuild the cache:
# yum makecache
Install utilities:
# yum install openssl openssh-clients curl unzip gzip tar wget
Double-check the free RAM in the system:
# free -m
Check the limits configuration:
# ulimit -n -u
Set limits temporarily:
# ulimit -n 32768
# ulimit -u 65536
Set limits permanently:
# vim /etc/security/limits.conf
root - nofile 32768
root - nproc 65536
Generate an RSA SSH key:
# ssh-keygen
Send the public RSA key to node1 and add it to the authorized keys file:
# ssh-copy-id 10.200.82.41
Test the passwordless connection:
# ssh 10.200.82.41
Install the NTP package:
# yum install ntp -y
Edit the NTP conf file to set the ISO code, as shown in the first column of the Stratum One Time Servers list (http://support.ntp.org/bin/view/Servers/StratumOneTimeServers):
# vim /etc/ntp.conf
Start the NTP service:
# systemctl start ntpd
Check if the service is running:
# systemctl status ntpd
Print the list of time servers the hosts are synchronizing with:
# ntpq -p
Check the time drift between the hosts and an NTP server:
# ntpdate -q 0.centos.pool.ntp.org
Set hostnames on the fly:
# hostname ambari.local
# hostname node1.local
Edit the /etc/hosts file to set the IP-to-name mapping:
# vim /etc/hosts
10.200.82.40 ambari.local ambari
10.200.82.41 node1.local node1
Edit the OS network file on each node to set the permanent hostname:
# vim /etc/sysconfig/network
NETWORKING=yes
HOSTNAME=ambari.local (HOSTNAME=node1.local on node1)
Run a new shell so the current hostnames show in the prompt:
# bash
Double-check the output of hostname with and without -f; they should be the same:
# hostname
# hostname -f
Disable firewalld while installing:
# systemctl disable firewalld
Stop the firewall service:
# service firewalld stop
Check if SELinux is currently in enforcing mode:
# getenforce
Set it to permissive (or disabled):
# setenforce 0
Check if it switched modes:
# getenforce
Download the public Apache Ambari repo file:
# wget -nv http://public-repo-1.hortonworks.com/ambari/centos7/2.x/updates/2.7.1.0/ambari.repo -O /etc/yum.repos.d/ambari.repo
List the currently configured repositories:
# yum repolist
Install Ambari Server
Install the ambari-server package:
# yum install ambari-server
Configure ambari-server:
# ambari-server setup
Start the service:
# ambari-server start
Deploy HDP cluster components
Browse to the Ambari Server user interface (UI) at http://ambari.local:8080/. The default username and password are both admin.
Take a look at the root user's private RSA key file, the one generated before:
# cat .ssh/id_rsa
12-10-2019
08:03 AM
From Ambari 2.6 onwards, for all MYSQL_SERVER components in a blueprint, the mysql-connector-java.jar needs to be manually installed and registered. This video describes how to install and register the MySQL connector in order to replace the embedded database instance that Ambari Server uses by default.
Open YouTube video here
For certain services, Cloudbreak allows registering an existing RDBMS instance as an external source for a database. After registering the RDBMS with Cloudbreak, it can be used for multiple clusters. However, as this configuration needs to be used by Ambari before its installation, the MySQL Connector needs to be configured to connect to the remote MySQL database.
To manually install and register MySQL connector, do the following:
Preparing MySQL Database Server
Install MySQL Server on CentOS Linux 7:
# yum -y localinstall https://dev.mysql.com/get/mysql57-community-release-el7-8.noarch.rpm
# yum -y install mysql-community-server
# systemctl start mysqld.service
Complete the MySQL initial setup. Depending on the MySQL version, use a blank password for the MySQL root user or get the temporary password from mysqld.log:
# grep password /var/log/mysqld.log
# mysql_secure_installation
Create a user for Ambari, grant it permissions, and create the initial database:
# mysql -u root -p
CREATE USER 'ambari'@'%' IDENTIFIED BY 'Hadoop1234!';
GRANT ALL PRIVILEGES ON *.* TO 'ambari'@'%';
CREATE USER 'ambari'@'localhost' IDENTIFIED BY 'Hadoop1234!';
GRANT ALL PRIVILEGES ON *.* TO 'ambari'@'localhost';
FLUSH PRIVILEGES;
CREATE DATABASE ambari01;
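Before moving on, it can be worth verifying that the ambari user can reach the database from a remote host. A sketch using the credentials created above:

```shell
# Connect as the new ambari user and confirm the initial database exists:
mysql -u ambari -p'Hadoop1234!' -h <MySQL_DB_IP/FQDN> -e "SHOW DATABASES LIKE 'ambari01';"
```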
Configure Cloudbreak to use MySQL External Database
Create a pre-ambari-start recipe to install the mysql-connector-java.jar:
#!/bin/bash
# Provide the JDBC Connector JAR file.
# During cluster creation, Cloudbreak uses the /opt/jdbc-drivers directory for the JAR file.
yum -y localinstall https://dev.mysql.com/get/mysql57-community-release-el7-8.noarch.rpm
yum -y install mysql-connector-java*
if [[ ! -d /opt/jdbc-drivers ]]
then
  mkdir /opt/jdbc-drivers
  cp /usr/share/java/mysql-connector-java.jar /opt/jdbc-drivers/mysql-connector-java.jar
fi
Register the database configuration:
Database: MySQL
MySQL Server: MySQL_DB_IP/FQDN
MySQL User: ambari
MySQL Password: Hadoop1234!
JDBC Connector JAR URL: Empty
JDBC Connection: jdbc:mysql://MySQL_DB_IP/FQDN:Port/ambari01
12-10-2019
08:01 AM
Sometimes, a node needs to be decommissioned or faces undetermined downtime for repairs. If the node hosts a ResourceManager, move it to a new host using the ResourceManager Move wizard in the Ambari Web user interface. The wizard describes the set of automated steps taken to move the ResourceManager to a new host. Since YARN and MapReduce2 will be restarted, a cluster maintenance window must be planned and cluster downtime expected.
This video describes how to move a Resource Manager to a new host using the Resource Manager Move Wizard from Ambari Web User Interface.
Open the video on YouTube here
To move YARN Resource Manager to a new host with Ambari, do the following:
In Ambari Web, browse to Services > YARN > Summary.
Select Service Actions and choose Move ResourceManager. The Move ResourceManager wizard launches, describing a set of automated steps that must be followed to move one ResourceManager to a new host.
Click Get Started. The wizard provides a walk-through for moving the ResourceManager.
The following services will be restarted as part of the wizard:
YARN
MAPREDUCE2
Plan a cluster maintenance window and prepare for cluster downtime when moving the ResourceManager.
Click Next.
Select the target host to assign the ResourceManager to, then click Next.
Review and confirm the host selections.
Expand YARN if necessary, to review all the configuration changes proposed for YARN.
Click Deploy to approve the changes and start automatically moving the Resource Manager to a new host.
On Configure Components, click Complete when all the progress bars are completed.
After Ambari Web reloads, there will be some alerts. Wait a few minutes until all the services restart.
Restart any components using Ambari Web, if necessary.
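After the move, the new ResourceManager placement can also be confirmed through the Ambari REST API. A hedged sketch, assuming default admin credentials; host and cluster name are placeholders:

```shell
# List the hosts currently running the RESOURCEMANAGER component:
curl -u admin:admin "http://<ambari-host>:8080/api/v1/clusters/<cluster_name>/host_components?HostRoles/component_name=RESOURCEMANAGER"
```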
REFERENCE:
http://docs.hortonworks.com (official product documentation)
http://community.hortonworks.com (community forum)
12-10-2019
08:00 AM
To ensure that another ResourceManager is available if the active ResourceManager in a cluster fails, ResourceManager high availability (HA) should be enabled and configured.
In an HDP 2.2 or later environment, HA can be configured for the ResourceManager by using the Enable ResourceManager HA wizard. This requires at least three hosts in the cluster, and Apache ZooKeeper servers must be running.
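Once the wizard completes, the HA state of each ResourceManager can be checked from the command line. A sketch; rm1 and rm2 are the default ResourceManager IDs configured in yarn-site.xml:

```shell
# One ResourceManager should report "active" and the other "standby":
yarn rmadmin -getServiceState rm1
yarn rmadmin -getServiceState rm2
```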
The Enable ResourceManager high availability section from the documentation contains the steps mentioned in this video.
Open the video on YouTube here
Recommended links:
Product documentation page
Community Forum