12-10-2019
08:21 AM
This video explains how to configure Spark2 to use HiveWarehouseConnector.
Open the video on YouTube here
To access Hive from Spark2 on HDP3, some requirements must be met in order to use the HiveWarehouseConnector. The HiveWarehouseConnector configuration can be set at the cluster level, the job level, or both. This requires collecting base information from the Hive service, which is then either configured on Spark2 via Ambari, or supplied per application submission by passing the same configurations as arguments to the Spark2 client from a terminal.
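As a sketch of the job-level approach, a Spark2 session can be launched with the connector configurations passed as arguments. The JDBC URL, metastore URI, ZooKeeper quorum, LLAP application name, and assembly JAR version below are placeholders for the values collected from the Hive service configuration:

```shell
# Job-level submission with HiveWarehouseConnector settings (all bracketed
# values are placeholders for the base information collected from Hive):
spark-shell --jars /usr/hdp/current/hive_warehouse_connector/hive-warehouse-connector-assembly-<version>.jar \
  --conf spark.sql.hive.hiveserver2.jdbc.url="jdbc:hive2://<hs2i-host>:10500/" \
  --conf spark.datasource.hive.warehouse.metastoreUri="thrift://<metastore-host>:9083" \
  --conf spark.datasource.hive.warehouse.load.staging.dir="/tmp" \
  --conf spark.hadoop.hive.llap.daemon.service.hosts="@llap0" \
  --conf spark.hadoop.hive.zookeeper.quorum="<zk1>:2181,<zk2>:2181,<zk3>:2181"
```

The same keys can be set once at the cluster level in the Spark2 configuration via Ambari instead of repeating them per job.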
12-10-2019
08:19 AM
On HDP3, the SparkSQL API directly queries Spark2's own catalog namespace, which is independent of the Hive catalog. Hence, the HiveWarehouseConnector was developed to allow Spark users to query Hive data through the HiveWarehouseSession API. Hive tables on HDP3 are ACID by default, and Spark2 does not yet operate on ACID tables. To guarantee data integrity, the HiveWarehouseConnector processes queries through the HiveServer2 Interactive (LLAP) service. This is not the case for external tables.
This video will explain how to access Hive from Spark2 on HDP3 along with some architectural changes and the support provided for particular use cases.
Open the video on YouTube here
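A minimal pyspark session using the HiveWarehouseSession API might look like the following sketch; the assembly JAR and pyspark_hwc zip paths depend on the installed HDP version, and the database and table names are placeholders:

```shell
# Launch pyspark with the connector on the classpath (paths are version-dependent):
pyspark --jars /usr/hdp/current/hive_warehouse_connector/hive-warehouse-connector-assembly-<version>.jar \
        --py-files /usr/hdp/current/hive_warehouse_connector/pyspark_hwc-<version>.zip
# Then, inside the shell (Python), queries run through HiveServer2 Interactive:
#   from pyspark_llap import HiveWarehouseSession
#   hive = HiveWarehouseSession.session(spark).build()
#   hive.executeQuery("SELECT * FROM <db>.<table>").show()
```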
12-10-2019
08:18 AM
This video describes an easy-to-use Python script that generates data for Hive based on an input table schema. This data generator solves the issue of loading data into tables with many columns (for example, more than 1500). The script supports faster testing of queries and performance analysis. To get the code, see the KB link (for customers only).
Open the video on YouTube here
12-10-2019
08:16 AM
This video describes how Kafka ACLs work in HDP. This method is not supported in CDP 7; investigate Ranger authorization for ACLs in CDP instead.
Open the video on YouTube here
Apache Kafka comes with an authorizer implementation that uses ZooKeeper to store all the ACLs. The ACLs have to be set because, when an authorizer is configured, access to resources is limited to super users. By default, if a resource has no associated ACLs, then no one except super users is allowed to access it. The following are the main ACL commands.
Add ACLs:
bin/kafka-acls.sh --authorizer-properties zookeeper.connect=<zkHost>:<zkPort> --add
--allow-principal User:<username> --operation All --topic <topicName> --group=*
In the above command, ACLs are added to allow a principal to have All operations available over the topic specified. The following are the available operations:
Read
Write
Create
Delete
Alter
Describe
ClusterAction
DescribeConfigs
AlterConfigs
IdempotentWrite
All
Using --group=* means that this user is allowed to use any consumer group when running a Kafka consumer. The following is the command to list ACLs:
bin/kafka-acls.sh --authorizer-properties zookeeper.connect=<zkHost>:<zkPort> --list
In the above command, the ACLs available in the Kafka cluster are listed using --list. More details about the ACL options are available in the following references:
Authorization and ACLs
ACLs command line interface
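ACLs can also be revoked with the same tool. As a hedged sketch, the following removes the ACLs granted by the add command above (same placeholder values):

```shell
# Remove the ACLs previously granted to the principal on the topic:
bin/kafka-acls.sh --authorizer-properties zookeeper.connect=<zkHost>:<zkPort> --remove \
  --allow-principal User:<username> --operation All --topic <topicName>
```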
12-10-2019
08:14 AM
It is often necessary for an engineer or administrator to manipulate the content of Ambari Infra Solr using command-line utilities, whether or not they have access to the GUI.
This video helps to understand the basic manipulation of:
Listing collections and checking cluster status of Solr cloud.
Creating new collections.
Deleting the existing collections.
To check if the ambari-infra-solr server instance is running on the node, run the following:
# ps -elf | grep -i infra-solr
# netstat -plant | grep -i 8886
If the cluster is Kerberized, check for valid Kerberos tickets:
# klist -A
Obtain a Kerberos ticket, if not present:
# kinit -kt /etc/security/keytabs/ambari-infra-solr.service.keytab \
    $(klist -kt /etc/security/keytabs/ambari-infra-solr.service.keytab | sed -n "4p" | cut -d ' ' -f7)
List Solr collections:
# curl --negotiate -u : "http://$(hostname -f):8886/solr/admin/collections?action=list"
Create a collection:
# curl --negotiate -u : "http://$(hostname -f):8886/solr/admin/collections?action=CREATE&name=<collection_name>&numShards=<number>"
The following are optional parameters:
&maxShardsPerNode=<number>
&replicationFactor=<number>
Delete a collection:
# curl --negotiate -u : "http://$(hostname -f):8886/solr/admin/collections?action=DELETE&name=<collection_name>"
Check status of the Solr Cloud cluster:
# curl --negotiate -u : "http://$(hostname -f):8886/solr/admin/collections?action=clusterstatus&wt=json" | python -m json.tool
INDEX Keys:
solr_host = host where the Solr instance(s) is running
collection = name of the collection
shard = name of the shard
action = CREATE (adds a collection)
action = DELETE (deletes a collection)
action = CLUSTERSTATUS (gets the list of available collections in the Solr Cloud cluster)
12-10-2019
08:12 AM
This video describes how to upgrade Ambari 2.6.2.2 to Ambari 2.7.3.
Open the video on YouTube here
Apache Ambari 2.7.3 is the latest among the Ambari 2.7.x releases. Ambari 2.7.0, the first release in the 2.7.x series, introduced significant improvements over its predecessor, Ambari 2.6.2. This video will help users upgrade from Ambari 2.6.2.2 to Ambari 2.7.3.
Procedure
I. Prerequisites
Take a backup of the Ambari configuration file:
# mkdir /root/backups
# cp /etc/ambari-server/conf/ambari.properties /root/backups
Turn off Service Auto Restart:
From Ambari UI: Admin > Service Auto Start. Set Auto Start Services to Disabled. Click Save.
Run Service Checks on all Ambari services.
On each of the Ambari services installed on the cluster, run Service Checks as follows:
From Ambari UI: <Service_Name> > Service Actions > Run Service Check
For example: HDFS > Service Actions > Run Service Check.
Start and Stop all of the Ambari services from Ambari UI.
II. Stop Services
If SmartSense is deployed, stop it and turn on Maintenance Mode. From Ambari Web, browse to Services > SmartSense and select Stop from the Service Actions menu. Then, select Turn on Maintenance Mode from the Service Actions menu.
If Ambari Metrics is deployed, stop it and turn on Maintenance Mode. From Ambari Web, browse to Services > Ambari Metrics and select Stop from the Service Actions menu. Then, select Turn on Maintenance Mode from the Service Actions menu.
If Log Search runs in the cluster, stop it and turn on Maintenance Mode. From Ambari Web, browse to Services > Log Search and select Stop from the Service Actions menu. Then, select Turn on Maintenance Mode from the Service Actions menu.
Stop Ambari server:
# ambari-server stop
Stop Ambari agents:
# ambari-agent stop
Back up the Ambari database:
# mysqldump -u ambari -p ambari > /root/backups/ambari-before-upgrade.sql
III. Download Ambari 2.7.3 repository
1. Replace the old Ambari repository with the latest one on all hosts in the cluster:
# wget -nv http://public-repo-1.hortonworks.com/ambari/centos7/2.x/updates/2.7.3.0/ambari.repo -O /etc/yum.repos.d/ambari.repo
2. Upgrade Ambari server
# yum clean all
# yum upgrade ambari-server
Note: If HDF components are deployed in the HDP setup, upgrade the HDF Management Pack before upgrading the database schema in step IV. For more details, see Upgrade the HDF Management Pack in the HDF documentation.
3. Upgrade Ambari agents:
# yum clean all
# yum upgrade ambari-agent
IV. Upgrade Database Schema
On the Ambari server host, upgrade the Ambari database schema:
# ambari-server upgrade
Start Ambari server:
# ambari-server start
Start Ambari agents:
# ambari-agent start
V. Verify Ambari version
From the Ambari UI: Go to Admin > About:
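As a hedged command-line alternative to the UI check (assuming default admin credentials and port 8080; adjust to your environment):

```shell
# Confirm the upgraded package versions on the Ambari hosts:
rpm -q ambari-server ambari-agent
# Query the Ambari REST API for the server component version:
curl -u admin:admin "http://<ambari-host>:8080/api/v1/services/AMBARI/components/AMBARI_SERVER"
```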
12-10-2019
08:10 AM
This video describes a step-by-step process for getting an HDP 3 cluster up and running on CentOS 7. The video follows the Hortonworks documentation and support matrix recommendations. Public repositories were used for a minimal two-node install on CentOS 7.5.
Services installed on Ambari node: ambari-server
Services installed on node1: SmartSense, Ambari Metrics.
Open the video on YouTube here
Get ready
Clean the yum cache:
# yum clean all
Rebuild the cache:
# yum makecache
Install utilities:
# yum install openssl openssh-clients curl unzip gzip tar wget
Double-check the free RAM in the system:
# free -m
Check the limits configuration:
# ulimit -n -u
Set limits temporarily:
# ulimit -n 32768
# ulimit -u 65536
Set limits permanently:
# vim /etc/security/limits.conf
root - nofile 32768
root - nproc 65536
Generate an RSA SSH key:
# ssh-keygen
Send the public RSA key to node1 and add it to the authorized keys file:
# ssh-copy-id 10.200.82.41
Test the passwordless connection:
# ssh 10.200.82.41
Install the NTP package:
# yum install ntp -y
Edit the NTP conf file to set the ISO code, as shown in the first column of the Stratum One Time Servers list (http://support.ntp.org/bin/view/Servers/StratumOneTimeServers):
# vim /etc/ntp.conf
Start the NTP service:
# systemctl start ntpd
Check if the service is running:
# systemctl status ntpd
Print the list of time servers the hosts are synchronizing with:
# ntpq -p
Check the time drift between the hosts and an NTP server:
# ntpdate -q 0.centos.pool.ntp.org
Set hostnames on the fly:
# hostname ambari.local
# hostname node1.local
Edit the /etc/hosts file to set the IP-to-name mapping:
# vim /etc/hosts
10.200.82.40 ambari.local ambari
10.200.82.41 node1.local node1
Edit the OS network file on each node to set the permanent hostname:
# vim /etc/sysconfig/network
NETWORKING=yes
HOSTNAME=ambari.local (HOSTNAME=node1.local on node1)
Run a new shell so the current hostnames show in the prompt:
# bash
Double-check the output of hostname with and without -f; they should be the same:
# hostname
# hostname -f
Disable firewalld while installing:
# systemctl disable firewalld
Stop the firewall service:
# service firewalld stop
Check if SELinux is currently in enforcing mode:
# getenforce
Set it to permissive (or disabled):
# setenforce 0
Check if it switched modes:
# getenforce
Download the public Apache Ambari repo file:
# wget -nv http://public-repo-1.hortonworks.com/ambari/centos7/2.x/updates/2.7.1.0/ambari.repo -O /etc/yum.repos.d/ambari.repo
List the currently configured repositories:
# yum repolist
Install Ambari Server
Install the ambari-server package:
# yum install ambari-server
Configure ambari-server:
# ambari-server setup
Start the service:
# ambari-server start
Deploy HDP cluster components
Browse to the Ambari Server user interface (UI) at http://ambari.local:8080/. The default username and password are both admin.
Take a look at the root user's private RSA key file, the one generated before:
# cat .ssh/id_rsa
12-10-2019
08:03 AM
From Ambari 2.6 onwards, for all MYSQL_SERVER components in a blueprint, the mysql-connector-java.jar needs to be manually installed and registered. This video describes how to install and register the MySQL connector in order to replace the embedded database instance that Ambari Server uses by default.
Open YouTube video here
For certain services, Cloudbreak allows registering an existing RDBMS instance as an external source for a database. After registering the RDBMS with Cloudbreak, it can be used for multiple clusters. However, as this configuration needs to be used by Ambari before its installation, the MySQL Connector needs to be configured to connect to the remote MySQL database.
To manually install and register MySQL connector, do the following:
Preparing MySQL Database Server
Install MySQL Server on CentOS Linux 7:
# yum -y localinstall https://dev.mysql.com/get/mysql57-community-release-el7-8.noarch.rpm
# yum -y install mysql-community-server
# systemctl start mysqld.service
Complete the MySQL initial setup. Depending on the MySQL version, use a blank password for the MySQL root user or get the temporary password from mysqld.log:
# grep password /var/log/mysqld.log
# mysql_secure_installation
Create a user for Ambari, grant it permissions, and create the initial database:
# mysql -u root -p
CREATE USER 'ambari'@'%' IDENTIFIED BY 'Hadoop1234!';
GRANT ALL PRIVILEGES ON *.* TO 'ambari'@'%';
CREATE USER 'ambari'@'localhost' IDENTIFIED BY 'Hadoop1234!';
GRANT ALL PRIVILEGES ON *.* TO 'ambari'@'localhost';
FLUSH PRIVILEGES;
CREATE DATABASE ambari01;
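Before moving on, it can be worth verifying that the ambari user can reach the database from a remote host. A sketch using the credentials created above:

```shell
# Connect as the new ambari user and confirm the initial database exists:
mysql -u ambari -p'Hadoop1234!' -h <MySQL_DB_IP/FQDN> -e "SHOW DATABASES LIKE 'ambari01';"
```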
Configure Cloudbreak to use MySQL External Database
Create a pre-ambari-start recipe to install the mysql-connector-java.jar:
#!/bin/bash
# Provide the JDBC Connector JAR file.
# During cluster creation, Cloudbreak uses the /opt/jdbc-drivers directory for the JAR file.
yum -y localinstall https://dev.mysql.com/get/mysql57-community-release-el7-8.noarch.rpm
yum -y install mysql-connector-java*
if [[ ! -d /opt/jdbc-drivers ]]
then
  mkdir /opt/jdbc-drivers
  cp /usr/share/java/mysql-connector-java.jar /opt/jdbc-drivers/mysql-connector-java.jar
fi
Register the database configuration:
Database: MySQL
MySQL Server: MySQL_DB_IP/FQDN
MySQL User: ambari
MySQL Password: Hadoop1234!
JDBC Connector JAR URL: Empty
JDBC Connection: jdbc:mysql://MySQL_DB_IP/FQDN:Port/ambari01
12-10-2019
08:01 AM
Sometimes, a node needs to be decommissioned or faces undetermined downtime for repairs. If the node hosts a ResourceManager, move it to a new host using the ResourceManager Move wizard in the Ambari Web user interface. The wizard describes the set of automated steps taken to move the ResourceManager to a new host. Since YARN and MapReduce2 will be restarted, a cluster maintenance window must be planned and cluster downtime expected.
This video describes how to move a Resource Manager to a new host using the Resource Manager Move Wizard from Ambari Web User Interface.
Open the video on YouTube here
To move YARN Resource Manager to a new host with Ambari, do the following:
In Ambari Web, browse to Services > YARN > Summary.
Select Service Actions and choose Move ResourceManager. The Move ResourceManager wizard launches, describing a set of automated steps that must be followed to move one ResourceManager to a new host.
Click Get Started. The wizard provides a walk-through for moving the ResourceManager.
The following services will be restarted as part of the wizard:
YARN
MAPREDUCE2
Plan a cluster maintenance window and prepare for cluster downtime when moving the ResourceManager.
Click Next.
Select the target host to assign the ResourceManager to, then click Next.
Review and confirm the host selections.
Expand YARN if necessary, to review all the configuration changes proposed for YARN.
Click Deploy to approve the changes and start automatically moving the Resource Manager to a new host.
On Configure Components, click Complete when all the progress bars are completed.
After Ambari Web reloads, there will be some alerts. Wait a few minutes until all the services restart.
Restart any components using Ambari Web, if necessary.
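After the move, the new ResourceManager placement can also be confirmed through the Ambari REST API. A hedged sketch, assuming default admin credentials; host and cluster name are placeholders:

```shell
# List the hosts currently running the RESOURCEMANAGER component:
curl -u admin:admin "http://<ambari-host>:8080/api/v1/clusters/<cluster_name>/host_components?HostRoles/component_name=RESOURCEMANAGER"
```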
REFERENCE:
http://docs.hortonworks.com (official product documentation)
http://community.hortonworks.com (community forum)
12-10-2019
08:00 AM
To ensure that another ResourceManager is available if the active ResourceManager in a cluster fails, ResourceManager high availability (HA) should be enabled and configured.
In an HDP 2.2 or later environment, HA can be configured for the ResourceManager by using the Enable ResourceManager HA wizard. This requires at least three hosts in the cluster, and Apache ZooKeeper servers must be running.
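Once the wizard completes, the HA state of each ResourceManager can be checked from the command line. A sketch; rm1 and rm2 are the default ResourceManager IDs configured in yarn-site.xml:

```shell
# One ResourceManager should report "active" and the other "standby":
yarn rmadmin -getServiceState rm1
yarn rmadmin -getServiceState rm2
```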
The Enable ResourceManager high availability section from the documentation contains the steps mentioned in this video.
Open the video on YouTube here
Recommended links:
Product documentation page
Community Forum