Member since 02-07-2019
1792 Posts | 1 Kudos Received | 0 Solutions
12-10-2019 08:21 AM
 This video explains how to configure Spark2 to use HiveWarehouseConnector. 
   
  
 Open the video on YouTube here 
   
 To access Hive from Spark2 on HDP3, a few requirements must be met in order to use the HiveWarehouseConnector. The configuration can be applied at the cluster level, the job level, or both. In either case, base information must first be collected from the Hive service; it is then configured on Spark2 via Ambari, or supplied per application submission by passing the same settings as arguments to the Spark2 client. 
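 As a hedged sketch of a job-level submission (the jar version, host names, and LLAP application name are placeholders that vary per cluster; the property names follow the HDP 3 HiveWarehouseConnector documentation):
 spark-submit \
   --jars /usr/hdp/current/hive_warehouse_connector/hive-warehouse-connector-assembly-<version>.jar \
   --conf spark.sql.hive.hiveserver2.jdbc.url="jdbc:hive2://<hs2_interactive_host>:10500/" \
   --conf spark.datasource.hive.warehouse.metastoreUri="thrift://<metastore_host>:9083" \
   --conf spark.datasource.hive.warehouse.load.staging.dir="/tmp" \
   --conf spark.hadoop.hive.llap.daemon.service.hosts="@llap0" \
   --conf spark.hadoop.hive.zookeeper.quorum="<zk_host>:2181" \
   <your_app>
 The same --conf values can instead be set once in the Spark2 configuration via Ambari to apply at the cluster level. 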
 
						
					
12-10-2019 08:19 AM
 On HDP3, the SparkSQL API directly queries Spark2's own catalog namespace. The Spark catalog is independent of the Hive catalog; hence, the HiveWarehouseConnector was developed to allow Spark users to query Hive data through the HiveWarehouseSession API. Hive tables on HDP3 are ACID by default, but Spark2 does not yet operate on ACID tables. To guarantee data integrity, the HiveWarehouseConnector processes queries through the HiveServer2 Interactive (LLAP) service. This is not the case for external tables.  
   
 This video explains how to access Hive from Spark2 on HDP3, along with the related architectural changes and the support provided for particular use cases. 
   
  
 Open the video on YouTube here 
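 As a hedged sketch of the HiveWarehouseSession API mentioned above (the assembly jar version, database, and table names are placeholders), a quick interactive session could look like:
 spark-shell --jars /usr/hdp/current/hive_warehouse_connector/hive-warehouse-connector-assembly-<version>.jar
 Then, inside the Spark shell (Scala):
 import com.hortonworks.hwc.HiveWarehouseSession
 val hive = HiveWarehouseSession.session(spark).build()
 hive.executeQuery("SELECT * FROM <db>.<table> LIMIT 10").show()
 Queries issued this way are processed through LLAP, as described above. 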
 
						
					
12-10-2019 08:18 AM
							 
 This video describes an easy-to-use Python script that generates data for Hive based on an input table schema. The data generator solves the problem of loading data into tables with many columns (for example, more than 1,500). The script supports faster query testing and performance analysis. To get the code, see the KB link (for customers only). 
   
 
    
 Open the video on YouTube here 
						
					
12-10-2019 08:16 AM
 This video describes how Kafka ACLs work in HDP. Note that this method is not supported in CDP 7; investigate Ranger authorization for ACLs in CDP instead. 
   
  
 Open the video on YouTube here 
   
 Apache Kafka comes with an authorizer implementation that uses ZooKeeper to store all the ACLs. ACLs have to be set because, when an authorizer is configured, access to resources is limited to super users: by default, if a resource has no associated ACLs, no one except super users is allowed to access it.    The following are the main ACL commands:    Add ACLs: 
 bin/kafka-acls.sh --authorizer-properties zookeeper.connect=<zkHost>:<zkPort> --add 
--allow-principal User:<username> --operation All --topic <topicName> --group=* 
 In the above command, ACLs are added to allow the principal to perform all operations on the specified topic. The following are the available operations: 
 
 Read 
 Write 
 Create 
 Delete 
 Alter 
 Describe 
 ClusterAction 
 DescribeConfigs 
 AlterConfigs 
 IdempotentWrite 
 All 
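 For example, a hedged sketch granting a consumer principal only the operations it typically needs (all names are placeholders):
 bin/kafka-acls.sh --authorizer-properties zookeeper.connect=<zkHost>:<zkPort> --add 
--allow-principal User:<username> --operation Read --operation Describe 
--topic <topicName> --group <groupName>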
 
 Using --group=* means the ACL applies to all consumer groups, so the principal can consume from the topic with any group when running a Kafka consumer.    The following is the command to list ACLs: 
 bin/kafka-acls.sh --authorizer-properties zookeeper.connect=<zkHost>:<zkPort> --list 
 In the above command, the ACLs available on the Kafka cluster are listed using --list. 
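 ACLs can also be revoked with --remove; a hedged sketch using the same placeholders (the tool asks for confirmation unless --force is passed):
 bin/kafka-acls.sh --authorizer-properties zookeeper.connect=<zkHost>:<zkPort> --remove 
--allow-principal User:<username> --operation All --topic <topicName>
 More details about the available ACL options can be found in the following references: 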
 
 Authorization and ACLs 
 ACLs command line interface 
 
 
						
					
12-10-2019 08:14 AM
 Often, an engineer or administrator needs to manipulate the contents of Ambari Infra Solr using command-line utilities, for example when access to the GUI is not available.  
   
 This video helps to understand the basic manipulation of: 
 
 Listing collections and checking the cluster status of Solr Cloud. 
 Creating new collections. 
 Deleting existing collections. 
 
   
  
 
 To check whether the ambari-infra-solr server instance is running on the node, run the following:
 # ps -elf | grep -i infra-solr
# netstat -plant | grep -i 8886 
 
 If the cluster is Kerberized, check for valid Kerberos tickets:
 # klist -A 
 
 Obtain a Kerberos ticket if one is not present:
 # kinit -kt /etc/security/keytabs/ambari-infra-solr.service.keytab $(klist -kt /etc/security/keytabs/ambari-infra-solr.service.keytab | sed -n "4p" | cut -d ' ' -f7) 
 
 List Solr collections:
 # curl --negotiate -u : "http://$(hostname -f):8886/solr/admin/collections?action=list" 
 
 Create a collection:
 # curl --negotiate -u : "http://$(hostname -f):8886/solr/admin/collections?action=CREATE&name=<collection_name>&numShards=<number>" 
 
 The following are optional parameters:
 &maxShardsPerNode=<number>
&replicationFactor=<number> 
 
 Delete a collection: 
 # curl --negotiate -u : "http://$(hostname -f):8886/solr/admin/collections?action=DELETE&name=<collection_name>" 
 
 Check the status of the Solr Cloud cluster:
 # curl --negotiate -u : "http://$(hostname -f):8886/solr/admin/collections?action=clusterstatus&wt=json" | python -m json.tool 
 
 INDEX Keys:
 *solr_host = host where the Solr instance(s) is running
*collection = name of the collection
*shard = name of the shard
*action = CREATE (adds a collection)
*action = DELETE (deletes a collection)
*action = CLUSTERSTATUS (lists the available collections in the Solr Cloud cluster) 
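 As an additional hedged example (collection is a standard Solr Collections API parameter), the status of a single collection can be checked with:
 # curl --negotiate -u : "http://$(hostname -f):8886/solr/admin/collections?action=CLUSTERSTATUS&collection=<collection_name>&wt=json" | python -m json.tool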
 
 
 
						
					
12-10-2019 08:12 AM
							 This video describes how to upgrade Ambari 2.6.2.2 to Ambari 2.7.3. 
   
    
 
 Open the video on YouTube here 
   
 Apache Ambari 2.7.3 is the latest among the Ambari 2.7.x releases. Ambari 2.7.0, the first release in the 2.7.x series, introduced significant improvements over its predecessor, Ambari 2.6.2. This video will help users upgrade from Ambari 2.6.2.2 to Ambari 2.7.3. 
 Procedure 
 I. Prerequisites 
 
 Take a backup of the Ambari configuration file:
 # mkdir /root/backups
# cp /etc/ambari-server/conf/ambari.properties /root/backups 
 
 Turn off Service Auto Restart: 
 From Ambari UI: Admin > Service Auto Start. Set Auto Start Services to Disabled. Click Save. 
 
 Run Service Checks on all Ambari services. 
 On each of the Ambari services installed on the cluster, run a Service Check as follows: 
 From Ambari UI: <Service_Name> > Service Actions > Run Service Check 
    For example: HDFS > Service Actions > Run Service Check. 
 Start and Stop all of the Ambari services from Ambari UI. 
 
 II. Stop Services 
 
 If SmartSense is deployed, stop it and turn on Maintenance Mode. From Ambari Web, browse to Services > SmartSense and select Stop from the Service Actions menu. Then, select Turn on Maintenance Mode from the Service Actions menu. 
 If Ambari Metrics is deployed, stop it and turn on Maintenance Mode. From Ambari Web, browse to Services > Ambari Metrics and select Stop from the Service Actions menu. Then, select Turn on Maintenance Mode from the Service Actions menu. 
 If Log Search is deployed, stop it and turn on Maintenance Mode. From Ambari Web, browse to Services > Log Search and select Stop from the Service Actions menu. Then, select Turn on Maintenance Mode from the Service Actions menu. 
 Stop Ambari server:
 # ambari-server stop 
 
 Stop Ambari agents:
 # ambari-agent stop 
 
 Backup Ambari database:
 # mysqldump -u ambari -p ambari > /root/backups/ambari-before-upgrade.sql 
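 If Ambari uses the default embedded PostgreSQL database instead of MySQL, a hedged equivalent backup (database and user names may differ per setup) would be:
 # pg_dump -U ambari ambari > /root/backups/ambari-before-upgrade.sql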
 
 
 III. Download Ambari 2.7.3 repository 
 1. Replace the old Ambari repository with the latest one on all hosts in the cluster 
 # wget -nv http://public-repo-1.hortonworks.com/ambari/centos7/2.x/updates/2.7.3.0/ambari.repo -O /etc/yum.repos.d/ambari.repo 
 2. Upgrade Ambari server 
 # yum clean all
# yum upgrade ambari-server 
 Note: If HDF components are deployed in the HDP setup, upgrade the HDF Management Pack before upgrading the database schema in step IV. For more details, see the HDF documentation: Upgrade the HDF Management Pack.  3. Upgrade Ambari agents 
 # yum clean all
# yum upgrade ambari-agent 
 IV. Upgrade Database Schema 
 
 On the Ambari server host, upgrade the Ambari database schema:
 # ambari-server upgrade 
 
 Start Ambari server:
 # ambari-server start 
 
 Start Ambari agents:
 # ambari-agent start 
 
 
 V. Verify Ambari version 
 
 
 From the Ambari UI: Go to Admin > About:  
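 Alternatively, a quick hedged check from the command line (assuming an RPM-based install):
 # rpm -q ambari-server ambari-agent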
 
 
    
 
						
					
12-10-2019 08:10 AM
 This video describes a step-by-step process for getting an HDP 3 cluster up and running on CentOS 7. The video follows the Hortonworks documentation and support matrix recommendations. Public repositories were used for a minimal two-node install on CentOS 7.5. 
 
 Services installed on Ambari node: ambari-server 
 Services installed on node1: SmartSense, Ambari Metrics. 
 
  
 Open the video on YouTube here 
 Get ready 
 
 Clean yum cache yum clean all 
 Rebuild cache yum makecache  
 Install utilities yum install openssl openssh-clients curl unzip gzip tar wget 
 Double-check free RAM memory in the system free -m 
 Check limits configuration  ulimit -n -u 
 Set limits, temporarily  ulimit -n 32768 ulimit -u 65536 
 Set limits, permanently vim /etc/security/limits.conf root - nofile 32768 root - nproc 65536 
 Generate RSA SSH key ssh-keygen 
 Send public RSA key to node1 and configure it in the authorized keys file ssh-copy-id 10.200.82.41 
 Test passwordless connection ssh 10.200.82.41 
 Install NTP package yum install ntp -y 
 Edit the NTP conf file to set the ISO code, as shown in the first column of the Stratum One Time Servers list http://support.ntp.org/bin/view/Servers/StratumOneTimeServers vim /etc/ntp.conf 
 Start NTP service systemctl start ntpd 
 Check if the service is running systemctl status ntpd 
 Print the list of time servers the hosts are synchronizing with ntpq -p 
 Check the time drift between the hosts and an NTP server ntpdate -q 0.centos.pool.ntp.org 
 Set hostnames on the fly hostname ambari.local hostname node1.local 
 Edit the etc hosts file for setting the IP-names mapping vim /etc/hosts 10.200.82.40    ambari.local    ambari 10.200.82.41    node1.local    node1 
 Edit the OS network file for setting the permanent host name vim /etc/sysconfig/network  NETWORKING=yes HOSTNAME=ambari.local NETWORKING=yes HOSTNAME=node1.local  
 Run a new shell so the current host names show in the prompt bash 
 Double-check the output of hostname with and without -f; they should be the same. hostname hostname -f 
 Disable firewalld while installing systemctl disable firewalld 
 Stop the firewall service service firewalld stop 
 Check if SELinux is currently in enforcing mode getenforce 
 Set it to permissive (or disabled) setenforce 0 
 Check if it switched modes getenforce 
 Download the public Apache Ambari repo file wget -nv http://public-repo-1.hortonworks.com/ambari/centos7/2.x/updates/2.7.1.0/ambari.repo -O /etc/yum.repos.d/ambari.repo 
 List currently configured repositories yum repolist 
 
 Install Ambari Server 
 
 Install ambari-server package yum install ambari-server 
 Configure ambari-server ambari-server setup 
 Start the service ambari-server start 
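 If a non-interactive install is preferred, ambari-server setup also has a silent mode; a hedged sketch (accepts defaults such as the embedded database and default JDK):
 # ambari-server setup -s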
 
 Deploy HDP cluster component 
 
 Browse to the Ambari Server user interface (UI). Default username and password are both admin http://ambari.local:8080/ 
 Take a look at the root user's private RSA file, the one generated before cat .ssh/id_rsa 
 
 
						
					
12-10-2019 08:03 AM
 Starting with Ambari 2.6, for all MYSQL_SERVER components in a blueprint, the mysql-connector-java.jar needs to be manually installed and registered. This video describes how to install and register the MySQL connector in order to replace the embedded database instance that Ambari Server uses by default. 
   
  
   
 
 
 
 
 
 
 
 Open YouTube video here 
   
 
 
 
 
 
 
 
 For certain services, Cloudbreak allows registering an existing RDBMS instance as an external database source. After registering the RDBMS with Cloudbreak, it can be used for multiple clusters. However, because this configuration needs to be in place before Ambari is installed, the MySQL Connector JAR must be provided so Ambari can connect to the remote MySQL database.   
    
 To manually install and register MySQL connector, do the following:  
   
 Preparing MySQL Database Server  
 
 Install MySQL Server on CentOS Linux 7:
 # yum -y localinstall https://dev.mysql.com/get/mysql57-community-release-el7-8.noarch.rpm
# yum -y install mysql-community-server
# systemctl start mysqld.service
 
 
 Complete the MySQL initial setup. Depending on the MySQL version, use a blank password for the MySQL root user or get the temporary password from mysqld.log:
 # grep password /var/log/mysqld.log
# mysql_secure_installation 
 
 Create a user for Ambari, grant permissions, and create the initial database: 
 # mysql -u root -p
CREATE USER 'ambari'@'%' IDENTIFIED BY 'Hadoop1234!';
GRANT ALL PRIVILEGES ON *.* TO 'ambari'@'%';
CREATE USER 'ambari'@'localhost' IDENTIFIED BY 'Hadoop1234!';
GRANT ALL PRIVILEGES ON *.* TO 'ambari'@'localhost';
FLUSH PRIVILEGES;
CREATE DATABASE ambari01; 
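 To verify that remote logins work before wiring the database into Cloudbreak, a hedged check (host placeholder, assumes the MySQL port is reachable):
 # mysql -u ambari -p -h <MySQL_DB_IP/FQDN> ambari01 -e "SELECT 1;"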
 
 
   
 Configure Cloudbreak to use MySQL External Database 
 
 Create a pre-ambari-start recipe to install the mysql-connector-java.jar:
 #!/bin/bash
# Provide the JDBC Connector JAR file.
# During cluster creation, Cloudbreak uses the /opt/jdbc-drivers directory for the JAR file.
yum -y localinstall https://dev.mysql.com/get/mysql57-community-release-el7-8.noarch.rpm
yum -y install mysql-connector-java*
if [[ ! -d /opt/jdbc-drivers ]]
then
  mkdir /opt/jdbc-drivers
  cp /usr/share/java/mysql-connector-java.jar /opt/jdbc-drivers/mysql-connector-java.jar
fi 
   
 Register the database configuration:
 Database: MySQL
 MySQL Server: <MySQL_DB_IP/FQDN>
 MySQL User: ambari
 MySQL Password: Hadoop1234!
 JDBC Connector JAR URL: (empty)
 JDBC Connection: jdbc:mysql://<MySQL_DB_IP/FQDN>:<Port>/ambari01 
   
 
 
						
					
12-10-2019 08:01 AM
							 
 Sometimes a node needs to be decommissioned, or faces downtime of unknown length for repairs. If the node hosts a ResourceManager, move it to a new host using the Move ResourceManager wizard in the Ambari Web user interface. The wizard describes the set of automated steps taken to move one ResourceManager to a new host. Since YARN and MapReduce2 will be restarted, a cluster maintenance window must be planned and cluster downtime expected. 
   
 This video describes how to move a ResourceManager to a new host using the Move ResourceManager wizard in the Ambari Web user interface. 
   
  
 Open the video on YouTube here 
   
 To move YARN Resource Manager to a new host with Ambari, do the following:  
 
 In Ambari Web, browse to Services > YARN > Summary. 
 Select Service Actions and choose Move ResourceManager. The Move ResourceManager wizard launches, describing a set of automated steps that must be followed to move one ResourceManager to a new host. 
 Click Get Started. The wizard provides a walk-through for moving the ResourceManager. 
 The following services will be restarted as part of the wizard:
 
 YARN 
 MapReduce2 
 
 Plan a cluster maintenance window and prepare for cluster downtime when moving the ResourceManager. 
 
 Click Next. 
 Select the target host to assign the ResourceManager to, then click Next. 
 Review and confirm the host selections. 
 Expand YARN if necessary, to review all the configuration changes proposed for YARN. 
 Click Deploy to approve the changes and automatically start moving the ResourceManager to the new host. 
 On Configure Components, click Complete when all the progress bars are completed. 
 After Ambari Web reloads, there may be some alerts. Wait a few minutes until all the services restart. 
 Restart any components using Ambari Web, if necessary. 
 
 REFERENCE: 
 http://docs.hortonworks.com (official product documentation) 
 http://community.hortonworks.com (community forum) 
 
						
					
12-10-2019 08:00 AM
 To ensure that another ResourceManager is available if the active ResourceManager in a cluster fails, ResourceManager high availability (HA) should be enabled and configured. 
   
 In an HDP 2.2 or later environment, high availability can be configured for the ResourceManager using the Enable ResourceManager HA wizard. The cluster must have at least three hosts, and Apache ZooKeeper servers must be running. 
   
 The Enable ResourceManager high availability section from the documentation contains the steps mentioned in this video. 
   
  
 Open the video on YouTube here 
 Recommended links: 
 
 Product documentation page 
 Community Forum 
 
 
						
					