Created on 02-13-2016 04:50 PM - edited on 02-14-2020 08:44 AM by VidyaSargur
This article is for those who want a cheat sheet for a smooth installation of HDP in a Dev or Test environment with one or more of the following requirements:
Place all the log data into a different directory, not /var/log
All your service user names must be prefixed with the cluster name, and these users must be centrally managed by AD or LDAP.
You do not have any local users in the Hadoop cluster, including Hadoop service users. This becomes important if you also wish to deploy Centrify, or if you will be deploying multiple clusters against a single LDAP/AD integration. Once again, these service names should have a cluster prefix.
You want to set appropriate YARN, Tez, MapReduce, and Ambari Metrics memory parameters during install.
Side Note: It is always prudent to get Professional Services assistance to install or configure your production deployment, to make sure all the prerequisites unique to your environment are covered and met.
Step 5: Create Service Accounts Beforehand in your LDAP
Decide what your cluster prefix will be. Do not put an underscore "_" or a hyphen "-" in your prefix.
The list of service accounts you need to create is located here.
Solr is missing from the list. You need this user if you want to install Ranger, because from HDP 2.3 onward Ranger uses Solr for auditing and for showing audit events in the UI.
Create a solr user with solr as its default group, and with membership in the hadoop group as well.
IMPORTANT: On each node, get the AD or LDAP UID for the hdfs user and the GID for the hadoop group; edit /etc/passwd and /etc/group and add the entries there with the CORRECT UID/GID from AD or LDAP. I have found that even though you choose the Skip Group Modifications option so that Linux groups in the cluster are not modified, and you tell Ambari not to manage the HDFS user, some of the yum installs still try to create local users: Ambari will respect your wishes, but yum will not.
Make sure the entries in your /etc/passwd and /etc/group have your cluster prefix.
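As an illustration, the local entries should mirror exactly what the directory returns. A minimal sketch, assuming a hypothetical cluster prefix "c1" and placeholder UID/GID values (substitute the real IDs your AD/LDAP returns, e.g. via getent):

```shell
# Hypothetical prefix "c1" and placeholder IDs; on a real node, resolve
# them first with: getent passwd c1-hdfs ; getent group c1-hadoop
PREFIX=c1
HDFS_UID=11001
HADOOP_GID=11000

# Build the /etc/passwd and /etc/group entries (echoed here; on each node
# you would append them to the real files):
passwd_entry="${PREFIX}-hdfs:x:${HDFS_UID}:${HADOOP_GID}::/home/${PREFIX}-hdfs:/bin/bash"
group_entry="${PREFIX}-hadoop:x:${HADOOP_GID}:${PREFIX}-hdfs"
echo "$passwd_entry"
echo "$group_entry"
```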
When you install through Ambari, it is very important to configure the right properties so that Ambari is aware of your centrally managed, cluster-prefixed service names:
Set Skip Group Modification
Tell Ambari not to manage the HDFS user
Step 6: Use Hortonworks Handy Scripts to Automatically Prepare the Environment Across all Nodes
So: you have your disk partitions, your network is set up, you have decided on your master services placement, you have created the service names in LDAP with a cluster prefix, and you have edited your /etc/passwd and /etc/group.
Here comes the fun part.
Go to your Ambari node and perform the following:
# Install Hortonworks Public Tools
> yum install wget
> wget -qO- --no-check-certificate https://github.com/hortonworks/HDP-Public-Utilities/raw/master/Installation/install_tools.sh | bash
#Everything will be installed to /root/hdp; create the /root/hdp/Hostdetail.txt file with all the hostnames for your cluster, one FQDN per line.
# hostname -f > /root/hdp/Hostdetail.txt
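For illustration, Hostdetail.txt is just a flat list of FQDNs, one per line. A sketch with placeholder hostnames (written to a temp directory here so it runs anywhere; on the Ambari node the file is /root/hdp/Hostdetail.txt):

```shell
# Placeholder hostnames; use your own FQDNs. mktemp -d keeps this runnable
# outside the cluster -- on the Ambari node use /root/hdp instead.
HDP_DIR=$(mktemp -d)
cat > "$HDP_DIR/Hostdetail.txt" <<'EOF'
master1.example.com
worker1.example.com
worker2.example.com
EOF
NODES=$(wc -l < "$HDP_DIR/Hostdetail.txt")
echo "$NODES nodes listed"
```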
#To set up Password-less SSH
>chmod 700 ~/.ssh
>chmod 600 ~/.ssh/id_rsa
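If the Ambari node does not yet have a key pair, generate one first. A sketch (it writes to a temp directory so an existing ~/.ssh is left alone; on the real node drop -f and KEYDIR to use the default ~/.ssh/id_rsa path):

```shell
# Demo directory so an existing ~/.ssh is not clobbered; on the real
# Ambari node, omit -f so ssh-keygen uses ~/.ssh/id_rsa.
KEYDIR=$(mktemp -d)
ssh-keygen -t rsa -b 4096 -N '' -q -f "$KEYDIR/id_rsa"
ls -1 "$KEYDIR"
```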
# Distribute the keys to the other nodes. The copy command is needed because the ./distribute_ssh_keys.sh script expects the private key at /tmp/ec2_keypair. Alternatively, if you set up your nodes with a root password, just enter it when prompted by the script.
> cp <your nodes private key> /tmp/ec2_keypair
> ./distribute_ssh_keys.sh ~/.ssh/id_rsa.pub
#Optional: Copy the private key to all nodes if you want password-less ssh from any node to any node. Skip this if you want password-less ssh from the Ambari node only. Password-less ssh is only needed for Ambari to install the Agents on all nodes; without it you have to install and configure the Agents yourself.
>./copy_file.sh ~/.ssh/id_rsa ~/.ssh/id_rsa
# Test passwordless SSH
> ssh <node>
#Now run a script to set all the OS prerequisites for a cluster install. You may have to edit ./run_command.sh and add "-tty" to the ssh call, since the ./hdp_preinstall.sh script contains sudo commands.
> ./run_command.sh 'mkdir /root/hdp'
> ./copy_file.sh /root/hdp/hdp_preinstall.sh /root/hdp/hdp_preinstall.sh
> vi run_command.sh (add "-tty" to the ssh call)
# Now in one swoop set the OS parameters
> ./run_command.sh '/root/hdp/hdp_preinstall.sh'
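For context, a run_command.sh-style helper is essentially an ssh loop over Hostdetail.txt. The sketch below is a simplified stand-in, not the actual Hortonworks script:

```shell
# Simplified stand-in for run_command.sh (not the actual script): run one
# command on every host in the host file; -tty lets sudo commands work.
HOSTFILE=${HOSTFILE:-/root/hdp/Hostdetail.txt}
run_on_all() {
  while read -r host; do
    echo "== $host =="
    ssh -tty "$host" "$1"
  done < "$HOSTFILE"
}
# Example: run_on_all 'mkdir -p /root/hdp'
```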
REBOOT ALL NODES
#DOUBLE CHECK that all the nodes retained all the OS environment configuration changes for the HDP install
> ./pre_install_check.sh | tee report.txt
#View the report. Ignore the repo warnings for Ambari and HDP if you are connected to the internet and will pull the repos from there during install.
> vi report.txt
# Now get your YARN Parameters to use when you install the cluster via Ambari
# Download Hortonworks Companion files
> wget http://public-repo-1.hortonworks.com/HDP/tools/18.104.22.168/hdp_manual_install_rpm_helper_files-22.214.171.124.3...
> tar -zxvf hdp_manual_install_rpm_helper_files-126.96.36.199.3485.tar.gz
> cd hortonworks-HDP-Public-Utilities-d617f44
# Now run the script to determine the memory parameters you will set in Ambari during the Customize Services step. Pass your number of cores (-c), memory per node in GB (-m), disks per node for HDFS (-d), and whether HBase will be installed (-k) to the python call
>python yarn-utils.py -c 16 -m 64 -d 4 -k True
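To illustrate the kind of arithmetic behind those flags: yarn-utils.py derives a container count and size from cores, memory, and disks. The sketch below is a simplified illustration of that heuristic; the reserved-memory and minimum-container values are examples, not the script's exact tables.

```shell
# Simplified illustration of the yarn-utils.py sizing heuristic; treat
# the reservation numbers as examples, not the script's actual tables.
CORES=16; MEM_GB=64; DISKS=4; HBASE=true

RESERVED_GB=8                                            # example OS reservation for a 64 GB node
[ "$HBASE" = true ] && RESERVED_GB=$((RESERVED_GB + 8))  # example extra headroom for HBase
AVAILABLE_MB=$(( (MEM_GB - RESERVED_GB) * 1024 ))
MIN_CONTAINER_MB=2048                                    # example minimum container size

# containers = min(2 * cores, 1.8 * disks, available / min container size)
CONTAINERS=$(awk -v c="$CORES" -v d="$DISKS" -v a="$AVAILABLE_MB" -v m="$MIN_CONTAINER_MB" \
  'BEGIN { n = 2*c; if (1.8*d < n) n = 1.8*d; if (a/m < n) n = a/m; printf "%d", n }')
CONTAINER_MB=$(( AVAILABLE_MB / CONTAINERS ))

echo "yarn.nodemanager.resource.memory-mb  = $(( CONTAINERS * CONTAINER_MB ))"
echo "yarn.scheduler.minimum-allocation-mb = $CONTAINER_MB"
echo "yarn.scheduler.maximum-allocation-mb = $(( CONTAINERS * CONTAINER_MB ))"
```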
Step 8: Install SmartSense, Only Offered by Hortonworks
Finally, INSTALL SMARTSENSE if you are a Hortonworks customer. If you are not, why not? You are missing all the value SmartSense provides in auto-tuning your cluster. (In Ambari 2.2 it is available as a Service.)
Most issues are due to a rogue process running with a local UID rather than the LDAP/AD UID, so double check using ps -ef. If you set up your /etc/passwd and /etc/group properly beforehand, you should not have this issue.
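One quick way to catch such a mismatch is to compare the local passwd entry against the expected directory UID. A sketch, with the user name, file path, and UID as placeholders (on a real node you would compare /etc/passwd against `getent passwd <cluster-prefix>-hdfs`):

```shell
# check_uid flags a UID mismatch between a passwd-format file and the
# expected directory UID. Names and paths here are placeholders.
check_uid() {   # usage: check_uid <user> <passwd-file> <expected-uid>
  local_uid=$(awk -F: -v u="$1" '$1 == u {print $3}' "$2")
  if [ "$local_uid" = "$3" ]; then
    echo "OK: $1 uid=$local_uid"
  else
    echo "MISMATCH: $1 local=$local_uid expected=$3"
  fi
}
# Example on a real node: check_uid c1-hdfs /etc/passwd "$(getent passwd c1-hdfs | cut -d: -f3)"
```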
Some issues come up if your files and/or logs are owned by the local hdfs user. Again, if you did not choose the 'Skip Group Modification' option, did not tell Ambari to leave the HDFS user unmanaged, did not set the hdfs user to <cluster-prefix>-hdfs during install, or did not set up your /etc/passwd and /etc/group, you will run into this problem.
Remember that some yum installs do not care what you set in Ambari for the hdfs user, so you may have to run those installs manually; look out for that.