Member since 11-09-2016 · 68 Posts · 16 Kudos Received · 5 Solutions
07-24-2017
11:28 AM
In large clusters, restarting the NameNode or Standby NameNode sometimes fails, and Ambari will keep retrying multiple times before giving up. A quick fix is to increase Ambari's retry timeouts from 5s to 25s (or up to 50s) in /var/lib/ambari-server/resources/common-services/HDFS/XXX-VERSION-XXX/package/scripts/hdfs_namenode.py
From this:
@retry(times=5, sleep_time=5, backoff_factor=2, err_class=Fail)
To this:
@retry(times=25, sleep_time=25, backoff_factor=2, err_class=Fail)
If it still fails, you can try:
@retry(times=50, sleep_time=50, backoff_factor=2, err_class=Fail)
One of the root causes of this may be the Solr audit logs (from Ambari Infra) creating huge logs that need to be written to HDFS. Restart the Ambari server after the change. You can clear the NameNode and Standby NameNode logs here: /var/log/hadoop/hdfs/audit/solr/spool
Be careful to delete only on the Standby NameNode, then do a failover to delete from the other server. Do not delete logs while the NameNode is active.
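For context, Ambari's @retry decorator re-runs the wrapped check with a growing sleep between attempts. This is a minimal sketch of those semantics (not Ambari's actual implementation), showing why raising times and sleep_time gives a slow NameNode far more total time to come up:

```python
import time

class Fail(Exception):
    """Stand-in for Ambari's Fail exception class."""

def retry(times, sleep_time, backoff_factor, err_class):
    """Retry the wrapped function up to `times` attempts.

    Sleeps `sleep_time` seconds after the first failure, multiplying
    the delay by `backoff_factor` after each subsequent failure.
    Re-raises the last `err_class` error once attempts are exhausted.
    """
    def decorator(fn):
        def wrapper(*args, **kwargs):
            delay = sleep_time
            for attempt in range(times):
                try:
                    return fn(*args, **kwargs)
                except err_class:
                    if attempt == times - 1:
                        raise  # out of attempts
                    time.sleep(delay)
                    delay *= backoff_factor
        return wrapper
    return decorator
```

Under these semantics, the defaults (times=5, sleep_time=5, backoff_factor=2) allow roughly 5+10+20+40 = 75s of total waiting; times=25 with sleep_time=25 gives a vastly larger budget, which is what a large cluster's NameNode startup needs.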
07-24-2017
11:26 AM
In large clusters, restarting the NameNode or Standby NameNode sometimes fails, and Ambari will keep retrying multiple times before giving up. A quick fix is to increase Ambari's retry timeouts from 5s to 25s in /var/lib/ambari-server/resources/common-services/HDFS/2.1.0.2.0/package/scripts/hdfs_namenode.py
From this:
@retry(times=5, sleep_time=5, backoff_factor=2, err_class=Fail)
To this:
@retry(times=25, sleep_time=25, backoff_factor=2, err_class=Fail)
If it still fails, you can try:
@retry(times=50, sleep_time=50, backoff_factor=2, err_class=Fail)
One of the root causes of this may be the Solr audit logs (from Ambari Infra) creating huge logs that need to be written to HDFS. You can clear the NameNode and Standby NameNode logs here: /var/log/hadoop/hdfs/audit/solr/spool
Be careful to delete only on the Standby NameNode, then do a failover to delete from the other server. Do not delete logs while the NameNode is active.
07-14-2017
04:57 PM
By default, the file container-executor.cfg under /etc/hadoop/conf/ is overwritten on every NodeManager by /var/lib/ambari-agent/cache/common-services/YARN/2.1.0.2.0/package/templates/container-executor.cfg.j2. When you use the LinuxContainerExecutor, YARN executes jobs as the end user; in this case it is not recommended to change banned.users and allowed.system.users. Why should you ban super users from running YARN jobs? Because Hadoop trusts the user you claim to be when submitting jobs: once past the Kerberos wall with the keytab (which can easily be found and used in the job), anyone in the hadoop group could run a job as a "super-user" and effectively gain full superuser permissions on job submission.
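For reference, the generated file typically contains entries like the following; the values shown here are illustrative defaults, not taken from this post:

```
yarn.nodemanager.linux-container-executor.group=hadoop
banned.users=hdfs,yarn,mapred,bin
allowed.system.users=
min.user.id=1000
```

Since /etc/hadoop/conf/container-executor.cfg is regenerated from the .j2 template above, any change you do decide to make should go into the template, not the generated file.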
12-01-2017
03:20 AM
I am facing a similar issue where the Standby NN is not starting. In the hdfs .out file we are getting java.lang.OutOfMemoryError: Requested array size exceeds VM limit. Can we uncheck "Audit to SOLR" under Advanced ranger-audit and then start the Standby NN? Will there be any impact on the cluster if we uncheck "Audit to SOLR"?
05-07-2017
03:17 PM
HORTONWORKS: SCRIPT TO DISABLE THP ON RED HAT 7
#!/bin/bash
### BEGIN INIT INFO
# Provides: disable-transparent-hugepages
# Required-Start: $local_fs
# Required-Stop:
# Author: Amine Hallam
# Default-Start: 2 3 4 5
# Default-Stop: 0 1 6
# Short-Description: Disable Linux transparent huge pages
# Description: Disable Linux transparent huge pages, to improve
# database performance.
### END INIT INFO
case $1 in
start)
if [ -d /sys/kernel/mm/transparent_hugepage ]; then
thp_path=/sys/kernel/mm/transparent_hugepage
elif [ -d /sys/kernel/mm/redhat_transparent_hugepage ]; then
thp_path=/sys/kernel/mm/redhat_transparent_hugepage
else
exit 0
fi
echo 'never' > ${thp_path}/enabled
echo 'never' > ${thp_path}/defrag
re='^[0-1]+$'
if [[ $(cat ${thp_path}/khugepaged/defrag) =~ $re ]]
then
# RHEL 7
echo 0 > ${thp_path}/khugepaged/defrag
else
# RHEL 6
echo 'no' > ${thp_path}/khugepaged/defrag
fi
unset re
unset thp_path
;;
esac

Install the init script:
sudo chmod 755 /etc/init.d/disable-transparent-hugepages
sudo chkconfig --add disable-transparent-hugepages

Copied from here.
05-04-2017
11:58 PM
1 Kudo
Please consider the following for this install:
- IBM Power servers are on CentOS 7
- The install is performed using a non-root user
- There is no access to the internet or a proxy to remote repos; we installed a local repo

------------------------------------------------------------ Prerequisites ------------------------------------------------------------
# Check the maximum open file descriptors.
# The recommended maximum number of open file descriptors is 10000 or more.
# To check the current value, execute the following on each host:
ulimit -Sn
ulimit -Hn
# If the output is not greater than 10000, set a suitable default:
ulimit -n 10000

# SELinux
sudo setenforce 0
sudo sed -i 's/SELINUX=enforcing/SELINUX=disabled/g' /etc/selinux/config
sudo sed -i 's/SELINUX=permissive/SELINUX=disabled/g' /etc/selinux/config

# umask
umask 0022
sudo sh -c 'echo "umask 0022" >> /etc/profile'

----------------------------- JDK - OpenJDK only (Oracle JDK not supported) -----------------------------
sudo yum install java-1.8.0-openjdk
sudo yum install java-1.8.0-openjdk-devel
sudo yum install java-1.8.0-openjdk-headless
export JAVA_HOME=/usr/lib/jvm/java-1.8.0-openjdk
# Persist JAVA_HOME in your profile:
vi ~/.bash_profile

---------------------------------- Installation of MySQL / MariaDB on the server ----------------------------------
sudo yum update
sudo yum install mysql-connector-java
sudo yum install mysql
sudo yum install mariadb-server
sudo systemctl enable mariadb
sudo systemctl start mariadb
# How to connect (no password by default):
mysql -u root -p

---------------------------------- Setting up a local repository for HDP on the server - no internet access ----------------------------------
sudo yum install yum-utils createrepo
sudo mkdir -p /var/www/html/

-------------------------------- Prepare the httpd service on the server --------------------------------
sudo yum install httpd
sudo service httpd start
sudo systemctl enable httpd

---------------------------- Prepare the repos ----------------------------
# HDP
# Download from http://public-repo-1.hortonworks.com/HDP/centos7-ppc/2.x/updates/2.6.0.3/HDP-2.6.0.3-centos7-ppc-rpm.tar.gz
tar -xvf HDP-2.6.0.3-centos7-ppc-rpm.tar.gz
sudo mv HDP /var/www/html/
cd /var/www/html/HDP/centos7

# HDP-UTILS
# Download from http://public-repo-1.hortonworks.com/HDP-UTILS-1.1.0.21/repos/ppc64le/HDP-UTILS-1.1.0.21-centos7.tar.gz

# Ambari
# Download from http://public-repo-1.hortonworks.com/ambari/centos7-ppc/2.x/updates/2.5.0.3/ambari-2.5.0.3-centos7-ppc.tar.gz
tar -xvf ambari-2.5.0.3-centos7-ppc.tar.gz
sudo mv ambari /var/www/html/
cd /var/www/html/ambari/centos7

----------------------------------------------- HDP.repo example -----------------------------------------------
#VERSION_NUMBER=2.6.0.3-8
[HDP-2.6.0.3]
name=HDP Version - HDP-2.6.0.3
#baseurl=http://public-repo-1.hortonworks.com/HDP/centos7/2.x/updates/2.6.0.3
baseurl=http://XXXXXX/HDP/centos7-ppc/
gpgcheck=1
#gpgkey=http://public-repo-1.hortonworks.com/HDP/centos7/2.x/updates/2.6.0.3/RPM-GPG-KEY/RPM-GPG-KEY-Jenkins
gpgkey=http://XXXXXX/HDP/centos7-ppc/RPM-GPG-KEY/RPM-GPG-KEY-Jenkins
enabled=1
priority=1

[HDP-UTILS-1.1.0.21]
name=HDP-UTILS Version - HDP-UTILS-1.1.0.21
baseurl=http://XXXXXX/HDP-UTILS-1.1.0.21/repos/ppc64le
gpgcheck=0
gpgkey=http://public-repo-1.hortonworks.com/HDP/centos7/2.x/updates/2.6.0.3/RPM-GPG-KEY/RPM-GPG-KEY-Jenkins
enabled=0
priority=1

---------------------------------------------------- Prepare the repo files on the server XXXX ----------------------------------------------------
sudo mv ambari.repo /etc/yum.repos.d/
sudo mv hdp.repo /etc/yum.repos.d/

# In ambari.repo, modify the following:
[ambari-2.5.0.3]
name=ambari Version - ambari-2.5.0.3
baseurl=http://XXXXXX/ambari/centos7-ppc/
gpgcheck=1
gpgkey=http://XXXX/ambari/centos7-ppc/RPM-GPG-KEY/RPM-GPG-KEY-Jenkins
enabled=1
priority=1

# Confirm that the repository is configured by checking the repo list:
yum repolist

sudo yum install ambari-server
sudo ambari-server setup --jdbc-db=mysql --jdbc-driver=/usr/share/java/mysql-connector-java.jar
ambari-server setup -j $JAVA_HOME

... Check more here
02-24-2017
01:55 PM
3 Kudos
Authentication
1- Kerberos: Kerberos is mandatory for prod environments; you can either use your AD's embedded Kerberos or install a new dedicated KDC. Kerberos must be in HA.
Risk of not doing the above: user impersonation of the service accounts (jobs can be exported to run with superuser permissions).
2- Use a firewall to block all inbound traffic to the cluster (all sources / all ports) except from the edge node (gateway).
Risk of not doing the above: passwords in the wrong hands will systematically give access to the cluster.
3- Check the permissions of the keytabs, detailed in this article: Script to fix permissions and ownership of hadoop keytabs.
Risk of not doing the above: use of the keytabs by other cluster users.
4- Use Knox for all API calls to the cluster. Benefits: inbound traffic only from "trusted" known machines, with authentication against an existing LDAP.

Network
1- The cluster must be in an isolated subnet, with no interference from other networks, for both security and throughput.
Risk of not doing the above: data interception by/from other machines in the data center.
2- Cluster machines can be linked internally in "non-routed" mode, with host resolution configured via /etc/hosts on all machines.
3- A flat network is not recommended.
Risk of not doing the above: file-inclusion attacks from other machines in the data center.
4- Having two DNS resolutions (internal and external) is acceptable if the DNS server is HA. You can also combine /etc/hosts with DNS config.
5- IPtables must be disabled within the cluster. This is a prerequisite for the installation.
6- /etc/hosts must be configured with the FQDN. The Ambari server needs the resolution of all nodes in the cluster in its /etc/hosts. This is a prerequisite for the installation.

Authorizations
1- Systematically give 000 permissions to the HDFS files and folders of the data lake (/data); only Ranger should control access, via policies.
Risk of not doing the above: users can access data through POSIX permissions/ACLs and bypass Ranger policies.
2- Use the umask: fs.permissions.umask-mode = 0022.
Risk of not doing the above: wrong permissions may lead to Ranger policies being bypassed.

Other best practices:
- Do not share the passwords of superusers (hdfs, hive, spark, etc.) with all teams; only root should own them.
- You can disable SSH login for some service users (knox, spark, etc.).

Please feel free to comment for enhancements.
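The umask value in Authorizations point 2 can be sanity-checked with simple bit arithmetic. This small Python sketch (not part of the original post) shows what fs.permissions.umask-mode = 0022 yields for newly created HDFS files and directories:

```python
def apply_umask(base_mode, umask):
    # A new object gets the base mode with the umask bits cleared.
    return base_mode & ~umask

UMASK = 0o022  # fs.permissions.umask-mode = 0022

# HDFS uses 666 as the base mode for files and 777 for directories.
file_mode = apply_umask(0o666, UMASK)
dir_mode = apply_umask(0o777, UMASK)

print(oct(file_mode))  # files -> rw-r--r--
print(oct(dir_mode))   # dirs  -> rwxr-xr-x
```

So with 0022, group and others lose write access on everything created, which is the sane default this post recommends.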
01-25-2017
10:24 AM
1 Kudo
The bug (HIVE-15355) is being worked on by the Hive engineering team at Hortonworks. You can use the following workarounds:
1- Add "SORT BY 0" at the end of the query, which will force a single reducer; please use this only if you have a small query.
2- Try set hive.mv.files.thread=0; before running the query.
If you have any question regarding the above, please let me know.
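As a hypothetical illustration of workaround 1 (the table and column names below are made up, not from this post):

```sql
-- Force a single reducer as a workaround for HIVE-15355
SET hive.mv.files.thread=0;
INSERT OVERWRITE TABLE sales_summary
SELECT region, SUM(amount)
FROM sales
GROUP BY region
SORT BY 0;
```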
02-16-2017
07:39 PM
Does setting hive.mv.files.thread=0 reduce the performance of the insert query? Can you explain what this configuration has to do with the HDP 2.5 upgrade?
01-20-2017
05:44 PM
1 Kudo
During the upgrade:
If you are using a Postgres DB, you will face a script failure on the following statement:
alter table users add constraint "uniq_user_0"
Solution:
psql -U ambari_user -d NN_HA_ambari
alter table ambari.users drop constraint "uniq_user_0";
After the Upgrade :
Spark
1- You have to recompile and rebuild your jars using the new dependencies of Spark 1.6.2. Your pom.xml needs to reference the new versions (the new jars of HDP 2.5.3).
2- Update any custom jars and check all versions used in your pom.xml, e.g.:
<spark.version>1.6.2</spark.version>
<hbase.version>1.2.1</hbase.version>
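In pom.xml these entries sit in the <properties> block; a hypothetical fragment (the hadoop.version line is an assumption of mine, not stated in the post):

```xml
<properties>
  <spark.version>1.6.2</spark.version>
  <hbase.version>1.2.1</hbase.version>
  <hadoop.version>2.7.3</hadoop.version>
</properties>
```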
3- Remove the argument spark.yarn.jar if you are using it when submitting the job:
--conf spark.yarn.jar=hdfs://NN_HA/user/oozie/share/lib/lib_20160515013357/spark/spark-assembly-1.6.1.2.4.2.0-258-hadoop2.7.1.2.4.2.0-258.jar
Errors noticed when leaving the argument above:
java.lang.IllegalArgumentException: Invalid ContainerId: container_e97_1484824056011_0038_02_000001
or
java.lang.IllegalArgumentException
Spark - Oozie
1- If you are launching Spark via Oozie (or via Hue), check that you have all the libs here with the right permissions:
-rwxr-xr-x 3 oozie hdfs 339666 2017-01-19 10:22 /user/oozie/share/lib/lib_20170119102144/spark/hbase-site.xml
-rwxr-xr-x 3 oozie hdfs 339666 2017-01-19 10:22 /user/oozie/share/lib/lib_20170119102144/spark/datanucleus-api-jdo-3.2.6.jar
-rwxr-xr-x 3 oozie hdfs 1890075 2017-01-19 10:21 /user/oozie/share/lib/lib_20170119102144/spark/datanucleus-core-3.2.10.jar
-rwxr-xr-x 3 oozie hdfs 1809447 2017-01-19 10:21 /user/oozie/share/lib/lib_20170119102144/spark/datanucleus-rdbms-3.2.9.jar
-rwxr-xr-x 3 oozie hdfs 22440 2017-01-19 10:21 /user/oozie/share/lib/lib_20170119102144/spark/oozie-sharelib-spark-4.2.0.2.5.3.0-37.jar
-rwxr-xr-x 3 oozie hdfs 44846 2017-01-19 10:22 /user/oozie/share/lib/lib_20170119102144/spark/py4j-0.9-src.zip
-rwxr-xr-x 3 oozie hdfs 357563 2017-01-19 10:22 /user/oozie/share/lib/lib_20170119102144/spark/pyspark.zip
-rwxr-xr-x 3 oozie hdfs 188897932 2017-01-19 10:22 /user/oozie/share/lib/lib_20170119102144/spark/spark-assembly-1.6.2.2.5.3.0-37-hadoop2.7.3.2.5.3.0-37.jar
-rwxr-xr-x 3 oozie hdfs 110488188 2017-01-19 17:43 /user/oozie/share/lib/lib_20170119102144/spark/spark-examples-1.6.2.2.5.3.0-37-hadoop2.7.3.2.5.3.0-37.jar
-rwxr-xr-x 3 oozie hdfs 188897932 2017-01-19 17:43 /user/oozie/share/lib/lib_20170119102144/spark/spark-hdp-assembly.jar
-rwxr-xr-x 3 oozie hdfs 516062 2017-01-19 10:21 /user/oozie/share/lib/lib_20170119102144/oozie/aws-java-sdk-core-1.10.6.jar
-rwxr-xr-x 3 oozie hdfs 258578 2017-01-19 10:21 /user/oozie/share/lib/lib_20170119102144/oozie/aws-java-sdk-kms-1.10.6.jar
-rwxr-xr-x 3 oozie hdfs 570101 2017-01-19 10:21 /user/oozie/share/lib/lib_20170119102144/oozie/aws-java-sdk-s3-1.10.6.jar
-rwxr-xr-x 3 oozie hdfs 10092 2017-01-19 10:21 /user/oozie/share/lib/lib_20170119102144/oozie/azure-keyvault-core-0.8.0.jar
-rwxr-xr-x 3 oozie hdfs 745325 2017-01-19 10:21 /user/oozie/share/lib/lib_20170119102144/oozie/azure-storage-4.2.0.jar
-rwxr-xr-x 3 oozie hdfs 434678 2017-01-19 10:21 /user/oozie/share/lib/lib_20170119102144/oozie/commons-lang3-3.4.jar
-rwxr-xr-x 3 oozie hdfs 1648200 2017-01-19 10:21 /user/oozie/share/lib/lib_20170119102144/oozie/guava-11.0.2.jar
-rwxr-xr-x 3 oozie hdfs 153855 2017-01-19 10:21 /user/oozie/share/lib/lib_20170119102144/oozie/hadoop-aws-2.7.3.2.5.3.0-37.jar
-rwxr-xr-x 3 oozie hdfs 163348 2017-01-19 10:21 /user/oozie/share/lib/lib_20170119102144/oozie/hadoop-azure-2.7.3.2.5.3.0-37.jar
-rwxr-xr-x 3 oozie hdfs 38605 2017-01-19 10:21 /user/oozie/share/lib/lib_20170119102144/oozie/jackson-annotations-2.4.0.jar
-rwxr-xr-x 3 oozie hdfs 225302 2017-01-19 10:21 /user/oozie/share/lib/lib_20170119102144/oozie/jackson-core-2.4.4.jar
-rwxr-xr-x 3 oozie hdfs 1076926 2017-01-19 10:21 /user/oozie/share/lib/lib_20170119102144/oozie/jackson-databind-2.4.4.jar
-rwxr-xr-x 3 oozie hdfs 570478 2017-01-19 10:21 /user/oozie/share/lib/lib_20170119102144/oozie/joda-time-2.1.jar
-rwxr-xr-x 3 oozie hdfs 16046 2017-01-19 10:21 /user/oozie/share/lib/lib_20170119102144/oozie/json-simple-1.1.jar
-rwxr-xr-x 3 oozie hdfs 12543 2017-01-19 10:21 /user/oozie/share/lib/lib_20170119102144/oozie/oozie-hadoop-utils-hadoop-2-4.2.0.2.5.3.0-37.jar
-rwxr-xr-x 3 oozie hdfs 51854 2017-01-19 10:21 /user/oozie/share/lib/lib_20170119102144/oozie/oozie-sharelib-oozie-4.2.0.2.5.3.0-37.jar
PS: the permissions here are 755; they can be restricted further.
hdfs dfs -chmod -R 755 /user/oozie/share/lib/
2- Once you have all the libs, refresh Oozie to load the lib list:
su oozie
oozie admin -oozie http://localhost:11000/oozie -sharelibupdate
oozie admin -oozie http://localhost:11000/oozie -shareliblist spark*
Hive
1- If you don't install Atlas or you are not using it, you can remove the hooks.
Go to Hive -> Advanced -> General and replace these with a space:
hive.exec.failure.hooks
hive.exec.post.hooks
hive.exec.pre.hooks
2- If you are using INSERT OVERWRITE queries and you are getting java.util.ConcurrentModificationException, I have found a workaround:
add "SORT BY 0" to your query.
PS: this is a temporary solution to keep your prod jobs running.
3- Allow impersonation for hcat:
hadoop.proxyuser.hcat.hosts = *
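In core-site.xml this corresponds to a property entry like the following; the hadoop.proxyuser.hcat.groups entry is an assumption often configured alongside it, not stated in the post:

```xml
<property>
  <name>hadoop.proxyuser.hcat.hosts</name>
  <value>*</value>
</property>
<property>
  <name>hadoop.proxyuser.hcat.groups</name>
  <value>*</value>
</property>
```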
If you have any questions regarding the above, please let me know.
More to follow...