Created 07-08-2018 08:09 PM
We have an Ambari cluster, version 2.6.0.x.
We noticed that free memory on the master01 machine is 0,
and that is because of many instances of the following process (from ps -ef | grep java):
ambari-+ 65369 65322 0 Jul06 ? 00:01:52 /usr/jdk64/jdk1.8.0_112/bin/java -Xmx1024m -Dhdp.version=2.6.0.3-8 -Djava.net.preferIPv4Stack=true -Dhd[…]var/log/hadoop/ambari-qa -Dhadoop.log.file=hadoop.log -Dhadoop.home.dir=/usr/hdp/2.6.0.3-8/hadoop -Dhadoop.id.str=ambari-qa -Dhadoop.root.logger=INFO,c[…].6.0.3-8/hadoop/lib/native/Linux-amd64-64:/usr/hdp/2.6.0.3-8/hadoop/lib/native -Dhadoop.policy.file=hadoop-policy.xml -Djava.net.preferIPv4Stack=true -[…].config.file=/usr/hdp/2.6.0.3-8/hive/conf/parquet-logging.properties -Dlog4j.configuration=beeline-log4j.properties -Dhadoop.security.logger=INFO,NullA[…]/usr/hdp/2.6.0.3-8/hive/lib/hive-beeline-1.2.1000.2.6.0.3-8.jar org.apache.hive.beeline.BeeLine -u jdbc:hive2://master01.sys748.com:10000/;transportMo[…]
Any idea why all these processes were opened (around 350 of them), and why they take all the memory on the master01 machine?
[root@master01 ~]# ps -ef | grep java | wc -l
359

Another way to show the processes:

ps -ef | sed 's/-D[^ ]*//g;s/-X[^ ]*//g;s#^.*/bin/java##g;s/[^ ]*.jar//g;s/^[ ]*//g' | more
ambari-+ 50648 50646 0 Jul06 ? 00:00:00 -bash -c export PATH='/usr/sbin:/sbin:/usr/lib/ambari-server/*:/usr/sbin:/sbin:/usr/lib/ambari-server/*:/usr/lib64/qt-3.3/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/var/lib/ambari-agent:/var/lib/ambari-agent:/bin/:/usr/bin/:/usr/lib/hive/bin/:/usr/sbin/' ; ! beeline -u 'jdbc:hive2://master01.sys748.com:10000/;transportMode=binary' -e '' 2>&1 | awk '{print}' | grep -i -e 'Connection refused' -e 'Invalid URL'
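One quick way to confirm that these are all the Beeline alert-check processes (rather than other Java processes) is to count them by pattern and owner. This is just a rough sketch with standard tools, reusing the BeeLine class name visible in the ps output above:

# count only the beeline processes (the [o] trick keeps grep from matching itself)
ps -ef | grep -c '[o]rg.apache.hive.beeline.BeeLine'
# group them by owning user (user:20 widens the column so names are not truncated)
ps -eo user:20,args | grep '[B]eeLine' | awk '{print $1}' | sort | uniq -c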
Created 07-09-2018 04:51 AM
Hi @Michael Bronson!
Could you share with us the output of these commands?
jstack -l <PID_HS2>
pstree -p hive
Hope this helps!
Created 07-09-2018 06:18 AM
I don't have the jstack command on my Linux machine. Should I download this CLI from the Red Hat repo?
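(Note: jstack is not a standalone RHEL package; it ships in the JDK's bin directory, so a separate download should not be needed. Assuming the JDK path visible in the ps output above, it can likely be invoked directly:)

/usr/jdk64/jdk1.8.0_112/bin/jstack -l <PID_HS2>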
Created 07-09-2018 07:00 AM
If you see many "beeline" processes being created by the ambari or ambari-qa user, then it might be the Hive alert checker scripts leaving open Beeline connections, which might be causing the issue.
Please see the script at these locations:

On the Ambari Server host:
/var/lib/ambari-server/resources/common-services/HIVE/0.12.0.2.0/package/alerts/alert_hive_thrift_port.py

On the Agent hosts:
/var/lib/ambari-agent/cache/common-services/HIVE/0.12.0.2.0/package/alerts/alert_hive_thrift_port.py
As a temporary measure, you can try disabling the "HiveServer2 Process" alert from the Ambari UI and then killing those beeline processes manually from the command line, to see if that fixes the issue.
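For example, something along these lines should clear the stale checkers. This is only a sketch: it assumes the processes run as the ambari-qa user (the truncated "ambari-+" column in the ps output suggests so), and you should verify the pattern matches only the alert checks before killing:

# kill all beeline alert-check processes owned by ambari-qa
pkill -u ambari-qa -f 'org.apache.hive.beeline.BeeLine'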
There were some old issues reported like this, but those were supposed to have been fixed from Ambari 2.5.0 onwards, for example: https://issues.apache.org/jira/browse/AMBARI-18286
Created 07-09-2018 07:04 AM
Additionally, please check whether there is any issue connecting to HiveServer2 using Beeline manually. This will give us an idea of how much time the beeline connection takes. If it takes more than the default hardcoded 30 seconds, then please try increasing the timeout to a higher value such as 60 in the same script:
# grep 'timeout=30' /var/lib/ambari-agent/cache/common-services/HIVE/0.12.0.2.0/package/alerts/alert_hive_thrift_port.py
# grep 'timeout=30' /var/lib/ambari-server/resources/common-services/HIVE/0.12.0.2.0/package/alerts/alert_hive_thrift_port.py
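If the value does need raising, a quick in-place edit (keeping a .bak backup) could look like the lines below. Note that the agent's cached copy may get re-synced from the server, so changing the server-side copy as well seems safer:

# sed -i.bak 's/timeout=30/timeout=60/' /var/lib/ambari-server/resources/common-services/HIVE/0.12.0.2.0/package/alerts/alert_hive_thrift_port.py
# sed -i.bak 's/timeout=30/timeout=60/' /var/lib/ambari-agent/cache/common-services/HIVE/0.12.0.2.0/package/alerts/alert_hive_thrift_port.py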
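To time a manual Beeline connection as suggested above, one option is to reuse the JDBC URL from the original post:

# time beeline -u 'jdbc:hive2://master01.sys748.com:10000/;transportMode=binary' -e ''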