Created 12-23-2015 10:54 AM
Hi,
After restarting the cluster, we randomly see a couple of red alerts in Ambari. I remember seeing them some time ago, and now they are back.
Can you suggest what could be going wrong?
I checked that ports 10001 and 9083 are open and in use.
Hive: Hive Metastore Process
====================================
Connection failed on host HIVE_HOST:10001 (Execution of 'ambari-sudo.sh su ambari-qa -l -s /bin/bash -c 'export PATH='"'"'/usr/sbin:/sbin:/usr/lib/ambari-server/*:/usr/sbin:/sbin:/usr/lib/ambari-server/*:/usr/lib64/qt-3.3/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/root/bin:/var/lib/ambari-agent:/var/lib/ambari-agent:/bin/:/usr/bin/:/usr/lib/hive/bin/:/usr/sbin/'"'"' ; ! beeline -u '"'"'jdbc:hive2://HIVE_HOST:10001/;transportMode=http;httpPath=cliservice;principal=hive/_HOST@REALM.COM'"'"' -e '"'"''"'"' 2>&1| awk '"'"'{print}'"'"'|grep -i -e '"'"'Connection refused'"'"' -e '"'"'Invalid URL'"'"''' was killed due timeout after 30 seconds)
Hive: HiveServer2 Process
=============================
Metastore on HIVE_HOST failed (Execution of 'ambari-sudo.sh su ambari-qa -l -s /bin/bash -c 'export PATH='"'"'/usr/sbin:/sbin:/usr/lib/ambari-server/*:/usr/sbin:/sbin:/usr/lib/ambari-server/*:/usr/lib64/qt-3.3/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/root/bin:/var/lib/ambari-agent:/var/lib/ambari-agent:/bin/:/usr/bin/:/usr/sbin/:/usr/hdp/current/hive-metastore/bin'"'"' ; export HIVE_CONF_DIR='"'"'/usr/hdp/current/hive-metastore/conf/conf.server'"'"' ; hive --hiveconf hive.metastore.uris=thrift://HIVE_HOST:9083 --hiveconf hive.metastore.client.connect.retry.delay=1 --hiveconf hive.metastore.failure.retries=1 --hiveconf hive.metastore.connect.retries=1 --hiveconf hive.metastore.client.socket.timeout=14 --hiveconf hive.execution.engine=mr -e '"'"'show databases;'"'"''' was killed due timeout after 30 seconds)
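Both alerts say the check command was killed after 30 seconds. Neither says the connection was refused, so an open port does not by itself show that the service answers in time. The following is a minimal sketch, not what Ambari itself runs, for telling the two cases apart by hand. The helper names `port_open` and `timed_run` are hypothetical, and the 30-second limit is taken from the alert text above.

```python
import socket
import subprocess
import time


def port_open(host, port, timeout=5.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, unreachable hosts, and connect timeouts.
        return False


def timed_run(cmd, limit=30.0):
    """Run cmd (a list of arguments) and return (finished, seconds, output).

    finished is False when the command had to be killed for exceeding
    limit seconds, which is the situation the alerts describe.
    """
    start = time.monotonic()
    try:
        proc = subprocess.run(
            cmd,
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,
            timeout=limit,
            text=True,
        )
        return True, time.monotonic() - start, proc.stdout
    except subprocess.TimeoutExpired as exc:
        out = exc.stdout or ""
        if isinstance(out, bytes):
            # Partial output may arrive as bytes even when text=True.
            out = out.decode(errors="replace")
        return False, time.monotonic() - start, out


# Example usage (assumes beeline is on PATH and HIVE_HOST is replaced
# with the real host name; the JDBC URL is the one from the alert):
#
#   print(port_open("HIVE_HOST", 10001))
#   url = ("jdbc:hive2://HIVE_HOST:10001/;transportMode=http;"
#          "httpPath=cliservice;principal=hive/_HOST@REALM.COM")
#   print(timed_run(["beeline", "-u", url, "-e", ""], limit=30.0)[:2])
```

If `port_open` returns True but `timed_run` reports that the command did not finish, the service is reachable and the check is simply too slow, which points at client startup time or Kerberos/JVM overhead rather than a dead process.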
Created 12-23-2015 11:19 AM
@Darpan Patel I have seen this behavior during network hiccups or when the cluster is overloaded.
Created 12-23-2015 12:02 PM
Well, but the network is stable here, and no jobs are running on the cluster!
Created 12-23-2015 12:07 PM
@Darpan Patel I recommend opening a support case and letting support take a look.
Created 12-23-2015 12:19 PM
Alright Neeraj. @Neeraj Sabharwal
Thank you again 🙂
Created 02-12-2016 03:51 PM
Hi, did you ever get an answer to the timeout errors? Is there a way to extend the timeout to allow the checks to finish?
Created 09-23-2016 09:01 AM
I have also run into the second problem. How can I update the connection timeout setting? This is a test environment without enough resources, so I don't mind the slowness itself.