Created 07-26-2018 09:29 AM
Hi everyone,
In my cluster I am getting an alert on the HiveServer2 process saying the connection failed, but HiveServer2 is running.
Please find the log below:
Connection failed on host abc.covert.com:10000 (Traceback (most recent call last):
  File "/var/lib/ambari-agent/cache/common-services/HIVE/0.12.0.2.0/package/alerts/alert_hive_thrift_port.py", line 211, in execute
    ldap_password=ldap_password)
  File "/usr/lib/python2.6/site-packages/resource_management/libraries/functions/hive_check.py", line 79, in check_thrift_port_sasl
    timeout=check_command_timeout)
  File "/usr/lib/python2.6/site-packages/resource_management/core/base.py", line 155, in __init__
    self.env.run()
  File "/usr/lib/python2.6/site-packages/resource_management/core/environment.py", line 160, in run
    self.run_action(resource, action)
  File "/usr/lib/python2.6/site-packages/resource_management/core/environment.py", line 124, in run_action
    provider_action()
  File "/usr/lib/python2.6/site-packages/resource_management/core/providers/system.py", line 262, in action_run
    tries=self.resource.tries, try_sleep=self.resource.try_sleep)
  File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 72, in inner
    result = function(command, **kwargs)
  File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 102, in checked_call
    tries=tries, try_sleep=try_sleep, timeout_kill_strategy=timeout_kill_strategy)
  File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 150, in _call_wrapper
    result = _call(command, **kwargs_copy)
  File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 297, in _call
    raise ExecuteTimeoutException(err_msg)
ExecuteTimeoutException: Execution of 'ambari-sudo.sh su ambari-qa -l -s /bin/bash -c 'export PATH='"'"'/usr/sbin:/sbin:/usr/lib/ambari-server/*:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/var/lib/ambari-agent:/bin/:/usr/bin/:/usr/lib/hive/bin/:/usr/sbin/'"'"' ; ! beeline -u '"'"'jdbc:hive2://abc.covert.com:10000/;transportMode=binary;principal=hive/_HOST@COVERT.NET'"'"' -e '"'"''"'"' 2>&1| awk '"'"'{print}'"'"'|grep -i -e '"'"'Connection refused'"'"' -e '"'"'Invalid URL'"'"''' was killed due timeout after 60 seconds )
Can you please help me resolve this?
Thanks in advance
Created 07-26-2018 09:43 AM
Hi @kanna k ,
As I understand it, the alert is failing while connecting through Beeline. Can you run this command manually from the Ambari server and see whether it works?
IMHO this might be because the Beeline connection is hanging. Restarting HiveServer2 might help.
Created 07-26-2018 11:11 AM
Since this is a production cluster, will restarting HiveServer2 affect other services?
Created 07-26-2018 12:59 PM
It's a timeout problem. The alert is giving the beeline command 60 seconds to spin up a JVM and connect to Hive. You can always go to the Alert's definition in the Ambari UI and change this timeout property to something higher (like 75 seconds). However, before you do that, you might want to run the command yourself and see how long it takes. If it's taking more than a minute, that could indicate a problem with resources on this host.
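As a rough way to do that timing, here is a minimal Python 3 sketch. It runs a command under a hard timeout, the same way the alert does, and reports how long it took. The helper names `time_command` and `beeline_argv` are made up for illustration, and the JDBC URL is the one from the log above. It assumes `beeline` is on the PATH and that you run it as a user with a valid Kerberos ticket, since the URL uses a principal.

```python
import subprocess
import time


def time_command(argv, timeout_seconds=60):
    """Run argv and return (elapsed_seconds, timed_out, output).

    The child process is killed if it runs longer than timeout_seconds,
    which roughly mirrors what the Ambari alert does to beeline.
    """
    start = time.monotonic()
    try:
        completed = subprocess.run(
            argv,
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,
            timeout=timeout_seconds,
        )
        output, timed_out = completed.stdout, False
    except subprocess.TimeoutExpired as exc:
        # Partial output may or may not be available after a timeout.
        output, timed_out = exc.output or b"", True
    elapsed = time.monotonic() - start
    return elapsed, timed_out, output.decode("utf-8", errors="replace")


def beeline_argv(jdbc_url):
    """Build the same kind of beeline call the alert uses: connect, run nothing."""
    return ["beeline", "-u", jdbc_url, "-e", ""]


# Example (hypothetical usage, run on the HS2 host):
#   url = "jdbc:hive2://abc.covert.com:10000/;transportMode=binary;principal=hive/_HOST@COVERT.NET"
#   elapsed, timed_out, output = time_command(beeline_argv(url), timeout_seconds=120)
#   print("timed out" if timed_out else "connected or failed", "after %.1f s" % elapsed)
```

Using a timeout longer than the alert's 60 seconds lets you see the real connect time instead of just reproducing the kill. If the measured time is regularly close to or above a minute, raising the alert timeout only hides the symptom.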
Created 07-27-2018 06:54 PM
Ambari schedules a connection check to HS2 every 2 minutes.
The Beeline client is executed to connect to the HS2 JDBC string with a client-side timeout. If Beeline is not able to connect within the timeout period, the Beeline process is killed.
This indicates that your HS2 is not configured properly: taking more than 60 seconds to connect is quite unusual. HS2 should accept a connection within 4 to 30 seconds at most to be usable by external tools like Power BI, Alation, and Ambari Hive Views.
Please read the following blog post in detail to learn more about how to debug the issue.
https://community.hortonworks.com/content/kbentry/202485/hiveserver2-configurations-deep-dive.html
Some steps to find where the bottleneck is:
1. Enable debug mode for Beeline
2. Execute Beeline with the HS2 JDBC string; it will provide a detailed view of the time taken to connect to AD, ZooKeeper, HS2, and MySQL
3. Tune HS2 parameters (Tez session at initialization = true, disable database scan for each connection = true)
4. Bump up heap of HS2
The blog post has everything you need.