
SYMPTOM: After upgrading Ambari from 1.7.0 to 2.2.1.1, many Hive alerts are raised.

ALERT EXAMPLES:

 ExecuteTimeoutException: Execution of 'ambari-sudo.sh su ambari-qa -l -s /bin/bash -c 'export PATH='"'"'/usr/sbin:/sbin:/usr/lib/ambari-server/*:/sbin:/bin:/usr/sbin:/usr/bin:/var/lib/ambari-agent:/bin/:/usr/bin/:/usr/sbin/:/usr/lib/hive/bin'"'"' ; export HIVE_CONF_DIR='"'"'/etc/hive/conf.server'"'"' ; hive --hiveconf hive.metastore.uris=thrift://host1:9083 --hiveconf hive.metastore.client.connect.retry.delay=1 --hiveconf hive.metastore.failure.retries=1 --hiveconf hive.metastore.connect.retries=1 --hiveconf hive.metastore.client.socket.timeout=14 --hiveconf hive.execution.engine=mr -e '"'"'show databases;'"'"''' was killed due timeout after 60 seconds 
) 
2016-05-11 03:25:04,779 [CRITICAL] [HIVE] [hive_server_process] (HiveServer2 Process) Connection failed on host host1:10000 (Traceback (most recent call last): 
File "/var/lib/ambari-agent/cache/common-services/HIVE/0.12.0.2.0/package/alerts/alert_hive_thrift_port.py", line 200, in execute 
check_command_timeout=int(check_command_timeout)) 
File "/usr/lib/python2.6/site-packages/resource_management/libraries/functions/hive_check.py", line 68, in check_thrift_port_sasl 
timeout=check_command_timeout 
File "/usr/lib/python2.6/site-packages/resource_management/core/base.py", line 154, in __init__ 
self.env.run() 
File "/usr/lib/python2.6/site-packages/resource_management/core/environment.py", line 158, in run 
self.run_action(resource, action) 
File "/usr/lib/python2.6/site-packages/resource_management/core/environment.py", line 121, in run_action 
provider_action() 
File "/usr/lib/python2.6/site-packages/resource_management/core/providers/system.py", line 238, in action_run 
tries=self.resource.tries, try_sleep=self.resource.try_sleep) 
File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 70, in inner 
result = function(command, **kwargs) 
File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 92, in checked_call 
tries=tries, try_sleep=try_sleep) 
File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 140, in _call_wrapper 
result = _call(command, **kwargs_copy) 
File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 285, in _call 
raise ExecuteTimeoutException(err_msg) 
ExecuteTimeoutException: Execution of 'ambari-sudo.sh su ambari-qa -l -s /bin/bash -c 'export PATH='"'"'/usr/sbin:/sbin:/usr/lib/ambari-server/*:/sbin:/bin:/usr/sbin:/usr/bin:/var/lib/ambari-agent:/bin/:/usr/bin/:/usr/lib/hive/bin/:/usr/sbin/'"'"' ; ! beeline -u '"'"'jdbc:hive2://host1:10000/;transportMode=binary'"'"' -e '"'"''"'"' 2>&1| awk '"'"'{print}'"'"'|grep -i -e '"'"'Connection refused'"'"' -e '"'"'Invalid URL'"'"''' was killed due timeout after 60 seconds 
) 
2016-05-11 03:34:01,826 [OK] [HIVE] [hive_metastore_process] (Hive Metastore Process) Metastore OK - Hive command took 4.830s 
2016-05-11 03:34:01,826 [OK] [HIVE] [hive_server_process] (HiveServer2 Process) TCP OK - 1.549s response on port 10000 


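ROOT CAUSE: The Hive connection check was taking a long time to respond. When it ran longer than the 60-second check timeout, the Ambari agent killed the check command and the alert was reported as CRITICAL.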

This is suspected to be a bug - https://hortonworks.jira.com/browse/BUG-47724

RESOLUTION: As a workaround, increase the value of "check.command.timeout" in the Hive Metastore alert definition. For detailed steps, see https://community.hortonworks.com/articles/33564/how-to-modify-ambari-alert-using-postput-action.htm...

From - 
"value" : "60.0" 

To - 
"value" : "120.0" 
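The change above can be sketched in Python as follows. This is a minimal illustration, not the exact procedure from the linked article: it assumes the alert definition JSON keeps its parameters under AlertDefinition/source/parameters, and that the update is sent as a PUT to /api/v1/clusters/<cluster>/alert_definitions/<id> with the X-Requested-By header. The host name, cluster name, definition id, credentials, and the helper names `set_check_command_timeout` and `build_put_request` are all hypothetical. The request is only built here, not sent.

```python
import base64
import copy
import json
import urllib.request


def set_check_command_timeout(definition, new_timeout):
    """Return a copy of the alert definition with check.command.timeout changed."""
    updated = copy.deepcopy(definition)
    for param in updated["AlertDefinition"]["source"]["parameters"]:
        if param.get("name") == "check.command.timeout":
            param["value"] = new_timeout
            return updated
    raise KeyError("check.command.timeout not found in alert definition")


def build_put_request(base_url, cluster, definition_id, definition, user, password):
    """Build (but do not send) a PUT request carrying the updated source block."""
    # Assumed endpoint layout; verify it against your Ambari version.
    url = "%s/api/v1/clusters/%s/alert_definitions/%s" % (base_url, cluster, definition_id)
    body = json.dumps({"AlertDefinition": {"source": definition["AlertDefinition"]["source"]}})
    token = base64.b64encode(("%s:%s" % (user, password)).encode("utf-8")).decode("ascii")
    headers = {
        "Authorization": "Basic " + token,
        "X-Requested-By": "ambari",
        "Content-Type": "application/json",
    }
    return urllib.request.Request(url, data=body.encode("utf-8"), headers=headers, method="PUT")


# Trimmed, illustrative definition; a real one fetched with GET has more fields.
sample = {
    "AlertDefinition": {
        "name": "hive_metastore_process",
        "source": {
            "type": "SCRIPT",
            "parameters": [
                {"name": "check.command.timeout", "value": "60.0"},
            ],
        },
    }
}

patched = set_check_command_timeout(sample, "120.0")
# Hypothetical host, cluster, id, and credentials.
request = build_put_request(
    "http://ambari.example.com:8080", "mycluster", 42, patched, "admin", "admin"
)
print(request.get_method(), request.full_url)
print(json.dumps(json.loads(request.data.decode("utf-8")), indent=2))
```

To apply the change, fetch the real definition with a GET first, pass it through `set_check_command_timeout`, and send the resulting request with `urllib.request.urlopen(request)`. Inspect the printed payload before sending, since the exact set of fields Ambari accepts in the PUT body may differ between versions.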