Spark Thrift Server alert

I am getting a critical alert for the Spark Thrift Server.

Alert description: This host-level alert is triggered if the Spark Thrift Server cannot be determined to be up.

Connection failed on host (Traceback (most recent call last): File "/var/lib/ambari-agent/cache/common-services/SPARK/1.2.1/package/scripts/alerts/alert_spark_thrift_port.py", line 143, in execute Execute(cmd, user=hiveruser, path=[beeline_cmd], timeout=CHECK_COMMAND_TIMEOUT_DEFAULT) File "/usr/lib/python2.6/site-packages/resource_management/core/...

I also checked the log at /var/log/spark/org.apache.spark.sql.hive.thriftserver.HiveThriftServer2-1-host.out

but I could not see any errors there.

Note: Even though there is an alert for the Spark Thrift Server, the server itself is alive and running. I do not understand why I am getting this critical alert.


Super Mentor

@Anurag Mishra

Can you please share the complete stack trace of the error that you are getting?

The following is incomplete:

Connection failed on host (Traceback (most recent call last): File 
"/var/lib/ambari-agent/cache/common-services/SPARK/1.2.1/package/scripts/alerts/alert_spark_thrift_port.py", line 143, in execute Execute(cmd, user=hiveruser, path=[beeline_cmd], 
timeout=CHECK_COMMAND_TIMEOUT_DEFAULT) File "/usr/lib/python2.6/site-packages/resource_management/core/...


@Jay Kumar SenSharma

Hi Jay, can you suggest a REST API or some other way to get the complete stack trace? I took these details from the alert in Ambari. That is all I can see there: when I hover the mouse over those lines, a black pop-up appears, but it is difficult to copy the details from it.
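One possible way to read the full alert text without the pop-up is the Ambari REST API. The following is only a sketch: the `/api/v1/clusters/<cluster>/alerts` endpoint and the `Alert/state`, `Alert/text`, `Alert/definition_name` and `Alert/host_name` fields are stated here as assumptions and should be checked against your Ambari version. The host name, cluster name and the helper names `build_alerts_url`, `extract_alert_texts` and `fetch_alerts` are hypothetical.

```python
import base64
import json
import urllib.request


def build_alerts_url(ambari_base, cluster, state="CRITICAL"):
    """Build the (assumed) Ambari alerts URL, filtered by alert state."""
    return "%s/api/v1/clusters/%s/alerts?fields=*&Alert/state=%s" % (
        ambari_base.rstrip("/"), cluster, state)


def extract_alert_texts(payload):
    """Pull (definition name, host, full text) out of a decoded JSON response."""
    results = []
    for item in payload.get("items", []):
        alert = item.get("Alert", {})
        results.append((alert.get("definition_name"),
                        alert.get("host_name"),
                        alert.get("text")))
    return results


def fetch_alerts(ambari_base, cluster, user, password, state="CRITICAL"):
    """Query Ambari with HTTP basic auth and return the decoded JSON payload."""
    request = urllib.request.Request(build_alerts_url(ambari_base, cluster, state))
    token = base64.b64encode(("%s:%s" % (user, password)).encode("utf-8"))
    request.add_header("Authorization", "Basic " + token.decode("ascii"))
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))
```

Usage would look like `extract_alert_texts(fetch_alerts("http://ambari.example.com:8080", "mycluster", "admin", "secret"))`; the untruncated stack trace, if present, should be in the third element of each tuple.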

@Jay Kumar SenSharma

I can see the line below in the stack trace:

jdbc:hive2://hostname:10000/;principal=hive/_HOST@PRINCIPAL.COM """ transparentMode = binary -e === 2&1|awk """{print}"""|grep -i -e """connection refused""" -e ""Invalid URL"""

Is this what is causing the problem?

Hi @Jay Kumar SenSharma

Please find the full stack trace below:

Connection failed on host Host_name:10015 (Traceback (most recent call last):
  File "/var/lib/ambari-agent/cache/common-services/SPARK/1.2.1/package/scripts/alerts/alert_spark_thrift_port.py", line 143, in execute
    Execute(cmd, user=hiveruser, path=[beeline_cmd], timeout=CHECK_COMMAND_TIMEOUT_DEFAULT)
  File "/usr/lib/python2.6/site-packages/resource_management/core/base.py", line 155, in __init__
    self.env.run()
  File "/usr/lib/python2.6/site-packages/resource_management/core/environment.py", line 160, in run
    self.run_action(resource, action)
  File "/usr/lib/python2.6/site-packages/resource_management/core/environment.py", line 124, in run_action
    provider_action()
  File "/usr/lib/python2.6/site-packages/resource_management/core/providers/system.py", line 262, in action_run
    tries=self.resource.tries, try_sleep=self.resource.try_sleep)
  File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 72, in inner
    result = function(command, **kwargs)
  File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 102, in checked_call
    tries=tries, try_sleep=try_sleep, timeout_kill_strategy=timeout_kill_strategy)
  File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 150, in _call_wrapper
    result = _call(command, **kwargs_copy)
  File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 297, in _call
    raise ExecuteTimeoutException(err_msg)
ExecuteTimeoutException: Execution of 'ambari-sudo.sh su hive -l -s /bin/bash -c 'export  PATH='"'"'/usr/sbin:/sbin:/usr/lib/ambari-server/*:/usr/sbin:/sbin:/usr/lib/ambari-server/*:/usr/lib64/qt-3.3/bin:/usr/local/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/opt/dell/srvadmin/bin:/usr/lib/jvm/java-1.8.0-openjdk/bin:/home/hdpadmin/.local/bin:/home/hdpadmin/bin:/var/lib/ambari-agent:/var/lib/ambari-agent:/usr/hdp/current/spark-client/bin/beeline'"'"' ; ! beeline -u '"'"'jdbc:hive2://host.com:10015/default;principal=hive/Host'"'"' transportMode=binary  -e '"'"''"'"' 2>&1| awk '"'"'{print}'"'"'|grep -i -e '"'"'Connection refused'"'"' -e '"'"'Invalid URL'"'"''' was killed due timeout after 60.0 seconds
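
Note that the trace ends in ExecuteTimeoutException rather than in a match on 'Connection refused' or 'Invalid URL': the beeline command was killed after 60 seconds, and the alert still reports "Connection failed". A minimal Python sketch of that kind of check follows. It is not Ambari's actual code; `run_port_check` and `FAILURE_MARKERS` are hypothetical names, and treating a timeout as CRITICAL is an assumption based only on the alert text above.

```python
import subprocess
import sys

# Strings the alert command greps for, per the trace (matched case-insensitively).
FAILURE_MARKERS = ("connection refused", "invalid url")


def run_port_check(cmd, timeout_seconds=60.0):
    """Run cmd (an argument list) and classify the result as OK or CRITICAL."""
    try:
        completed = subprocess.run(
            cmd,
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,  # same idea as 2>&1 in the alert command
            universal_newlines=True,
            timeout=timeout_seconds,
        )
    except subprocess.TimeoutExpired:
        # A slow client is reported as a failure even if the server is up.
        return ("CRITICAL", "killed due to timeout after %.1f seconds" % timeout_seconds)
    output = completed.stdout.lower()
    for marker in FAILURE_MARKERS:
        if marker in output:
            return ("CRITICAL", "found '%s' in output" % marker)
    return ("OK", "no failure marker in output")


if __name__ == "__main__":
    # Stand-in for a beeline call that hangs longer than the allowed time.
    slow = [sys.executable, "-c", "import time; time.sleep(5)"]
    print(run_port_check(slow, timeout_seconds=0.5))
```

Under this reading, a Thrift Server that is running but takes longer than the timeout to answer beeline (for example because of slow Kerberos authentication or JVM start-up) would still raise the alert, so timing the beeline command by hand as the hive user is a reasonable next step.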

New Contributor

Any news about this post? We hit the same problem recently, and our Spark Thrift Server is not running because of it. Ambari shows it as green, but the red alert tells us that we cannot connect to it using beeline. The Spark2 Thrift Server, however, is working.

Mentor

@Florian SILVA

I would advise you to open a new thread, because the one you are responding to dates from January 2018.
Just a friendly piece of advice.
