New Contributor
Posts: 4
Registered: ‎05-18-2016

ImpalaDaemonQueryMonitoring throttling_logger ERROR Error fetching executing query profile

I'm running Impala 2.5.0 with CDH 5.7.1 parcels on RHEL 6.4/6.6 clusters. I very often see "ImpalaDaemonQueryMonitoring throttling_logger ERROR Error fetching executing query profile" in the Cloudera SCM agent logs, and I have been receiving a number of alerts because of it. Could anyone help me understand the cause of these log entries?

 

I would also like to know what impact, if any, these alerts have on queries currently running in the cluster.

 

Below are the Cloudera SCM agent logs from several nodes of the cluster.

 

-------------------------------------------------- 030---------------
[08/Aug/2016 05:42:05 +0000] 12396 MainThread heartbeat_tracker INFO HB stats (seconds): num:40 LIFE_MIN:0.02 min:0.03 mean:0.07 max:0.08 LIFE_MAX:0.15
[08/Aug/2016 05:45:25 +0000] 12396 ImpalaDaemonQueryMonitoring throttling_logger ERROR Error fetching executing query profile at 'http://sample030.enterprisenet.org:25000/query_profile_encoded'
Traceback (most recent call last):
File "/usr/lib64/cmf/agent/build/env/lib/python2.6/site-packages/cmf-5.7.1-py2.6.egg/cmf/monitor/impalad/query_monitor.py", line 527, in get_executing_query_profile
password=password)
File "/usr/lib64/cmf/agent/build/env/lib/python2.6/site-packages/cmf-5.7.1-py2.6.egg/cmf/url_util.py", line 66, in urlopen_with_timeout
return opener.open(url, data, timeout)
File "/usr/lib64/python2.6/urllib2.py", line 391, in open
response = self._open(req, data)
File "/usr/lib64/python2.6/urllib2.py", line 409, in _open
'_open', req)
File "/usr/lib64/python2.6/urllib2.py", line 369, in _call_chain
result = func(*args)
File "/usr/lib64/python2.6/urllib2.py", line 1190, in http_open
return self.do_open(httplib.HTTPConnection, req)
File "/usr/lib64/python2.6/urllib2.py", line 1165, in do_open
raise URLError(err)
URLError: <urlopen error timed out>
[08/Aug/2016 05:45:30 +0000] 12396 ImpalaDaemonQueryMonitoring throttling_logger ERROR Error fetching executing query ids at 'http://sample030.enterprisenet.org:25000/inflight_query_ids'
Traceback (most recent call last):
File "/usr/lib64/cmf/agent/build/env/lib/python2.6/site-packages/cmf-5.7.1-py2.6.egg/cmf/monitor/impalad/query_monitor.py", line 498, in get_executing_query_ids
password=password)
File "/usr/lib64/cmf/agent/build/env/lib/python2.6/site-packages/cmf-5.7.1-py2.6.egg/cmf/url_util.py", line 66, in urlopen_with_timeout
return opener.open(url, data, timeout)
File "/usr/lib64/python2.6/urllib2.py", line 391, in open
response = self._open(req, data)
File "/usr/lib64/python2.6/urllib2.py", line 409, in _open
'_open', req)
File "/usr/lib64/python2.6/urllib2.py", line 369, in _call_chain
result = func(*args)
File "/usr/lib64/python2.6/urllib2.py", line 1190, in http_open
return self.do_open(httplib.HTTPConnection, req)
File "/usr/lib64/python2.6/urllib2.py", line 1165, in do_open
raise URLError(err)
URLError: <urlopen error timed out>


-------------------------------------------------------------------------26----
[08/Aug/2016 05:41:54 +0000] 9923 MainThread heartbeat_tracker INFO HB stats (seconds): num:40 LIFE_MIN:0.02 min:0.03 mean:0.07 max:0.08 LIFE_MAX:0.18
[08/Aug/2016 05:45:22 +0000] 9923 ImpalaDaemonQueryMonitoring throttling_logger ERROR (1 skipped) Error fetching executing query ids at 'http://sample026.enterprisenet.org:25000/inflight_query_ids'
Traceback (most recent call last):
File "/usr/lib64/cmf/agent/build/env/lib/python2.6/site-packages/cmf-5.7.1-py2.6.egg/cmf/monitor/impalad/query_monitor.py", line 498, in get_executing_query_ids
password=password)
File "/usr/lib64/cmf/agent/build/env/lib/python2.6/site-packages/cmf-5.7.1-py2.6.egg/cmf/url_util.py", line 66, in urlopen_with_timeout
return opener.open(url, data, timeout)
File "/usr/lib64/python2.6/urllib2.py", line 391, in open
response = self._open(req, data)
File "/usr/lib64/python2.6/urllib2.py", line 409, in _open
'_open', req)
File "/usr/lib64/python2.6/urllib2.py", line 369, in _call_chain
result = func(*args)
File "/usr/lib64/python2.6/urllib2.py", line 1190, in http_open
return self.do_open(httplib.HTTPConnection, req)
File "/usr/lib64/python2.6/urllib2.py", line 1165, in do_open
raise URLError(err)
URLError: <urlopen error timed out>


----------------------------------------33-----------
quested version 6.
[08/Aug/2016 05:44:18 +0000] 1564 MainThread throttling_logger INFO (14 skipped) Identified java component java7 with full version JAVA_HOME=/usr/java/jdk1.7.0_67-cloudera java version "1.7.0_67" Java(TM) SE Runtime Environment (build 1.7.0_67-b01) Java HotSpot(TM) 64-Bit Server VM (build 24.65-b04, mixed mode) for requested version 7.
[08/Aug/2016 05:45:07 +0000] 1564 ImpalaDaemonQueryMonitoring throttling_logger ERROR Error fetching executing query profile at 'http://sample033.enterprisenet.org:25000/query_profile_encoded'
Traceback (most recent call last):
File "/usr/lib64/cmf/agent/build/env/lib/python2.6/site-packages/cmf-5.7.1-py2.6.egg/cmf/monitor/impalad/query_monitor.py", line 527, in get_executing_query_profile
password=password)
File "/usr/lib64/cmf/agent/build/env/lib/python2.6/site-packages/cmf-5.7.1-py2.6.egg/cmf/url_util.py", line 66, in urlopen_with_timeout
return opener.open(url, data, timeout)
File "/usr/lib64/python2.6/urllib2.py", line 391, in open
response = self._open(req, data)
File "/usr/lib64/python2.6/urllib2.py", line 409, in _open
'_open', req)
File "/usr/lib64/python2.6/urllib2.py", line 369, in _call_chain
result = func(*args)
File "/usr/lib64/python2.6/urllib2.py", line 1190, in http_open
return self.do_open(httplib.HTTPConnection, req)
File "/usr/lib64/python2.6/urllib2.py", line 1165, in do_open
raise URLError(err)
URLError: <urlopen error timed out>


--------------------------------------------------------20--------------
[08/Aug/2016 05:45:02 +0000] 34729 ImpalaDaemonQueryMonitoring throttling_logger ERROR Error fetching executing query ids at 'http://sample020.enterprisenet.org:25000/inflight_query_ids'
Traceback (most recent call last):
File "/usr/lib64/cmf/agent/build/env/lib/python2.6/site-packages/cmf-5.7.1-py2.6.egg/cmf/monitor/impalad/query_monitor.py", line 498, in get_executing_query_ids
password=password)
File "/usr/lib64/cmf/agent/build/env/lib/python2.6/site-packages/cmf-5.7.1-py2.6.egg/cmf/url_util.py", line 66, in urlopen_with_timeout
return opener.open(url, data, timeout)
File "/usr/lib64/python2.6/urllib2.py", line 391, in open
response = self._open(req, data)
File "/usr/lib64/python2.6/urllib2.py", line 409, in _open
'_open', req)
File "/usr/lib64/python2.6/urllib2.py", line 369, in _call_chain
result = func(*args)
File "/usr/lib64/python2.6/urllib2.py", line 1190, in http_open
return self.do_open(httplib.HTTPConnection, req)
File "/usr/lib64/python2.6/urllib2.py", line 1165, in do_open
raise URLError(err)
URLError: <urlopen error timed out>
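
To see at a glance which hosts and endpoints are affected, the ERROR lines above can be tallied with a short script. This is only a sketch: the regular expression is inferred from the log lines pasted in this thread, and the function name `tally_fetch_errors` is made up for illustration; it is not part of the Cloudera agent.

```python
import re
from collections import Counter

# Pattern inferred from the agent log lines quoted in this thread, e.g.
# [08/Aug/2016 05:45:25 +0000] 12396 ImpalaDaemonQueryMonitoring throttling_logger ERROR
#   Error fetching executing query profile at 'http://host:25000/query_profile_encoded'
LINE_RE = re.compile(
    r"^\[(?P<ts>[^\]]+)\]\s+\d+\s+ImpalaDaemonQueryMonitoring\s+throttling_logger\s+ERROR\s+"
    r"(?:\(\d+ skipped\)\s+)?Error fetching .*? at 'http://(?P<host>[^:/']+)(?::\d+)?/(?P<endpoint>[^']*)'"
)


def tally_fetch_errors(lines):
    """Count query-monitoring fetch errors per (host, endpoint) pair."""
    counts = Counter()
    for line in lines:
        match = LINE_RE.match(line.strip())
        if match:  # traceback lines and other log entries are ignored
            counts[(match.group("host"), match.group("endpoint"))] += 1
    return counts
```

It can be fed the lines of an agent log file, for example `tally_fetch_errors(open(path))`, where `path` is wherever the agent log lives on your nodes. Note that the agent's own "(N skipped)" throttling means the counts are a lower bound.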

New Contributor
Posts: 4
Registered: ‎05-18-2016

Re: ImpalaDaemonQueryMonitoring throttling_logger ERROR Error fetching executing query profile

Any help would be much appreciated. Thanks

Explorer
Posts: 18
Registered: ‎10-03-2015

Re: ImpalaDaemonQueryMonitoring throttling_logger ERROR Error fetching executing query profile

Hi,

thanks for your message.

We unfortunately experience the same issue on our cluster (CentOS 6.7 nodes, CM + CDH 5.8.2). We don't even use Impala, yet many of these errors appear. An example:

 

[12/Nov/2016 08:28:28 +0000] 79331 ImpalaDaemonQueryMonitoring throttling_logger ERROR    Error fetching executing query ids at 'http://<blablabla>:25000/inflight_query_ids'

I think I only observe them when the node is under very high pressure (CPU and memory), so I suspect the web server is running out of resources. But I am not sure how to check this.

Do you have any further insight on this? Can we brainstorm a bit on how to move forward? :)

Thanks and regards,

Filippo

 

 

Explorer
Posts: 18
Registered: ‎10-03-2015

Re: ImpalaDaemonQueryMonitoring throttling_logger ERROR Error fetching executing query profile

Hi,

I forgot to mention that a possibly related discussion can be found here: https://issues.cloudera.org/browse/IMPALA-3180

 

Regards,

Filippo

Cloudera Employee
Posts: 275
Registered: ‎07-29-2015

Re: ImpalaDaemonQueryMonitoring throttling_logger ERROR Error fetching executing query profile

The Impala web server may just be slow to respond - there are some cases where it can slow down under load. Are you seeing any other adverse effects?

Explorer
Posts: 18
Registered: ‎10-03-2015

Re: ImpalaDaemonQueryMonitoring throttling_logger ERROR Error fetching executing query profile

Hi,

thanks a lot for picking this up; any new thoughts are much appreciated! :)

 

Let me share some more details of a concrete example first. Yesterday we got notifications for IMPALAD_QUERY_MONITORING_STATUS from 3 nodes, almost simultaneously, as typically happens. From one of them we also got NODE_MANAGER_HOST_HEALTH and IMPALAD_HOST_HEALTH. When checking the status on that node, I saw that HOST_SCM_HEALTH, IMPALAD_HOST_HEALTH, DATA_NODE_HOST_HEALTH and NODE_MANAGER_HOST_HEALTH were all bad, because of "the following health tests are bad: agent status" and "This host is not in contact with the Host Monitor".

 

Checking the agent log only gave the error we are discussing in this thread.

 

In terms of resource utilization (CPU and memory), the figures were very high on 2 of the nodes (the ones raising only the IMPALAD_QUERY_MONITORING_STATUS alarm), whereas on the "noisier" node the levels were acceptable (I can share some graphs if it helps).

 

Given all this:

- How can we make sure it is indeed the web server that is in trouble? Which performance metric or graph should we look at?

- How does this issue fit with having NO Impala load whatsoever?

- Why does the issue seem so nondeterministic with respect to node load?

 

Sorry for the non-quantitative facts and questions I am sharing; right now I am having a hard time being more precise!

 

Regards,

Filippo

Cloudera Employee
Posts: 25
Registered: ‎12-11-2015

Re: ImpalaDaemonQueryMonitoring throttling_logger ERROR Error fetching executing query profile

- How can we make sure it is indeed the web server that is in trouble? Which performance metric or graph should we look at?

 

The Impala query monitoring health test works as follows.

 

impalad publishes most of its status through its web UI. One of the pages it exposes is inflight_query_ids:

 

http://<impalad_hostname>:25000/inflight_query_ids

 

This page lists the in-flight queries, and the SCM agent fetches it periodically to monitor the status of queries. When the web server is slow to respond and the request to this URL times out, IMPALAD_QUERY_MONITORING_STATUS goes bad.

 

When this happens, you will see URLError: <urlopen error timed out> for the inflight_query_ids URL in the agent logs.
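
To check by hand whether the web server is slow, you can time the same kind of request yourself. The following is a minimal sketch in Python 3, not the agent's actual code: the function name `probe`, the injectable `opener` argument and any timeout value you pass are assumptions for illustration. The agent's real timeout is not stated in this thread, and the sketch does not handle web UI authentication.

```python
import time
import urllib.request


def probe(url, timeout_s, opener=urllib.request.urlopen):
    """Fetch url once and return (ok, elapsed_seconds, detail).

    opener is injectable so the function can be exercised without a
    live impalad; by default it is urllib.request.urlopen.
    """
    start = time.monotonic()
    try:
        with opener(url, timeout=timeout_s) as resp:
            resp.read()  # include body transfer time in the measurement
        return True, time.monotonic() - start, "ok"
    except OSError as exc:
        # URLError and socket timeouts are both OSError subclasses.
        return False, time.monotonic() - start, str(exc)
```

For example, `probe("http://<impalad_hostname>:25000/inflight_query_ids", 5.0)` run from the same host as the agent shows whether the page answers within 5 seconds (an arbitrary value chosen for the example). Running it repeatedly while the alert is firing gives a rough idea of how slow the web server actually is.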

 

 

Explorer
Posts: 18
Registered: ‎10-03-2015

Re: ImpalaDaemonQueryMonitoring throttling_logger ERROR Error fetching executing query profile

Hi,

thanks for your message. What you say makes sense to me. I think the natural next step is to understand why the web server sometimes becomes slow. What is the best way to figure this out? I thought it might correlate with CPU and memory utilization, but I am not sure this is the case.

I think it would be very good to find the root cause of this intermittent slowness, and either address it or make the timeout less aggressive.
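
One rough way to test the suspected correlation with node load is to record the 1-minute load average next to the outcome of each probe of the web page and then compare the two groups. The sketch below is only an illustration under stated assumptions: `collect_samples` and `mean_load_by_outcome` are hypothetical helpers, `probe_fn` is any zero-argument function you supply that returns True when the page answered in time, and load average is just a proxy for CPU pressure (it says nothing about memory).

```python
import os
import time


def collect_samples(probe_fn, n, interval_s, load_fn=os.getloadavg, sleep=time.sleep):
    """Run probe_fn n times; return a list of (load_1min, probe_ok) pairs.

    Must run on the impalad node itself, because os.getloadavg() reports
    the load of the local machine only.
    """
    rows = []
    for i in range(n):
        load_1min = load_fn()[0]
        rows.append((load_1min, bool(probe_fn())))
        if i < n - 1:
            sleep(interval_s)
    return rows


def mean_load_by_outcome(rows):
    """Return the mean 1-minute load for successful and failed probes."""
    result = {}
    for label, wanted in (("ok", True), ("failed", False)):
        loads = [load for load, ok in rows if ok is wanted]
        result[label] = sum(loads) / len(loads) if loads else None
    return result
```

If the mean load for failed probes is clearly higher than for successful ones over a long enough sampling window, that supports the "web server starved under load" theory; if the two are similar, the cause is more likely elsewhere.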

Regards,

Filippo

New Contributor
Posts: 4
Registered: ‎05-18-2016

Re: ImpalaDaemonQueryMonitoring throttling_logger ERROR Error fetching executing query profile

Thanks for your inputs on this issue.

 

So, based on this conversation, my understanding is that this has no impact on ongoing/in-flight Impala queries: it happens when the Cloudera SCM agent fails to get a response about in-flight queries from the Impala daemon web server within the set time limit.

 

Could we avoid these messages/alerts simply by increasing the timeout set on the Cloudera SCM agent?

 

Thanks,

Suresh Pala

Explorer
Posts: 13
Registered: ‎04-18-2016

Re: ImpalaDaemonQueryMonitoring throttling_logger ERROR Error fetching executing query profile

I suspect there is an issue with your network.

Check the server's load average: sar -q and sar -p

Also check whether any unknown IP addresses appear in the secure log.

 
