Support Questions

Find answers, ask questions, and share your expertise
Announcements
Celebrating as our community reaches 100,000 members! Thank you!

Impalad Query Monitoring Error

avatar
Rising Star

Hello,

We have been receiving alerts for Impala [The health test result for IMPALAD_QUERY_MONITORING_STATUS has become bad: There are 0 error(s) seen monitoring executing queries, and 1 errors(s) seen monitoring completed queries for this role in the previous 5 minute(s). Critical threshold: any.]

I was following Cloudera Article ( https://my.cloudera.com/knowledge/Impala--IMPALADQUERYMONITORINGSTATUS-has-become-bad?id=75434) but am unsure if this would apply to our environment (our is CDH 5.16.2 / CH 5.16.2), also the 2 Apache Jira mentioned in this Article is resolved. Below is the log from one of our Impala daemon. I have also check and there are no failed queries (screenshot attached)

 

1.JPG

 

 

Current Query Monitoring Timeout = 1 minute / Query Monitoring Period = 1 minute.

 

Increasing the values as mentioned in the article would postpone the alerts, would like to know if there is a fix for this issue.

 

Impalad Logs

 

 

[root@NTPDQ cloudera-scm-agent]# tail -n 100  cloudera-scm-agent.log
    result, stdout, stderr = self._subprocess_with_timeout(args, self._timeout)
  File "/usr/lib64/cmf/agent/build/env/lib/python2.7/site-packages/cmf-5.16.2-py2.7.egg/cmf/monitor/host/ntp_monitor.py", line 38, in _subprocess_with_timeout
    return subprocess_with_timeout(args, timeout)
  File "/usr/lib64/cmf/agent/build/env/lib/python2.7/site-packages/cmf-5.16.2-py2.7.egg/cmf/subprocess_timeout.py", line 94, in subprocess_with_timeout
    raise Exception("timeout with args %s" % args)
Exception: timeout with args ['ntpq', '-np']
[21/Nov/2019 17:01:20 +0000] 5415 MainThread agent        WARNING  Long HB processing time: 6.90371108055
[21/Nov/2019 17:01:29 +0000] 5415 MainThread throttling_logger INFO     (14 skipped) Identified java component java8 with full version java version "1.8.0_121" Java(TM) SE Runtime Environment (build 1.8.0_121-b13) Java HotSpot(TM) 64-Bit Server VM (build 25.121-b13, mixed mode)  for requested version .
[21/Nov/2019 17:03:18 +0000] 5415 MainThread agent        WARNING  Long HB processing time: 5.24657082558
[21/Nov/2019 17:05:19 +0000] 5415 MainThread agent        WARNING  Long HB processing time: 6.19837498665
[21/Nov/2019 17:06:18 +0000] 5415 MainThread agent        WARNING  Long HB processing time: 5.16436910629
[21/Nov/2019 17:07:20 +0000] 5415 MainThread agent        WARNING  Long HB processing time: 6.74405503273
[21/Nov/2019 17:07:29 +0000] 5415 MainThread heartbeat_tracker INFO     HB stats (seconds): num:40 LIFE_MIN:0.00 min:0.06 mean:0.10 max:0.25 LIFE_MAX:45.16
[21/Nov/2019 17:10:24 +0000] 5415 MainThread agent        WARNING  Long HB processing time: 10.0939228535
[21/Nov/2019 17:11:21 +0000] 5415 MainThread agent        WARNING  Long HB processing time: 7.39454698563
[21/Nov/2019 17:12:21 +0000] 5415 MainThread agent        WARNING  Long HB processing time: 7.03050684929
[21/Nov/2019 17:13:19 +0000] 5415 MainThread agent        WARNING  Long HB processing time: 5.31969809532
[21/Nov/2019 17:14:21 +0000] 5415 MainThread agent        WARNING  Long HB processing time: 7.35476112366
[21/Nov/2019 17:15:27 +0000] 5415 MainThread agent        WARNING  Long HB processing time: 13.7484540939
[21/Nov/2019 17:16:20 +0000] 5415 MainThread agent        WARNING  Long HB processing time: 6.23533391953
[21/Nov/2019 17:17:22 +0000] 5415 MainThread agent        WARNING  Long HB processing time: 23.2213561535
[21/Nov/2019 17:17:22 +0000] 5415 MainThread agent        WARNING  Delayed HB: 8s since last
[21/Nov/2019 17:17:37 +0000] 5415 MainThread heartbeat_tracker INFO     HB stats (seconds): num:40 LIFE_MIN:0.00 min:0.05 mean:0.09 max:0.95 LIFE_MAX:45.16
[21/Nov/2019 17:18:19 +0000] 5415 MainThread agent        WARNING  Long HB processing time: 11.9228010178
[21/Nov/2019 17:19:20 +0000] 5415 MainThread agent        WARNING  Long HB processing time: 12.488642931
[21/Nov/2019 17:20:27 +0000] 5415 MainThread agent        WARNING  Long HB processing time: 20.2072188854
[21/Nov/2019 17:20:27 +0000] 5415 MainThread agent        WARNING  Delayed HB: 5s since last
[21/Nov/2019 17:21:20 +0000] 5415 MainThread agent        WARNING  Long HB processing time: 7.10305094719
[21/Nov/2019 17:22:22 +0000] 5415 MainThread agent        WARNING  Long HB processing time: 9.03495693207
[21/Nov/2019 17:23:00 +0000] 5415 MonitorDaemon-Reporter throttling_logger INFO     (59 skipped) Descendants user CPU lower than expected for process 29048:           1419823.60001, 245602.28
[21/Nov/2019 17:23:20 +0000] 5415 MainThread agent        WARNING  Long HB processing time: 7.17436099052
[21/Nov/2019 17:24:19 +0000] 5415 MainThread agent        WARNING  Long HB processing time: 6.09477305412
[21/Nov/2019 17:25:22 +0000] 5415 MainThread agent        WARNING  Long HB processing time: 8.74135708809
[21/Nov/2019 17:26:22 +0000] 5415 MainThread agent        WARNING  Long HB processing time: 8.5287861824
[21/Nov/2019 17:27:20 +0000] 5415 MainThread agent        WARNING  Long HB processing time: 6.24944400787
[21/Nov/2019 17:27:44 +0000] 5415 MainThread heartbeat_tracker INFO     HB stats (seconds): num:40 LIFE_MIN:0.00 min:0.06 mean:0.07 max:0.11 LIFE_MAX:45.16
[21/Nov/2019 17:29:20 +0000] 5415 MainThread agent        WARNING  Long HB processing time: 6.15325498581
[21/Nov/2019 17:30:22 +0000] 5415 MainThread agent        WARNING  Long HB processing time: 8.27527880669
[21/Nov/2019 17:31:17 +0000] 5415 MainThread agent        WARNING  Long HB processing time: 18.4421730042
[21/Nov/2019 17:31:17 +0000] 5415 MainThread agent        WARNING  Delayed HB: 3s since last
[21/Nov/2019 17:31:24 +0000] 5415 MainThread agent        WARNING  Long HB processing time: 7.10764193535
[21/Nov/2019 17:32:22 +0000] 5415 MainThread throttling_logger INFO     (14 skipped) Identified java component java8 with full version java version "1.8.0_121" Java(TM) SE Runtime Environment (build 1.8.0_121-b13) Java HotSpot(TM) 64-Bit Server VM (build 25.121-b13, mixed mode)  for requested version .
[21/Nov/2019 17:32:22 +0000] 5415 MainThread agent        WARNING  Long HB processing time: 19.3404970169
[21/Nov/2019 17:32:22 +0000] 5415 MainThread agent        WARNING  Delayed HB: 4s since last
[21/Nov/2019 17:33:21 +0000] 5415 MainThread agent        WARNING  Long HB processing time: 14.3549408913
[21/Nov/2019 17:34:20 +0000] 5415 MainThread agent        WARNING  Long HB processing time: 13.6162810326
[21/Nov/2019 17:35:20 +0000] 5415 MainThread agent        WARNING  Long HB processing time: 13.1593289375
[21/Nov/2019 17:36:21 +0000] 5415 MainThread agent        WARNING  Long HB processing time: 13.8171610832
[21/Nov/2019 17:37:19 +0000] 5415 MainThread agent        WARNING  Long HB processing time: 12.1159389019
[21/Nov/2019 17:37:52 +0000] 5415 MainThread heartbeat_tracker INFO     HB stats (seconds): num:40 LIFE_MIN:0.00 min:0.06 mean:0.09 max:1.04 LIFE_MAX:45.16
[21/Nov/2019 17:38:21 +0000] 5415 MainThread agent        WARNING  Long HB processing time: 13.9521288872
[21/Nov/2019 17:39:19 +0000] 5415 MainThread agent        WARNING  Long HB processing time: 12.0477230549
[21/Nov/2019 17:39:58 +0000] 5415 Monitor-HostMonitor throttling_logger ERROR    (2 skipped) Timeout with args ['ntpq', '-np']
None
[21/Nov/2019 17:39:58 +0000] 5415 Monitor-HostMonitor throttling_logger ERROR    (2 skipped) Failed to collect NTP metrics
Traceback (most recent call last):
  File "/usr/lib64/cmf/agent/build/env/lib/python2.7/site-packages/cmf-5.16.2-py2.7.egg/cmf/monitor/host/ntp_monitor.py", line 48, in collect
    self.collect_ntpd()
  File "/usr/lib64/cmf/agent/build/env/lib/python2.7/site-packages/cmf-5.16.2-py2.7.egg/cmf/monitor/host/ntp_monitor.py", line 66, in collect_ntpd
    result, stdout, stderr = self._subprocess_with_timeout(args, self._timeout)
  File "/usr/lib64/cmf/agent/build/env/lib/python2.7/site-packages/cmf-5.16.2-py2.7.egg/cmf/monitor/host/ntp_monitor.py", line 38, in _subprocess_with_timeout
    return subprocess_with_timeout(args, timeout)
  File "/usr/lib64/cmf/agent/build/env/lib/python2.7/site-packages/cmf-5.16.2-py2.7.egg/cmf/subprocess_timeout.py", line 94, in subprocess_with_timeout
    raise Exception("timeout with args %s" % args)
Exception: timeout with args ['ntpq', '-np']
[21/Nov/2019 17:40:29 +0000] 5415 ImpalaDaemonQueryMonitoring query_monitor ERROR    Error fetching completed queries from '/var/log/impalad/profiles/impala_profile_log'
Traceback (most recent call last):
  File "/usr/lib64/cmf/agent/build/env/lib/python2.7/site-packages/cmf-5.16.2-py2.7.egg/cmf/monitor/impalad/query_monitor.py", line 901, in _get_completed_query_profiles
    self._query_monitor.get_completed_queries(query_log_file)
  File "/usr/lib64/cmf/agent/build/env/lib/python2.7/site-packages/cmf-5.16.2-py2.7.egg/cmf/monitor/impalad/query_monitor.py", line 581, in get_completed_queries
    completed_query_report_limit)
  File "/usr/lib64/cmf/agent/build/env/lib/python2.7/site-packages/cmf-5.16.2-py2.7.egg/cmf/monitor/impalad/query_monitor.py", line 456, in get_completed_queries
    last_accessed_file_timestamp)
  File "/usr/lib64/cmf/agent/build/env/lib/python2.7/site-packages/cmf-5.16.2-py2.7.egg/cmf/monitor/impalad/query_monitor.py", line 265, in _get_completed_queries
    for event in streamer.stream():
  File "/usr/lib64/cmf/agent/build/env/lib/python2.7/site-packages/ClusterStatsLogStreaming-UNKNOWN-py2.7.egg/clusterstats/log/streaming/event_streamer.py", line 134, in stream
    file_list = self.__sort_file_list(self.__filtered_file_list, ascending)
  File "/usr/lib64/cmf/agent/build/env/lib/python2.7/site-packages/ClusterStatsLogStreaming-UNKNOWN-py2.7.egg/clusterstats/log/streaming/event_streamer.py", line 162, in __sort_file_list
    handle = f.open_log_file()
  File "/usr/lib64/cmf/agent/build/env/lib/python2.7/site-packages/ClusterStatsLogStreaming-UNKNOWN-py2.7.egg/clusterstats/log/streaming/file.py", line 41, in open_log_file
    handle = open(self.__path)
IOError: [Errno 2] No such file or directory: '/var/log/impalad/profiles/impala_profile_log_1.1-1574308887326'
[21/Nov/2019 17:40:29 +0000] 5415 MainThread agent        WARNING  Long HB processing time: 22.2102739811
[21/Nov/2019 17:40:29 +0000] 5415 MainThread agent        WARNING  Delayed HB: 7s since last
[21/Nov/2019 17:41:24 +0000] 5415 MainThread agent        WARNING  Long HB processing time: 25.2082297802
[21/Nov/2019 17:41:24 +0000] 5415 MainThread agent        WARNING  Delayed HB: 10s since last
[21/Nov/2019 17:42:23 +0000] 5415 MainThread agent        WARNING  Long HB processing time: 13.1406769753
[21/Nov/2019 17:43:21 +0000] 5415 MainThread agent        WARNING  Long HB processing time: 11.081389904
[21/Nov/2019 17:44:20 +0000] 5415 MainThread agent        WARNING  Long HB processing time: 10.6936099529
[21/Nov/2019 17:45:19 +0000] 5415 MainThread agent        WARNING  Long HB processing time: 9.0398440361
[21/Nov/2019 17:46:19 +0000] 5415 MainThread agent        WARNING  Long HB processing time: 9.3374710083
[21/Nov/2019 17:47:18 +0000] 5415 MainThread agent        WARNING  Long HB processing time: 8.1221549511
[21/Nov/2019 17:47:55 +0000] 5415 MainThread heartbeat_tracker INFO     HB stats (seconds): num:39 LIFE_MIN:0.00 min:0.06 mean:0.08 max:0.24 LIFE_MAX:45.16
[21/Nov/2019 17:48:19 +0000] 5415 MainThread agent        WARNING  Long HB processing time: 9.35701394081
[21/Nov/2019 17:49:17 +0000] 5415 MainThread agent        WARNING  Long HB processing time: 7.75842308998
[21/Nov/2019 17:50:23 +0000] 5415 MainThread agent        WARNING  Long HB processing time: 13.1954960823
[21/Nov/2019 17:51:19 +0000] 5415 MainThread agent        WARNING  Long HB processing time: 9.66306400299
[21/Nov/2019 17:52:20 +0000] 5415 MainThread agent        WARNING  Long HB processing time: 9.91945910454
[21/Nov/2019 17:53:19 +0000] 5415 MainThread agent        WARNING  Long HB processing time: 9.45511198044
[21/Nov/2019 17:54:19 +0000] 5415 MainThread agent        WARNING  Long HB processing time: 9.34867501259

 

Thanks 

Amn

 

3 REPLIES 3

avatar
Expert Contributor

Hi Amn,

 

Your issue is different from which is addressed in the KB article. CM agent collects archived query profiles from Impala Daemon and saves to CM database periodically. According to your logs, the file /var/log/impalad/profiles/impala_profile_log_1.1-1574308887326 was gone when CM agent tried to open it:

[21/Nov/2019 17:39:58 +0000] 5415 Monitor-HostMonitor throttling_logger ERROR    (2 skipped) Failed to collect NTP metrics
Traceback (most recent call last):
  File "/usr/lib64/cmf/agent/build/env/lib/python2.7/site-packages/cmf-5.16.2-py2.7.egg/cmf/monitor/host/ntp_monitor.py", line 48, in collect
    self.collect_ntpd()
  ...
  File "/usr/lib64/cmf/agent/build/env/lib/python2.7/site-packages/ClusterStatsLogStreaming-UNKNOWN-py2.7.egg/clusterstats/log/streaming/file.py", line 41, in open_log_file
    handle = open(self.__path)
IOError: [Errno 2] No such file or directory: '/var/log/impalad/profiles/impala_profile_log_1.1-1574308887326'

 I suggest you investigate why the file was gone. How often do you see this kind of error?

avatar
Rising Star

Hi Robbiez,

 

Strangely while checking on other Daemons I see different error, although the error received from CM is the same (The health test result for IMPALAD_QUERY_MONITORING_STATUS has become bad: There are 0 error(s) seen monitoring executing queries, and 1 errors(s) seen monitoring completed queries for this role in the previous 5 minute(s). Critical threshold: any. ) whereas the logs paint a different picture, the new log I see is attached and I see the same in other daemons also although the file numbers are a bit different.

[root@Server02 impalad]$ tail -n 100 impalad.Server02.impala.log.WARNING.20191120-115803.6341
        at org.apache.hadoop.hdfs.shortcircuit.ShortCircuitCache.create(ShortCircuitCache.java:804)
        at org.apache.hadoop.hdfs.shortcircuit.ShortCircuitCache.fetchOrCreate(ShortCircuitCache.java:738)
        at org.apache.hadoop.hdfs.BlockReaderFactory.getBlockReaderLocal(BlockReaderFactory.java:485)
        at org.apache.hadoop.hdfs.BlockReaderFactory.build(BlockReaderFactory.java:355)
        at org.apache.hadoop.hdfs.DFSInputStream.blockSeekTo(DFSInputStream.java:666)
        at org.apache.hadoop.hdfs.DFSInputStream.readWithStrategy(DFSInputStream.java:904)
        at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:981)
        at org.apache.hadoop.fs.FSDataInputStream.read(FSDataInputStream.java:147)
W1126 14:44:49.208544  6755 ShortCircuitCache.java:826] ShortCircuitCache(0x116b2697): could not load 1122093192_BP-196508081-172.18.208.227-1510815452859 due to InvalidToken exception.
Java exception follows:
org.apache.hadoop.security.token.SecretManager$InvalidToken: access control error while attempting to set up short-circuit access to /user/hive/warehouse/MYAPPLICATION.db/network/id=4/PRT_id=31268741116146977/part-00000_copy_293

 

Regards

Amn

avatar
Expert Contributor

Hi Amn,

 

The message was "1 errors(s) seen monitoring completed queries". It was reported by CM agent. You should be able to find error messages in detail from cloudera-scm-agent.log*.  The ShortCircuitCache message was from HDFS API. You can ignore it as it's a warning message.  HDFS API should retry and bypass ShortCircuitCache if ShortCircuitCache fails.

 

Regards,

Robbie