Created 11-20-2019 10:22 PM
Hello,
We have been receiving alerts for Impala [The health test result for IMPALAD_QUERY_MONITORING_STATUS has become bad: There are 0 error(s) seen monitoring executing queries, and 1 errors(s) seen monitoring completed queries for this role in the previous 5 minute(s). Critical threshold: any.]
I was following Cloudera Article ( https://my.cloudera.com/knowledge/Impala--IMPALADQUERYMONITORINGSTATUS-has-become-bad?id=75434) but am unsure if this would apply to our environment (our is CDH 5.16.2 / CH 5.16.2), also the 2 Apache Jira mentioned in this Article is resolved. Below is the log from one of our Impala daemon. I have also check and there are no failed queries (screenshot attached)
Current Query Monitoring Timeout = 1 minute / Query Monitoring Period = 1 minute.
Increasing the values as mentioned in the article would postpone the alerts, would like to know if there is a fix for this issue.
Impalad Logs
[root@NTPDQ cloudera-scm-agent]# tail -n 100 cloudera-scm-agent.log
result, stdout, stderr = self._subprocess_with_timeout(args, self._timeout)
File "/usr/lib64/cmf/agent/build/env/lib/python2.7/site-packages/cmf-5.16.2-py2.7.egg/cmf/monitor/host/ntp_monitor.py", line 38, in _subprocess_with_timeout
return subprocess_with_timeout(args, timeout)
File "/usr/lib64/cmf/agent/build/env/lib/python2.7/site-packages/cmf-5.16.2-py2.7.egg/cmf/subprocess_timeout.py", line 94, in subprocess_with_timeout
raise Exception("timeout with args %s" % args)
Exception: timeout with args ['ntpq', '-np']
[21/Nov/2019 17:01:20 +0000] 5415 MainThread agent WARNING Long HB processing time: 6.90371108055
[21/Nov/2019 17:01:29 +0000] 5415 MainThread throttling_logger INFO (14 skipped) Identified java component java8 with full version java version "1.8.0_121" Java(TM) SE Runtime Environment (build 1.8.0_121-b13) Java HotSpot(TM) 64-Bit Server VM (build 25.121-b13, mixed mode) for requested version .
[21/Nov/2019 17:03:18 +0000] 5415 MainThread agent WARNING Long HB processing time: 5.24657082558
[21/Nov/2019 17:05:19 +0000] 5415 MainThread agent WARNING Long HB processing time: 6.19837498665
[21/Nov/2019 17:06:18 +0000] 5415 MainThread agent WARNING Long HB processing time: 5.16436910629
[21/Nov/2019 17:07:20 +0000] 5415 MainThread agent WARNING Long HB processing time: 6.74405503273
[21/Nov/2019 17:07:29 +0000] 5415 MainThread heartbeat_tracker INFO HB stats (seconds): num:40 LIFE_MIN:0.00 min:0.06 mean:0.10 max:0.25 LIFE_MAX:45.16
[21/Nov/2019 17:10:24 +0000] 5415 MainThread agent WARNING Long HB processing time: 10.0939228535
[21/Nov/2019 17:11:21 +0000] 5415 MainThread agent WARNING Long HB processing time: 7.39454698563
[21/Nov/2019 17:12:21 +0000] 5415 MainThread agent WARNING Long HB processing time: 7.03050684929
[21/Nov/2019 17:13:19 +0000] 5415 MainThread agent WARNING Long HB processing time: 5.31969809532
[21/Nov/2019 17:14:21 +0000] 5415 MainThread agent WARNING Long HB processing time: 7.35476112366
[21/Nov/2019 17:15:27 +0000] 5415 MainThread agent WARNING Long HB processing time: 13.7484540939
[21/Nov/2019 17:16:20 +0000] 5415 MainThread agent WARNING Long HB processing time: 6.23533391953
[21/Nov/2019 17:17:22 +0000] 5415 MainThread agent WARNING Long HB processing time: 23.2213561535
[21/Nov/2019 17:17:22 +0000] 5415 MainThread agent WARNING Delayed HB: 8s since last
[21/Nov/2019 17:17:37 +0000] 5415 MainThread heartbeat_tracker INFO HB stats (seconds): num:40 LIFE_MIN:0.00 min:0.05 mean:0.09 max:0.95 LIFE_MAX:45.16
[21/Nov/2019 17:18:19 +0000] 5415 MainThread agent WARNING Long HB processing time: 11.9228010178
[21/Nov/2019 17:19:20 +0000] 5415 MainThread agent WARNING Long HB processing time: 12.488642931
[21/Nov/2019 17:20:27 +0000] 5415 MainThread agent WARNING Long HB processing time: 20.2072188854
[21/Nov/2019 17:20:27 +0000] 5415 MainThread agent WARNING Delayed HB: 5s since last
[21/Nov/2019 17:21:20 +0000] 5415 MainThread agent WARNING Long HB processing time: 7.10305094719
[21/Nov/2019 17:22:22 +0000] 5415 MainThread agent WARNING Long HB processing time: 9.03495693207
[21/Nov/2019 17:23:00 +0000] 5415 MonitorDaemon-Reporter throttling_logger INFO (59 skipped) Descendants user CPU lower than expected for process 29048: 1419823.60001, 245602.28
[21/Nov/2019 17:23:20 +0000] 5415 MainThread agent WARNING Long HB processing time: 7.17436099052
[21/Nov/2019 17:24:19 +0000] 5415 MainThread agent WARNING Long HB processing time: 6.09477305412
[21/Nov/2019 17:25:22 +0000] 5415 MainThread agent WARNING Long HB processing time: 8.74135708809
[21/Nov/2019 17:26:22 +0000] 5415 MainThread agent WARNING Long HB processing time: 8.5287861824
[21/Nov/2019 17:27:20 +0000] 5415 MainThread agent WARNING Long HB processing time: 6.24944400787
[21/Nov/2019 17:27:44 +0000] 5415 MainThread heartbeat_tracker INFO HB stats (seconds): num:40 LIFE_MIN:0.00 min:0.06 mean:0.07 max:0.11 LIFE_MAX:45.16
[21/Nov/2019 17:29:20 +0000] 5415 MainThread agent WARNING Long HB processing time: 6.15325498581
[21/Nov/2019 17:30:22 +0000] 5415 MainThread agent WARNING Long HB processing time: 8.27527880669
[21/Nov/2019 17:31:17 +0000] 5415 MainThread agent WARNING Long HB processing time: 18.4421730042
[21/Nov/2019 17:31:17 +0000] 5415 MainThread agent WARNING Delayed HB: 3s since last
[21/Nov/2019 17:31:24 +0000] 5415 MainThread agent WARNING Long HB processing time: 7.10764193535
[21/Nov/2019 17:32:22 +0000] 5415 MainThread throttling_logger INFO (14 skipped) Identified java component java8 with full version java version "1.8.0_121" Java(TM) SE Runtime Environment (build 1.8.0_121-b13) Java HotSpot(TM) 64-Bit Server VM (build 25.121-b13, mixed mode) for requested version .
[21/Nov/2019 17:32:22 +0000] 5415 MainThread agent WARNING Long HB processing time: 19.3404970169
[21/Nov/2019 17:32:22 +0000] 5415 MainThread agent WARNING Delayed HB: 4s since last
[21/Nov/2019 17:33:21 +0000] 5415 MainThread agent WARNING Long HB processing time: 14.3549408913
[21/Nov/2019 17:34:20 +0000] 5415 MainThread agent WARNING Long HB processing time: 13.6162810326
[21/Nov/2019 17:35:20 +0000] 5415 MainThread agent WARNING Long HB processing time: 13.1593289375
[21/Nov/2019 17:36:21 +0000] 5415 MainThread agent WARNING Long HB processing time: 13.8171610832
[21/Nov/2019 17:37:19 +0000] 5415 MainThread agent WARNING Long HB processing time: 12.1159389019
[21/Nov/2019 17:37:52 +0000] 5415 MainThread heartbeat_tracker INFO HB stats (seconds): num:40 LIFE_MIN:0.00 min:0.06 mean:0.09 max:1.04 LIFE_MAX:45.16
[21/Nov/2019 17:38:21 +0000] 5415 MainThread agent WARNING Long HB processing time: 13.9521288872
[21/Nov/2019 17:39:19 +0000] 5415 MainThread agent WARNING Long HB processing time: 12.0477230549
[21/Nov/2019 17:39:58 +0000] 5415 Monitor-HostMonitor throttling_logger ERROR (2 skipped) Timeout with args ['ntpq', '-np']
None
[21/Nov/2019 17:39:58 +0000] 5415 Monitor-HostMonitor throttling_logger ERROR (2 skipped) Failed to collect NTP metrics
Traceback (most recent call last):
File "/usr/lib64/cmf/agent/build/env/lib/python2.7/site-packages/cmf-5.16.2-py2.7.egg/cmf/monitor/host/ntp_monitor.py", line 48, in collect
self.collect_ntpd()
File "/usr/lib64/cmf/agent/build/env/lib/python2.7/site-packages/cmf-5.16.2-py2.7.egg/cmf/monitor/host/ntp_monitor.py", line 66, in collect_ntpd
result, stdout, stderr = self._subprocess_with_timeout(args, self._timeout)
File "/usr/lib64/cmf/agent/build/env/lib/python2.7/site-packages/cmf-5.16.2-py2.7.egg/cmf/monitor/host/ntp_monitor.py", line 38, in _subprocess_with_timeout
return subprocess_with_timeout(args, timeout)
File "/usr/lib64/cmf/agent/build/env/lib/python2.7/site-packages/cmf-5.16.2-py2.7.egg/cmf/subprocess_timeout.py", line 94, in subprocess_with_timeout
raise Exception("timeout with args %s" % args)
Exception: timeout with args ['ntpq', '-np']
[21/Nov/2019 17:40:29 +0000] 5415 ImpalaDaemonQueryMonitoring query_monitor ERROR Error fetching completed queries from '/var/log/impalad/profiles/impala_profile_log'
Traceback (most recent call last):
File "/usr/lib64/cmf/agent/build/env/lib/python2.7/site-packages/cmf-5.16.2-py2.7.egg/cmf/monitor/impalad/query_monitor.py", line 901, in _get_completed_query_profiles
self._query_monitor.get_completed_queries(query_log_file)
File "/usr/lib64/cmf/agent/build/env/lib/python2.7/site-packages/cmf-5.16.2-py2.7.egg/cmf/monitor/impalad/query_monitor.py", line 581, in get_completed_queries
completed_query_report_limit)
File "/usr/lib64/cmf/agent/build/env/lib/python2.7/site-packages/cmf-5.16.2-py2.7.egg/cmf/monitor/impalad/query_monitor.py", line 456, in get_completed_queries
last_accessed_file_timestamp)
File "/usr/lib64/cmf/agent/build/env/lib/python2.7/site-packages/cmf-5.16.2-py2.7.egg/cmf/monitor/impalad/query_monitor.py", line 265, in _get_completed_queries
for event in streamer.stream():
File "/usr/lib64/cmf/agent/build/env/lib/python2.7/site-packages/ClusterStatsLogStreaming-UNKNOWN-py2.7.egg/clusterstats/log/streaming/event_streamer.py", line 134, in stream
file_list = self.__sort_file_list(self.__filtered_file_list, ascending)
File "/usr/lib64/cmf/agent/build/env/lib/python2.7/site-packages/ClusterStatsLogStreaming-UNKNOWN-py2.7.egg/clusterstats/log/streaming/event_streamer.py", line 162, in __sort_file_list
handle = f.open_log_file()
File "/usr/lib64/cmf/agent/build/env/lib/python2.7/site-packages/ClusterStatsLogStreaming-UNKNOWN-py2.7.egg/clusterstats/log/streaming/file.py", line 41, in open_log_file
handle = open(self.__path)
IOError: [Errno 2] No such file or directory: '/var/log/impalad/profiles/impala_profile_log_1.1-1574308887326'
[21/Nov/2019 17:40:29 +0000] 5415 MainThread agent WARNING Long HB processing time: 22.2102739811
[21/Nov/2019 17:40:29 +0000] 5415 MainThread agent WARNING Delayed HB: 7s since last
[21/Nov/2019 17:41:24 +0000] 5415 MainThread agent WARNING Long HB processing time: 25.2082297802
[21/Nov/2019 17:41:24 +0000] 5415 MainThread agent WARNING Delayed HB: 10s since last
[21/Nov/2019 17:42:23 +0000] 5415 MainThread agent WARNING Long HB processing time: 13.1406769753
[21/Nov/2019 17:43:21 +0000] 5415 MainThread agent WARNING Long HB processing time: 11.081389904
[21/Nov/2019 17:44:20 +0000] 5415 MainThread agent WARNING Long HB processing time: 10.6936099529
[21/Nov/2019 17:45:19 +0000] 5415 MainThread agent WARNING Long HB processing time: 9.0398440361
[21/Nov/2019 17:46:19 +0000] 5415 MainThread agent WARNING Long HB processing time: 9.3374710083
[21/Nov/2019 17:47:18 +0000] 5415 MainThread agent WARNING Long HB processing time: 8.1221549511
[21/Nov/2019 17:47:55 +0000] 5415 MainThread heartbeat_tracker INFO HB stats (seconds): num:39 LIFE_MIN:0.00 min:0.06 mean:0.08 max:0.24 LIFE_MAX:45.16
[21/Nov/2019 17:48:19 +0000] 5415 MainThread agent WARNING Long HB processing time: 9.35701394081
[21/Nov/2019 17:49:17 +0000] 5415 MainThread agent WARNING Long HB processing time: 7.75842308998
[21/Nov/2019 17:50:23 +0000] 5415 MainThread agent WARNING Long HB processing time: 13.1954960823
[21/Nov/2019 17:51:19 +0000] 5415 MainThread agent WARNING Long HB processing time: 9.66306400299
[21/Nov/2019 17:52:20 +0000] 5415 MainThread agent WARNING Long HB processing time: 9.91945910454
[21/Nov/2019 17:53:19 +0000] 5415 MainThread agent WARNING Long HB processing time: 9.45511198044
[21/Nov/2019 17:54:19 +0000] 5415 MainThread agent WARNING Long HB processing time: 9.34867501259
Thanks
Amn
Created 11-21-2019 06:51 PM
Hi Amn,
Your issue is different from which is addressed in the KB article. CM agent collects archived query profiles from Impala Daemon and saves to CM database periodically. According to your logs, the file /var/log/impalad/profiles/impala_profile_log_1.1-1574308887326 was gone when CM agent tried to open it:
[21/Nov/2019 17:39:58 +0000] 5415 Monitor-HostMonitor throttling_logger ERROR (2 skipped) Failed to collect NTP metrics
Traceback (most recent call last):
File "/usr/lib64/cmf/agent/build/env/lib/python2.7/site-packages/cmf-5.16.2-py2.7.egg/cmf/monitor/host/ntp_monitor.py", line 48, in collect
self.collect_ntpd()
...
File "/usr/lib64/cmf/agent/build/env/lib/python2.7/site-packages/ClusterStatsLogStreaming-UNKNOWN-py2.7.egg/clusterstats/log/streaming/file.py", line 41, in open_log_file
handle = open(self.__path)
IOError: [Errno 2] No such file or directory: '/var/log/impalad/profiles/impala_profile_log_1.1-1574308887326'
I suggest you investigate why the file was gone. How often do you see this kind of error?
Created 11-25-2019 11:49 PM
Hi Robbiez,
Strangely while checking on other Daemons I see different error, although the error received from CM is the same (The health test result for IMPALAD_QUERY_MONITORING_STATUS has become bad: There are 0 error(s) seen monitoring executing queries, and 1 errors(s) seen monitoring completed queries for this role in the previous 5 minute(s). Critical threshold: any. ) whereas the logs paint a different picture, the new log I see is attached and I see the same in other daemons also although the file numbers are a bit different.
[root@Server02 impalad]$ tail -n 100 impalad.Server02.impala.log.WARNING.20191120-115803.6341
at org.apache.hadoop.hdfs.shortcircuit.ShortCircuitCache.create(ShortCircuitCache.java:804)
at org.apache.hadoop.hdfs.shortcircuit.ShortCircuitCache.fetchOrCreate(ShortCircuitCache.java:738)
at org.apache.hadoop.hdfs.BlockReaderFactory.getBlockReaderLocal(BlockReaderFactory.java:485)
at org.apache.hadoop.hdfs.BlockReaderFactory.build(BlockReaderFactory.java:355)
at org.apache.hadoop.hdfs.DFSInputStream.blockSeekTo(DFSInputStream.java:666)
at org.apache.hadoop.hdfs.DFSInputStream.readWithStrategy(DFSInputStream.java:904)
at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:981)
at org.apache.hadoop.fs.FSDataInputStream.read(FSDataInputStream.java:147)
W1126 14:44:49.208544 6755 ShortCircuitCache.java:826] ShortCircuitCache(0x116b2697): could not load 1122093192_BP-196508081-172.18.208.227-1510815452859 due to InvalidToken exception.
Java exception follows:
org.apache.hadoop.security.token.SecretManager$InvalidToken: access control error while attempting to set up short-circuit access to /user/hive/warehouse/MYAPPLICATION.db/network/id=4/PRT_id=31268741116146977/part-00000_copy_293
Regards
Amn
Created 11-27-2019 05:09 PM
Hi Amn,
The message was "1 errors(s) seen monitoring completed queries". It was reported by CM agent. You should be able to find error messages in detail from cloudera-scm-agent.log*. The ShortCircuitCache message was from HDFS API. You can ignore it as it's a warning message. HDFS API should retry and bypass ShortCircuitCache if ShortCircuitCache fails.
Regards,
Robbie