Created on 12-05-2016 09:49 AM - edited 09-16-2022 03:50 AM
Version: Cloudera Express 5.8.2 (#17 built by jenkins on 20160916-1426 git: d23c620f3a3bbd85d8511d6ebba49beaaab14b75)
Parcel Name: CDH 5
Version: 5.8.2-1.cdh5.8.2.p0.3
Status: Distributed, Activated
$ uname -a
Linux hostname_redacted 2.6.32-642.6.2.el6.x86_64 #1 SMP Mon Oct 24 10:22:33 EDT 2016 x86_64 x86_64 x86_64 GNU/Linux
We initially thought we were exceeding impala-shell resources with our insert-select statement moving external CSV data into an internal Parquet table; however, now even a simple 'compute incremental stats tablename' gets stuck as well.
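For illustration, the pattern is roughly the sketch below; table, column and path names are placeholders rather than our real schema:

-- Rough sketch of the workload; all names and the HDFS path are placeholders.
-- External table over the raw CSV files:
CREATE EXTERNAL TABLE staging_csv (id BIGINT, val STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
LOCATION '/data/staging_csv';
-- Internal Parquet table that the insert-select loads into:
CREATE TABLE facts_parquet (id BIGINT, val STRING) STORED AS PARQUET;
INSERT INTO facts_parquet SELECT id, val FROM staging_csv;
-- The statement that now also gets stuck:
COMPUTE INCREMENTAL STATS facts_parquet;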
This is causing us grief in our production environment: we have to constantly check port 25000 and manually restart whichever impala daemon is spinning the CPU. Luckily our insert scripts are fault tolerant and simply retry on failure (but once all CPUs are consumed spinning, we are dead in the water).
We are not sure, but this seems to have started after we upgraded from 5.7.1 to 5.8.2.
In the logs, immediately after the 'stuck' query, there is always this error:
I1204 03:30:03.958894 7150 Frontend.java:875] analyze query compute incremental stats tablename
I1204 03:30:03.959247 7150 Frontend.java:819] Requesting prioritized load of table(s): default.tablename
I1204 03:32:03.970648 7150 Frontend.java:894] Missing tables were not received in 120000ms. Load request will be retried.
I1204 03:32:03.970940 7150 Frontend.java:819] Requesting prioritized load of table(s): default.tablename
I1204 03:32:37.981461 7142 jni-util.cc:166] com.cloudera.impala.catalog.CatalogException: Detected catalog service ID change. Aborting updateCatalog()
at com.cloudera.impala.catalog.ImpaladCatalog.updateCatalog(ImpaladCatalog.java:120)
at com.cloudera.impala.service.Frontend.updateCatalogCache(Frontend.java:227)
at com.cloudera.impala.service.JniFrontend.updateCatalogCache(JniFrontend.java:180)
I1204 03:32:37.983515 7142 status.cc:111] CatalogException: Detected catalog service ID change. Aborting updateCatalog()
@ 0x80f2c9 (unknown)
@ 0xb37c30 (unknown)
@ 0xa4e5cf (unknown)
@ 0xa68ea9 (unknown)
@ 0xb00a02 (unknown)
@ 0xb068f3 (unknown)
@ 0xd2bed8 (unknown)
@ 0xd2b114 (unknown)
@ 0x7dc26c (unknown)
@ 0x1b208bf (unknown)
@ 0x9b0a39 (unknown)
@ 0x9b1492 (unknown)
@ 0xb89327 (unknown)
@ 0xb89c64 (unknown)
@ 0xdee99a (unknown)
@ 0x3f37a07aa1 (unknown)
@ 0x3f376e893d (unknown)
E1204 03:32:37.983541 7142 impala-server.cc:1339] There was an error processing the impalad catalog update. Requesting a full topic update to recover: CatalogException: Detected catalog service ID change. Aborting updateCatalog()
Created 12-05-2016 10:46 AM
Compute stats is an expensive operation.
For a table with the definition below:
CREATE TABLE default.test123 ( a INT, b INT, c STRING ) PARTITIONED BY ( d STRING );
When you run compute stats, it spawns child queries as follows:
SELECT NDV(a) AS a, CAST(-1 as BIGINT), 4, CAST(4 as DOUBLE), NDV(b) AS b, CAST(-1 as BIGINT), 4, CAST(4 as DOUBLE), NDV(c) AS c, CAST(-1 as BIGINT), MAX(length(c)), AVG(length(c)) FROM default.test123
i.e., compute stats works out the number of distinct values (NDV) in each column, plus the maximum and average length of string columns. On a big table this is an expensive operation that takes considerable time and resources.
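For example, once the statement finishes on the hypothetical default.test123 table above, you can inspect what was stored; the #Distinct Values, Max Size and Avg Size columns in the SHOW COLUMN STATS output correspond to the NDV(), MAX(length()) and AVG(length()) expressions in the child query:

COMPUTE STATS default.test123;
-- Per-partition row counts and file sizes:
SHOW TABLE STATS default.test123;
-- Per-column NDV, max size and avg size:
SHOW COLUMN STATS default.test123;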
The profile of compute stats contains the section below, which shows the time taken by the "Child queries" in nanoseconds:
Start execution: 0
Planning finished: 1999998
Child queries finished: 550999506
Metastore update finished: 847999239
Rows available: 847999239
Profile Collection:
==================
a. Go to Impala > Queries
b. Identify the query you are interested in and from the dropdown on the right select "Query Details"
c. On the resulting page select "Download Profile"
1/ To understand the CPU utilisation you highlighted here, could you please provide the profile of the insert query?
2/ How did you confirm that Impala is causing 100% CPU utilisation? Did you run top and notice the impalad process taking all the CPU?
Created 12-05-2016 11:05 AM
It looks like your catalog service may be having problems. It would be worth looking in the catalogd logs for clues.
Created 12-05-2016 12:22 PM
Found an out-of-memory heap error in catalogd.
Which is weird, as we've been restarting the individual impala daemon with stuck 'in flight' queries to recover from this problem, not the catalog server.
Does the cloudera agent restart it automatically?
The heap has been increased 4x, so we are in waiting mode to see if this resolves it.
Thanks.
Created 12-05-2016 03:02 PM
Yeah Cloudera Manager's agent will restart it automatically (at least in the default config I believe).
Created 12-08-2016 07:14 AM
Increasing the catalog server heap resolved this problem.
However, there should be a JIRA opened against the impala daemon.
If the catalog server misbehaves, the impala daemon should not have queries stuck 'in flight' forever while consuming one CPU at 100% (it consumes an entire CPU for every stuck query).
Created 12-08-2016 08:34 AM
Good point - we should handle this more gracefully. I filed https://issues.cloudera.org/browse/IMPALA-4629 to track the issue.
Created 06-12-2017 09:25 PM
I believe we've found and fixed the root cause of the spinning thread here: IMPALA-5056.
Created 11-19-2018 12:31 AM
Is there a workaround for this, as we are on Impala version 2.8.0?
We are constantly stuck with compute incremental stats queries that need to be manually cancelled.
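A minimal sketch of one way to reduce the per-statement load, assuming a partitioned table (table and partition values below are placeholders), is to compute or drop incremental stats one partition at a time rather than for the whole table:

-- Placeholder names; compute incremental stats for a single (e.g. newly added) partition:
COMPUTE INCREMENTAL STATS default.tablename PARTITION (part_col='2016-12-05');
-- If accumulated incremental stats are suspected of bloating catalogd memory,
-- they can also be dropped per partition and recomputed later:
DROP INCREMENTAL STATS default.tablename PARTITION (part_col='2016-12-05');

Whether this helps will depend on why the statements are getting stuck; in this thread the root cause was catalogd running out of heap.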