
impala-shell operations getting stuck, spinning cpus @ 100% -- queries 'in flight' forever

Contributor

Version: Cloudera Express 5.8.2 (#17 built by jenkins on 20160916-1426 git: d23c620f3a3bbd85d8511d6ebba49beaaab14b75)

 

Parcel: CDH 5
Version: 5.8.2-1.cdh5.8.2.p0.3
Status: Distributed, Activated

 

$ uname -a
Linux hostname_redacted 2.6.32-642.6.2.el6.x86_64 #1 SMP Mon Oct 24 10:22:33 EDT 2016 x86_64 x86_64 x86_64 GNU/Linux

 

We initially thought we were exceeding impala-shell resources with our insert-select statement moving external CSV data to an internal Parquet table; however, now even a simple 'compute incremental stats tablename' has become stuck as well.
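
For reference, this is roughly the shape of the statements involved (table names below are placeholders, not our real tables):

# placeholder table names, for illustration only
$ impala-shell -q "INSERT INTO parquet_table SELECT * FROM csv_external_table"
$ impala-shell -q "COMPUTE INCREMENTAL STATS parquet_table"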

 

This is causing us grief in our production environment: we are having to constantly check port 25000 and manually restart whichever Impala daemon is spinning the CPU. Luckily our insert scripts are fault tolerant and simply retry on failure, but once all CPUs are consumed spinning we are dead in the water.
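
For anyone hitting the same thing, the in-flight queries we keep checking are visible on the impalad debug web UI (the hostname below is a placeholder):

# port 25000 is the default impalad debug web UI port; /queries lists in-flight and completed queries
$ curl -s http://impalad-host:25000/queries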

 

We are not sure, but this seems to have started after we upgraded from 5.7.1 to 5.8.2.

 

In the logs, immediately after the 'stuck' query, there is always this error:

 

I1204 03:30:03.958894 7150 Frontend.java:875] analyze query compute incremental stats tablename
I1204 03:30:03.959247 7150 Frontend.java:819] Requesting prioritized load of table(s): default.tablename
I1204 03:32:03.970648 7150 Frontend.java:894] Missing tables were not received in 120000ms. Load request will be retried.
I1204 03:32:03.970940 7150 Frontend.java:819] Requesting prioritized load of table(s): default.tablename
I1204 03:32:37.981461 7142 jni-util.cc:166] com.cloudera.impala.catalog.CatalogException: Detected catalog service ID change. Aborting updateCatalog()
at com.cloudera.impala.catalog.ImpaladCatalog.updateCatalog(ImpaladCatalog.java:120)
at com.cloudera.impala.service.Frontend.updateCatalogCache(Frontend.java:227)
at com.cloudera.impala.service.JniFrontend.updateCatalogCache(JniFrontend.java:180)
I1204 03:32:37.983515 7142 status.cc:111] CatalogException: Detected catalog service ID change. Aborting updateCatalog()
@ 0x80f2c9 (unknown)
@ 0xb37c30 (unknown)
@ 0xa4e5cf (unknown)
@ 0xa68ea9 (unknown)
@ 0xb00a02 (unknown)
@ 0xb068f3 (unknown)
@ 0xd2bed8 (unknown)
@ 0xd2b114 (unknown)
@ 0x7dc26c (unknown)
@ 0x1b208bf (unknown)
@ 0x9b0a39 (unknown)
@ 0x9b1492 (unknown)
@ 0xb89327 (unknown)
@ 0xb89c64 (unknown)
@ 0xdee99a (unknown)
@ 0x3f37a07aa1 (unknown)
@ 0x3f376e893d (unknown)
E1204 03:32:37.983541 7142 impala-server.cc:1339] There was an error processing the impalad catalog update. Requesting a full topic update to recover: CatalogException: Detected catalog service ID change. Aborting updateCatalog()

1 ACCEPTED SOLUTION

Contributor

Increasing the catalog server heap resolved this problem.

However, a JIRA should be opened against the Impala daemon.

If the catalog server misbehaves, the Impala daemon should not leave queries stuck 'in flight' forever while spinning one CPU at 100% (it consumes an entire CPU core for every stuck query).

 


8 REPLIES

Master Collaborator

Compute stats is an expensive operation

For a table with the definition below:

CREATE TABLE default.mytable ( a INT, b INT, c STRING, d BIGINT, e BIGINT );

When you run COMPUTE STATS, it spawns a child query like the following:

SELECT NDV(a) AS a, CAST(-1 as BIGINT), 4, CAST(4 as DOUBLE), NDV(b) AS b, CAST(-1 as BIGINT), 4, CAST(4 as DOUBLE), NDV(c) AS c, CAST(-1 as BIGINT), MAX(length(c)), AVG(length(c)), NDV(d) AS d, CAST(-1 as BIGINT), 8, CAST(8 as DOUBLE), NDV(e) AS e, CAST(-1 as BIGINT), 8, CAST(8 as DOUBLE) FROM default.mytable

That is, COMPUTE STATS computes the number of distinct values (NDV) in each column, plus the maximum and average length of string columns. On a big table this is an expensive operation that takes considerable time and resources.
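
To see what stats already exist, and (if your table is partitioned, which is where incremental stats is mainly useful) to limit the work to a single partition, you can try something like the following. The partition column name here is only an example:

$ impala-shell -q "SHOW TABLE STATS default.tablename"
$ impala-shell -q "SHOW COLUMN STATS default.tablename"
# example partition spec only; adjust to your own partition column
$ impala-shell -q "COMPUTE INCREMENTAL STATS default.tablename PARTITION (part_col='x')"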


The profile of COMPUTE STATS will contain the section below, which shows the time taken by the child queries (in nanoseconds):

Start execution: 0
Planning finished: 1999998
Child queries finished: 550999506
Metastore update finished: 847999239
Rows available: 847999239

Profile Collection:
==================
a. Go to Impala > Queries
b. Identify the query you are interested in and from the dropdown on the right select "Query Details"
c. On the resulting page select "Download Profile"
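
If you run the statement yourself from impala-shell, you can also print the profile of the last statement directly from the shell; the summary and profile commands should be available in your version, but please verify:

$ impala-shell
[impalad-host:21000] > compute incremental stats default.tablename;
[impalad-host:21000] > summary;
[impalad-host:21000] > profile;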

1/ To understand the CPU utilisation you highlighted here, could you please provide the profile of the insert query?
2/ How did you confirm that Impala is causing 100% CPU utilisation? Did you run top and notice the impalad process taking all the CPU?
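
For point 2, a quick way to confirm which process (and which thread inside it) is spinning, using standard Linux tools:

# overall CPU for the impalad process(es)
$ top -p $(pgrep -d, impalad)
# per-thread view, to spot an individual thread stuck at 100%
$ top -H -p $(pgrep -d, impalad)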


It looks like maybe your catalog service is having problems. It would be worth looking in the catalogd logs for clues.
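
For example, something along these lines (the log directory shown is the CDH default and may be different on your cluster):

# search the catalog server logs for memory errors
$ grep -ri "outofmemory" /var/log/catalogd/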

Contributor

found an out of memory heap error in catalogd.

 

Which is weird, as we've been restarting the individual Impala daemon with the stuck 'in flight' queries to recover from this problem, not the catalog server.

 

Does the Cloudera agent restart it automatically?

 

heap has been increased 4x.
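
One way to sanity-check that the larger heap actually took effect after the restart; this assumes the default catalogd debug web UI port of 25020 and that Cloudera Manager passes the heap via JAVA_TOOL_OPTIONS, so verify both on your setup:

# heap flag in the running catalogd's environment (may need root; assumes CM sets JAVA_TOOL_OPTIONS)
$ tr '\0' '\n' < /proc/$(pgrep -x catalogd)/environ | grep -i xmx
# memory overview from the catalogd debug web UI
$ curl -s http://catalogd-host:25020/memz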

 

so we are in waiting mode to see if this resolves it.

 

thanks.


Yeah Cloudera Manager's agent will restart it automatically (at least in the default config I believe).

Contributor

Increasing the catalog server heap resolved this problem.

However, a JIRA should be opened against the Impala daemon.

If the catalog server misbehaves, the Impala daemon should not leave queries stuck 'in flight' forever while spinning one CPU at 100% (it consumes an entire CPU core for every stuck query).

 


Good point - we should handle this more gracefully. I filed https://issues.cloudera.org/browse/IMPALA-4629 to track the issue.


I believe we've found and fixed the root cause of the spinning thread here: IMPALA-5056.

Explorer

Is there a workaround for this, as we are on Impala version 2.8.0?

We are constantly stuck with compute incremental stats queries that need to be manually cancelled.
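
Today the manual cancellation is roughly the following (hostname is a placeholder; whether the /queries page offers a Cancel action depends on the version, so verify on yours):

# find the stuck query and its query id on the impalad debug web UI
$ curl -s http://impalad-host:25000/queries
# then cancel it via the Cancel link on that page (if your version shows one),
# or restart the affected impalad role from Cloudera Manager as a last resort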