Posts: 13
Registered: 09-29-2016

Impala query does not respond, seems stuck or going in a loop.

I run a particular query daily to generate a feed file. The query is complex, with joins across 10-12 tables. It behaves weirdly, and I have tried everything I can think of but cannot find a clue or a solution.

The details below are from the Impala browser. I ran this one for the date 04-04-2017:


Query Type: QUERY
Start Time: 2017-05-09 13:04:36.053536000
End Time: 2017-05-09 13:05:22.669842000
Duration: 46s616ms
Scan Progress: 115 / 115 (100%)
State: FINISHED
# rows fetched: 48731


I ran the one below for 05-05-2017. It returns records up to row 30720 and then gets stuck. When I check memory by clicking on Details, it keeps growing, but the scan progress stays at 105 / 115 (91.3043%) and never advances. The state shows FINISHED, but the query still remains in flight.


Query Type: QUERY
Start Time: 2017-05-09 13:10:38.934226000
Duration: 25m3s
Scan Progress: 105 / 115 (91.3043%)
State: FINISHED
Last Event: First row fetched
# rows fetched: 30720



This is again an issue in the PROD environment. We have been facing it since 05-05-2017 and have no solution yet, which is a very embarrassing situation for us.

Kindly help to resolve it.


I am using impalad version 2.7.0-cdh5.9.0.

Expert Contributor
Posts: 181
Registered: 01-25-2017

Re: Impala query does not respond, seems stuck or going in a loop.

@hrishi1dypim Are you running it from impala-shell or Hue?

Cloudera Employee
Posts: 186
Registered: 07-29-2015

Re: Impala query does not respond, seems stuck or going in a loop.

I would suggest looking at the execution summary or profile to understand where the time is going. The progress only measures progress of the table scans, so this is consistent with the time being spent in joins (or other operations) after the table scans.


You probably just have a very large join. Could be that the join order in the query plan is not optimal or maybe you're just running the query on too much data for the cluster size.


Usually the troubleshooting steps are something like:

  1. Look at the execution summary to get a high-level idea of where time is going and how many rows are flowing through the plan. It may be obvious - e.g. if a join with a lot of input rows is using a lot of time.
  2. If the problem is just the number of input rows, check if anything looks fishy with the plan (e.g., missing stats, joins where the right hand input is much larger than the left hand input, "exploding" joins that produce many more output rows than input rows).
  3. If you can't tell why things are slow from the execution summary, look at the query profile to drill down into where time went (this can be hard to interpret sometimes).
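
As a rough illustration of the checks in steps 1 and 2, here is a minimal Python sketch. It does not talk to Impala or parse real profile output; it assumes you have copied the row counts for each operator out of the execution summary by hand. The `Operator` class, the `flag_operator` function, the factor of 10, the convention that `-1` means "no estimate", and the sample numbers are all hypothetical and chosen only for the example.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Operator:
    """One row copied by hand from an execution summary (hypothetical layout)."""
    name: str            # operator label, e.g. "05:HASH JOIN"
    rows_out: int        # rows the operator actually produced
    est_rows: int        # planner estimate; -1 is used here to mean "no estimate"
    left_rows: int = 0   # rows from the left-hand input (joins only)
    right_rows: int = 0  # rows from the right-hand input (joins only)


def flag_operator(op: Operator, factor: float = 10.0) -> List[str]:
    """Return warnings for one operator; `factor` is an arbitrary threshold."""
    warnings = []

    # Missing or badly wrong estimates often point to missing table stats.
    if op.est_rows < 0:
        warnings.append("no row estimate (stats may be missing)")
    else:
        big = max(op.rows_out, op.est_rows)
        small = max(1, min(op.rows_out, op.est_rows))
        if big / small >= factor:
            warnings.append("row estimate is far from actual row count")

    if "JOIN" in op.name:
        # Right-hand input much larger than the left-hand input.
        if op.right_rows > factor * op.left_rows:
            warnings.append("right-hand input much larger than left-hand input")
        # "Exploding" join: far more output rows than input rows.
        inputs = op.left_rows + op.right_rows
        if inputs > 0 and op.rows_out > factor * inputs:
            warnings.append("exploding join (output much larger than input)")

    return warnings


# Made-up numbers, not taken from the query in this thread.
sample_ops = [
    Operator("02:SCAN HDFS", rows_out=1_000_000, est_rows=-1),
    Operator("05:HASH JOIN", rows_out=50_000_000, est_rows=40_000,
             left_rows=1_000_000, right_rows=20_000),
    Operator("06:HASH JOIN", rows_out=900, est_rows=1_000,
             left_rows=1_000, right_rows=500_000),
]

for op in sample_ops:
    for warning in flag_operator(op):
        print(f"{op.name}: {warning}")
```

Any operator this flags is only a starting point for a closer look at the plan; the thresholds are not tuned for any particular cluster.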