Impala jobs do not finish in time

Explorer

We have CDH 5.15 deployed and use Impala for analytic batch jobs.

While using Impala, we found that even a very simple Impala job takes a long time to finish.

For example, we issued "select count(*) from shdata.s76_bat_mg_biz_data", and it ran for about 4.8 hours.

In the query details, the query timeline shows that the "unregister query" step took 4.8h, while all other steps were very fast (milliseconds). How can we fix this issue to make better use of the system?
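To make the symptom concrete, here is a minimal Python sketch that finds the step dominating a query timeline. It assumes each timeline entry is the elapsed time since submission. Only the 4.8 h "Unregister query" figure comes from the post above; the other event names and millisecond values are hypothetical placeholders, and `step_durations` and `slowest_step` are illustrative helper names.

```python
# Hypothetical timeline: (event name, seconds elapsed since submission).
# Only the 4.8 h "Unregister query" value is taken from the post.
timeline = [
    ("Query submitted", 0.0),
    ("Planning finished", 0.004),
    ("Rows available", 0.350),
    ("First row fetched", 0.400),
    ("Unregister query", 4.8 * 3600),
]


def step_durations(events):
    """Return (event, seconds since the previous event) pairs."""
    steps = []
    previous = 0.0
    for name, elapsed in events:
        steps.append((name, elapsed - previous))
        previous = elapsed
    return steps


def slowest_step(events):
    """Return the (event, duration) pair with the largest duration."""
    return max(step_durations(events), key=lambda step: step[1])


name, seconds = slowest_step(timeline)
share = seconds / timeline[-1][1]
print(f"{name}: {seconds / 3600:.2f} h ({share:.1%} of total)")
```

If nearly all of the time falls between the last fetch and "Unregister query", the query was most likely idle and waiting to be closed rather than executing.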

1 ACCEPTED SOLUTION

Super Guru
You can decrease the idle session/query timeouts so that idle queries are cancelled from the server side.

Please refer to the doc here:
https://www.cloudera.com/documentation/enterprise/latest/topics/impala_timeouts.html

Cheers


15 REPLIES


It's unlikely that the query was actually executing for that long. Most likely the client you are using is slow to close the query.

Super Guru
Did you use Hue to query Impala? Did the query actually take 4.8 hours to return the result, or did you just notice that the gap between the start and end times in the query profile is long?

Hue does not close the query handle after the query finishes, so that users can still retrieve the results later on. The time during which the query handle stays open is counted in the start-to-end time shown in the profile.
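For clients you control yourself (a script rather than Hue), the usual pattern is to fetch the results and close the cursor immediately. The sketch below shows that pattern with Python's DB-API. `sqlite3` is only a stand-in so the example runs anywhere, and `run_and_release` is a hypothetical helper name. Whether closing a cursor also closes the server-side query depends on the Impala driver you use, so treat that as an assumption to verify.

```python
import sqlite3
from contextlib import closing


def run_and_release(connection, sql):
    """Execute sql, fetch every row, then close the cursor right away."""
    with closing(connection.cursor()) as cursor:
        cursor.execute(sql)
        return cursor.fetchall()


# sqlite3 stands in for an Impala DB-API connection (assumption).
connection = sqlite3.connect(":memory:")
connection.execute("CREATE TABLE demo (id INTEGER)")
connection.executemany("INSERT INTO demo VALUES (?)", [(1,), (2,), (3,)])

rows = run_and_release(connection, "SELECT count(*) FROM demo")
print(rows)

connection.close()
```

Fetching everything and closing the cursor before doing anything else keeps the window in which a handle stays open as short as possible.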

Explorer

Yes, we use Hue as a query interface very often.

What we are concerned about is this: if a query run in Hue stays open that long, will it occupy the concurrency we have in Impala, since we have admission control enabled?

Super Guru
You can decrease the idle session/query timeouts so that idle queries are cancelled from the server side.

Please refer to the doc here:
https://www.cloudera.com/documentation/enterprise/latest/topics/impala_timeouts.html

Cheers


On CDH 5.15, in most cases they won't hold onto resources in admission control, unless the query isn't cancelled and the client (i.e. Hue) doesn't fetch all of the results.

Enabling the timeouts suggested by Eric helps ensure that queries get cancelled in a timely manner.

Explorer

I've set timeout=30, but it seems to have had no effect.

On the impalad /sessions page, I found that the query session's idle timeout(s) is 1800.

In /varz, both the idle session timeout and the idle query timeout are 30.

Rising Star

It's a bit late in the game, but I'm running into the same problem: the query appears to run for hours even though the first row is fetched in seconds. This means it is not actually running, although the list of Impala queries says it is. As a previous poster stated, the user did not close the session. I have just noticed that the last query executed in a session is held in a running state. Once another query is executed or the session is closed, it releases the resources and is marked as finished.

I have another post about this same issue. Neither of these two parameters has helped:

-idle_session_timeout=1500
-idle_query_timeout=1500

So my conclusion is that either the documentation is not accurate in what it says about these parameters, or there's a bug as of 08/16/2019?

If you find out how to close a session on a query automatically using a parameter, let me know.


@pollard the documentation is accurate; many people use those flags successfully. I wouldn't want to speculate about what's happening in your case. If you include a query profile, that can help to diagnose it.

We've seen things like this happen when a client polls the query for status and keeps it alive (the timeout is measured from the last time the client performed an operation on the query or session).
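The "time since the last client operation" rule can be illustrated with a toy model in Python. This is a simplified sketch of the idea only, not Impala's implementation; `IdleTimer` and `expiry_time` are hypothetical names, and the 30-second timeout mirrors the value mentioned earlier in the thread.

```python
class IdleTimer:
    """Toy model: a query expires once it has been idle for timeout_s seconds."""

    def __init__(self, timeout_s, now=0.0):
        self.timeout_s = timeout_s
        self.last_activity = now

    def touch(self, now):
        """Record a client operation (fetch, status poll, ...)."""
        self.last_activity = now

    def expired(self, now):
        """True once the idle period has reached the timeout."""
        return now - self.last_activity >= self.timeout_s


def expiry_time(timeout_s, client_ops, horizon_s):
    """Return the first second at which the query expires, or None."""
    timer = IdleTimer(timeout_s)
    ops = set(client_ops)
    for now in range(1, horizon_s + 1):
        if timer.expired(now):
            return now
        if now in ops:
            timer.touch(now)  # any client operation resets the idle clock
    return None


# No client activity at all over one hour.
print(expiry_time(30, [], 3600))
# A client that polls for status every 10 seconds for the whole hour.
print(expiry_time(30, range(10, 3601, 10), 3600))
```

In this model, a client that polls more often than the timeout never lets the idle clock run out, so the query never expires, however small the timeout is.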


This can also happen if the query is returning a lot of rows, or if the client is very slow at fetching rows.