We have CDH 5.15 deployed and use Impala for analytic batch jobs.
While using Impala, we found that even a very simple query takes a long time to finish.
For example, we issued "select count(*) from shdata.s76_bat_mg_biz_data", and it ran for about 4.8 hours.
In the query details, the query timeline shows that the "unregister query" step took 4.8h, while all other steps were very fast (milliseconds). How can we fix this issue to make better use of the system?
It's unlikely that the query is executing that long. Most likely the client you are using is delayed in closing the query.
Yes, we use Hue as a query interface very often.
What we are concerned about is whether a query that stays open in Hue for so long will count against the concurrency we have in Impala, since we have admission control enabled.
On CDH 5.15, in most cases such queries won't hold onto resources in admission control, unless the query isn't cancelled and the client (i.e. Hue) doesn't fetch all of the results.
Enabling the timeouts suggested by Eric helps ensure that queries get cancelled in a timely manner.
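To illustrate the client-side half of this (fetch all of the results, then close the query), here is a minimal Python sketch written against the generic DB-API 2.0 cursor interface. The helper name `run_and_release` is hypothetical, and the in-memory `sqlite3` database is only a stand-in so the snippet runs anywhere. Using it with Impala would mean passing in a connection from a DB-API client for Impala, which is an assumption and not something covered in this thread.

```python
import sqlite3
from contextlib import closing


def run_and_release(conn, sql):
    """Run sql, fetch every row, and always close the cursor afterwards."""
    # closing() calls cursor.close() even if execute() or fetchall() raises,
    # so the client never leaves a half-fetched query open.
    with closing(conn.cursor()) as cur:
        cur.execute(sql)
        return cur.fetchall()


# Stand-in database (assumption: sqlite3 instead of a real Impala connection).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER)")
conn.executemany("INSERT INTO t VALUES (?)", [(1,), (2,), (3,)])
rows = run_and_release(conn, "SELECT COUNT(*) FROM t")
print(rows)
conn.close()
```

The point of the pattern is that the close happens in one place and cannot be skipped. Whether closing the cursor also unregisters the query on the server depends on the client, so check its documentation.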
I've set the timeout to 30, but it seems to have no effect.
On the impalad /sessions page, I found that the query session's idle timeout (s) is 1800.
In /varz, both the idle session timeout and the idle query timeout are 30.
It's a bit late in the game, but I'm running into the same problem: the query appears to be running for hours even though the first row is fetched in seconds. This means it is not actually running, although the list of Impala queries says it is. As a previous poster stated, the user did not close the session. I have just noticed that the last query executed in a session is held in a running state. Once another query is executed or the session is closed, the resources are released and the query is marked as finished.
I have another post about this same issue. Neither of these two parameters has helped:
So my conclusion is that either the documentation is not accurate in what it says about these parameters, or there's a bug as of 08/16/2019?
If you find out how to close a session on a query automatically using a parameter, let me know...
@pollard The documentation is accurate; many people use those flags successfully. I wouldn't want to speculate about what's happening in your case. If you include a query profile, that can help to diagnose the problem.
We've seen things like this happen when there's a client polling the query for status and keeping it alive (the timeout is measured from the last time the client performed an operation on the query or session).
This can also happen if the query is returning a lot of rows, or if the client is very slow at fetching rows.
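To make the "timeout since the last client operation" behaviour concrete, here is a small toy model in Python. It is only a sketch of the idea, not Impala's actual implementation, and the names `IdleTimer` and `first_expiry` are made up for illustration. It shows why a 30-second idle timeout never fires while a client keeps polling more often than every 30 seconds.

```python
class IdleTimer:
    """Toy model of an idle timeout that resets on every client operation."""

    def __init__(self, timeout_s):
        self.timeout_s = timeout_s
        self.last_activity = 0  # time of the most recent client operation

    def touch(self, now):
        # Any client operation (fetch, status poll, ...) resets the idle clock.
        self.last_activity = now

    def expired(self, now):
        return now - self.last_activity >= self.timeout_s


def first_expiry(timeout_s, poll_interval_s, horizon_s):
    """Return the first second at which the timer expires, or None."""
    timer = IdleTimer(timeout_s)
    for now in range(1, horizon_s + 1):
        if timer.expired(now):
            return now
        # A poll_interval_s of 0 means the client never polls.
        if poll_interval_s and now % poll_interval_s == 0:
            timer.touch(now)
    return None


# A client that never polls is timed out once 30 idle seconds have passed.
print(first_expiry(30, 0, 3600))
# A client that polls every 10 seconds is never timed out within the hour.
print(first_expiry(30, 10, 3600))
```

In this model the first call prints 30 and the second prints None, which matches the symptom in this thread: the query looks like it is "running" for hours simply because something keeps touching it.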