Impala jobs do not finish in time
Labels: Apache Impala
Created on 12-27-2018 06:00 PM - edited 09-16-2022 07:01 AM
We have CDH 5.15 deployed and use Impala for analytic batch jobs.
While using Impala, we found that even a very simple job takes a long time to finish.
For example, we issued "select count(*) from shdata.s76_bat_mg_biz_data" and it ran for about 4.8 hours.
In the query details, the query timeline shows that the "Unregister query" step took 4.8 h, while all other steps were very fast (milliseconds). How can we fix this so we can make better use of the system?
Created 12-29-2018 09:51 PM
Please refer to doc here:
https://www.cloudera.com/documentation/enterprise/latest/topics/impala_timeouts.html
Cheers
Created 12-28-2018 07:54 AM
It's unlikely that the query is executing that long. Most likely the client you are using is delayed in closing the query.
Created 12-28-2018 11:58 PM
Hue does not close the query handle after a query finishes, so that users can still retrieve the results later. The time during which the handle stays open is counted in the query's start-to-end time in the profile.
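To illustrate how an open handle inflates the reported duration, here is a minimal sketch that splits a query timeline into per-step durations. The event labels follow those typically shown in an Impala profile's query timeline, but the timestamps are made-up illustrative values, and `step_durations` and `slowest_step` are hypothetical helpers, not part of Impala or Hue.

```python
from typing import List, Tuple

# (event label, cumulative seconds since query start).
# Illustrative numbers only; they are not taken from a real profile.
TIMELINE: List[Tuple[str, float]] = [
    ("Query submitted", 0.0),
    ("Planning finished", 0.05),
    ("Rows available", 1.2),
    ("First row fetched", 1.5),
    ("Unregister query", 17280.0),  # 4.8 h: the moment the client closed the handle
]


def step_durations(timeline: List[Tuple[str, float]]) -> List[Tuple[str, float]]:
    """Return (event, seconds elapsed since the previous event)."""
    durations = []
    previous = 0.0
    for label, cumulative in timeline:
        durations.append((label, cumulative - previous))
        previous = cumulative
    return durations


def slowest_step(timeline: List[Tuple[str, float]]) -> Tuple[str, float]:
    """Return the event that accounts for the most wall-clock time."""
    return max(step_durations(timeline), key=lambda item: item[1])


if __name__ == "__main__":
    for label, seconds in step_durations(TIMELINE):
        print(f"{label:<20} {seconds:>10.2f} s")
    print("Dominant step:", slowest_step(TIMELINE)[0])
```

In this toy timeline the rows are available after about a second; nearly all of the elapsed time sits between the first fetch and the unregister event, which is time spent waiting for the client rather than executing.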
Created 12-29-2018 12:50 AM
Yes, we use Hue as a query interface very often.
Our concern is this: if a query run from Hue stays open that long, will it occupy a concurrency slot in Impala, given that we have admission control enabled?
Created 12-31-2018 01:54 PM
On CDH 5.15, in most cases such queries won't hold onto admission control resources, unless the query isn't cancelled and the client (i.e. Hue) doesn't fetch all of the results.
Enabling the timeouts suggested by Eric helps ensure that queries get cancelled in a timely manner.
Created 01-24-2019 10:37 PM
I've set the timeout to 30, but it seems to have no effect.
On the impalad /sessions page, the query session's idle timeout (s) is 1800.
On /varz, both the idle session timeout and the idle query timeout are 30.
Created 08-16-2019 07:57 AM
It's a bit late in the game, but I'm running into the same problem: the query appears to run for hours even though the first row is fetched within seconds. This means it is not actually running, although the list of Impala queries says it is. As a previous poster stated, the user did not close the session. I have just noticed that the last query executed in a session is held in a running state. Once another query is executed or the session is closed, the resources are released and the query is marked as finished.
I have another post about this same issue. Neither of these two parameters has helped:
-idle_session_timeout=1500
-idle_query_timeout=1500
So my conclusion is that either the documentation is inaccurate about these parameters or there is a bug as of 08/16/2019.
If you find out how to close a query's session automatically using a parameter, let me know.
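The behaviour described in this post, where the most recent query stays "running" until another query is executed or the session is closed, can be modelled with a small toy class. This is only a sketch of the observation as reported here; `ToySession` is a hypothetical name and does not reflect Hue's or Impala's actual code.

```python
from typing import List, Optional


class ToySession:
    """Toy model of a client session that keeps only the latest query handle open."""

    def __init__(self) -> None:
        self.open_query: Optional[str] = None
        self.finished: List[str] = []

    def execute(self, query_id: str) -> None:
        # Running a new query releases the previous handle first.
        self._close_open_query()
        self.open_query = query_id

    def close(self) -> None:
        # Closing the session releases whatever handle is still open.
        self._close_open_query()

    def _close_open_query(self) -> None:
        if self.open_query is not None:
            self.finished.append(self.open_query)
            self.open_query = None

    def running(self) -> List[str]:
        """Queries that would still be listed as running."""
        return [] if self.open_query is None else [self.open_query]


if __name__ == "__main__":
    session = ToySession()
    session.execute("q1")
    print(session.running())   # q1 is still held open
    session.execute("q2")
    print(session.finished)    # q1 was released when q2 started
    session.close()
    print(session.running())   # nothing left open
```

Under this model, the last query in a session is reported as running for as long as the user leaves the session untouched, regardless of how quickly its results were produced.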
Created 08-16-2019 10:30 AM
@pollard the documentation is accurate; many people use those flags successfully. I wouldn't want to speculate about what's happening in your case. If you include a query profile, that can help with diagnosis.
We've seen things like this happen when a client polls the query for status and keeps it alive (the timeout is measured from the last time the client performed an operation on the query or session).
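The point about polling can be shown with a small simulation. It is a sketch under stated assumptions: the timer is reset by any client operation, time advances in one-second steps, and `IdleTimer` and `survives` are hypothetical names rather than anything in Impala.

```python
from dataclasses import dataclass


@dataclass
class IdleTimer:
    """Toy idle timeout: expires once no client operation has been
    seen for `timeout_s` seconds."""
    timeout_s: float
    last_activity: float = 0.0

    def touch(self, now: float) -> None:
        # Any client operation (fetch, status poll, ...) resets the clock.
        self.last_activity = now

    def expired(self, now: float) -> bool:
        return now - self.last_activity >= self.timeout_s


def survives(timeout_s: float, poll_interval_s: float, duration_s: float) -> bool:
    """Return True if the query is still alive after `duration_s` seconds
    when the client polls every `poll_interval_s` seconds (0 means never)."""
    timer = IdleTimer(timeout_s)
    now = 0.0
    next_poll = poll_interval_s if poll_interval_s > 0 else float("inf")
    while now < duration_s:
        now += 1.0  # advance simulated time by one second
        if timer.expired(now):
            return False
        if now >= next_poll:
            timer.touch(now)
            next_poll += poll_interval_s
    return True


if __name__ == "__main__":
    # A client that polls every minute keeps a 1500 s timeout from ever firing.
    print(survives(timeout_s=1500, poll_interval_s=60, duration_s=3600))
    # A silent client is timed out well within the hour.
    print(survives(timeout_s=1500, poll_interval_s=0, duration_s=3600))
```

In this model the timeout value matters only if the client goes quiet for longer than the timeout; a status poll that arrives more often than that keeps the query alive indefinitely.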
Created 08-16-2019 10:30 AM
This can also happen if the query is returning a lot of rows, or if the client is very slow at fetching rows.
