Explorer
Posts: 7
Registered: ‎12-27-2018
Accepted Solution

impala running jobs does not finish in time

We have CDH 5.15 deployed and use Impala for analytic batch jobs.

While using Impala, we found that even a very simple query takes a long time to finish.

For example, we issued "select count(*) from shdata.s76_bat_mg_biz_data", and it ran for about 4.8 hours.

In the query details, the query timeline shows that the "unregister query" step took 4.8 h, while all other steps were very fast (milliseconds). How can we fix this so that we can make better use of the system?

Cloudera Employee
Posts: 399
Registered: ‎07-29-2015

Re: impala running jobs does not finish in time

It's unlikely that the query was actually executing for that long. Most likely the client you are using was slow to close the query.

Cloudera Employee
Posts: 591
Registered: ‎03-23-2015

Re: impala running jobs does not finish in time

Did you use Hue to query Impala? Did the query actually take 4.8 hours to return the result, or did you just notice that the gap between the start and end times in the query profile is long?

Hue does not close the query handle after a query finishes, so that users can still retrieve the results later. The time the query handle stays open before it is closed is counted between the query start and end times in the profile.
Explorer
Posts: 7
Registered: ‎12-27-2018

Re: impala running jobs does not finish in time

Yes, we use Hue as a query interface very often.

What we are concerned about is this: if a query run from Hue stays open for so long, will it count against the concurrency we have in Impala, since we have admission control enabled?

Cloudera Employee
Posts: 591
Registered: ‎03-23-2015

Re: impala running jobs does not finish in time

You can decrease the idle session and idle query timeouts so that queries can be cancelled from the server side.

Please refer to the doc here:
https://www.cloudera.com/documentation/enterprise/latest/topics/impala_timeouts.html
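
As a rough sketch of what that could look like, assuming the timeouts are set through impalad startup flags and using 30 seconds purely as an illustrative value (where you set these depends on how your cluster is managed):

```
# impalad startup flags; both values are in seconds
--idle_session_timeout=30
--idle_query_timeout=30
```

A single session can also request its own query timeout with the QUERY_TIMEOUT_S query option, for example `SET QUERY_TIMEOUT_S=30;`. Pick values that still leave users enough time to fetch their results.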

Cheers
Cloudera Employee
Posts: 399
Registered: ‎07-29-2015

Re: impala running jobs does not finish in time

On CDH 5.15, in most cases such queries won't hold onto resources in admission control, unless the query isn't cancelled and the client (i.e. Hue) doesn't fetch all of the results.

Enabling the timeouts suggested by Eric helps ensure that queries get cancelled in a timely manner.

Explorer
Posts: 7
Registered: ‎12-27-2018

Re: impala running jobs does not finish in time

I've set the timeout to 30, but it seems to have had no effect.

On the impalad /sessions page, I found that the query session's idle timeout(s) is 1800.

On the /varz page, both the idle session timeout and the idle query timeout are 30.
