
Why use "COMPUTE STATS" instead of "COMPUTE INCREMENTAL STATS" if the Incremental stats size exceeds 200MB?

Explorer

Updating statistics using "COMPUTE INCREMENTAL STATS" produced the following error.

 

 

Server version: impalad version 3.2.0-cdh6.3.2 RELEASE (build 1bb9836227301b839a32c6bc230e35439d5984ac)
Query: COMPUTE INCREMENTAL STATS hoge_db.huga_tbl PARTITION ( dt >= "20221015" )
ERROR: AnalysisException: Incremental stats size estimate exceeds 200.00MB. Please try COMPUTE STATS instead.

 

 

The error message says "Please try COMPUTE STATS instead."
I don't understand why. Wouldn't "COMPUTE STATS" give the same result?

4 REPLIES

Super Collaborator

Hi @yassan ,

 

The flag inc_stats_size_limit_bytes defaults to 200 MB. It acts as a safety check that prevents Impala from hitting the maximum size limit for table metadata.

 

The error usually indicates that 'COMPUTE INCREMENTAL STATS' should not be used on this particular table. Consider splitting the table and using the regular 'COMPUTE STATS' statement if possible.

 

However, if you are not able to use the 'COMPUTE STATS' statement, you could try raising the limit set by the inc_stats_size_limit_bytes flag. The value is measured in bytes and should be kept below 1 GB.

 

The steps are as follows:

 

1. CM > Impala Service > Configuration > Search "Impala Command Line Argument Advanced Configuration Snippet (Safety Valve)"

2. Add --inc_stats_size_limit_bytes=<value>, where <value> is the new limit. Please note that the value is in bytes.

For example, if you want to set 400 MB, enter 419430400 (400*1024*1024).

3. Save the change and restart the Impala service.
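
The byte arithmetic in step 2 can be sketched as follows. This is only an illustration and not part of Impala or Cloudera Manager: the helper name inc_stats_limit_flag is hypothetical, and the check against 1 GB simply mirrors the advice above that the value should stay below 1 GB.

```python
MB = 1024 * 1024      # bytes in one MB, matching the 400*1024*1024 example
ONE_GB = 1024 * MB    # upper bound suggested in this reply


def inc_stats_limit_flag(limit_mb: int) -> str:
    """Build the safety-valve line for a limit given in MB (hypothetical helper)."""
    limit_bytes = limit_mb * MB
    # Reject values outside the range recommended above (assumption: strictly below 1 GB).
    if not 0 < limit_bytes < ONE_GB:
        raise ValueError("limit must be positive and below 1 GB")
    return f"--inc_stats_size_limit_bytes={limit_bytes}"


if __name__ == "__main__":
    print(inc_stats_limit_flag(400))  # --inc_stats_size_limit_bytes=419430400
```

The printed line is what would be pasted into the safety valve field in step 2.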

 

Note: If I answered your question please give a thumbs up and Accept it as a solution.

 

Regards,

Chethan YM

 

Explorer

Hi @ChethanYM 

I’m sorry for the late reply.

Thank you for the helpful information.

COMPUTE STATS Statement | 6.3.x | Cloudera Documentation
https://docs.cloudera.com/documentation/enterprise/6/6.3/topics/impala_compute_stats.html#compute_st...

> If this metadata for all tables exceeds 2 GB, you might experience service downtime. In Impala 3.1 and higher, the issue was alleviated with an improved handling of incremental stats.


As the documentation quoted above says, my concern is that the Impala service could go down (or would only a single impalad go down?).
Are there any metrics for checking the incremental stats size?

 

Also, what should we do if inc_stats_size_limit_bytes is still insufficient at 1 GB?
We expect that to happen because the table has too many columns and partitions.
What countermeasures can we take in that case?
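
To make the dependence on columns and partitions concrete, here is a rough sizing sketch. It assumes the estimate grows as partitions × columns × a fixed per-column cost. The 400 bytes used below is, as far as I can tell, the approximate figure the Cloudera documentation gives for Impala 3.0 and lower. The constant that Impala 3.2 uses in its own estimate may differ, so it is left as a parameter. The function names and the example table shape are hypothetical.

```python
DEFAULT_LIMIT_BYTES = 200 * 1024 * 1024  # default inc_stats_size_limit_bytes (200 MB)

# ASSUMPTION: approximate metadata cost per column per partition, in bytes.
# Not confirmed for Impala 3.2; adjust if the real figure is known.
ASSUMED_BYTES_PER_COLUMN = 400


def estimate_inc_stats_bytes(num_partitions, num_columns,
                             bytes_per_column=ASSUMED_BYTES_PER_COLUMN):
    """Rough incremental stats size: partitions x columns x per-column cost."""
    return num_partitions * num_columns * bytes_per_column


def max_partitions_under_limit(num_columns, limit_bytes=DEFAULT_LIMIT_BYTES,
                               bytes_per_column=ASSUMED_BYTES_PER_COLUMN):
    """Largest partition count whose rough estimate stays within the limit."""
    return limit_bytes // (num_columns * bytes_per_column)


if __name__ == "__main__":
    # Hypothetical table: 10,000 partitions and 100 columns.
    size = estimate_inc_stats_bytes(10_000, 100)
    print(size, size > DEFAULT_LIMIT_BYTES)
    print(max_partitions_under_limit(100))
```

Under these assumptions, a wide table reaches the 200 MB default after only a few thousand partitions, which is why the partition count matters so much here.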

 

Regards,
yassan

Super Collaborator

Hi,

 

> According to the documentation it is service downtime, so I think it means the complete Impala service goes down. (However, I have not seen this issue on a live cluster.)

> No metrics/graph to check "inc_stats_size"

> If 1GB is insufficient, try to use "compute stats" instead of "compute incremental stats"

 

Regards,

Chethan YM

Explorer

> No metrics/graph to check "inc_stats_size"

 

That's what I thought.


> If 1GB is insufficient, try to use "compute stats" instead of "compute incremental stats"

 

However, there is a problem in that case.

 

This table is updated every hour, with a new partition added each time.
However, "compute stats" takes well over an hour to complete.