Compute stats taking a lot of time to execute

Hi,

We are using CDH 5.5.1, and we have incremental data ingestion running every hour. After each incremental ingestion finishes, we usually run COMPUTE STATS on our tables to optimize queries for the front end. However, COMPUTE STATS is taking more than an hour to complete on one table that holds approximately 500 GB of data.

Is there any way we can make COMPUTE STATS or COMPUTE INCREMENTAL STATS run faster?

Impala version: 2.3.0+cdh5.5.1+0

Please help!!!

Re: Compute stats taking a lot of time to execute

Hi Jais,

I recommend that you consider updating the column stats (number of distinct values per column) and the table stats (table/partition row counts) separately. COMPUTE STATS is expensive mostly because of the column stats, but the number of distinct values typically changes much more slowly than the row count.

What you can do is run the full compute stats less frequently (e.g., once your table size has doubled).
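As a rough sketch of that rule of thumb, assume your ingestion job records the table size at the time of the last full COMPUTE STATS (row count or bytes; either works as long as both values use the same unit). The decision could then look like the following. The function and parameter names are hypothetical, and the 2x threshold is only the example given above; tune it for your data.

```python
def needs_full_compute_stats(current_size: int,
                             size_at_last_full_stats: int,
                             growth_factor: float = 2.0) -> bool:
    """Decide whether a full COMPUTE STATS is worth running now.

    Hypothetical helper: sizes can be row counts or bytes, but both
    arguments must use the same unit. growth_factor=2.0 encodes the
    "rerun once the table has doubled" rule of thumb.
    """
    if size_at_last_full_stats <= 0:
        # No baseline recorded yet (or the table was empty): compute full stats.
        return True
    return current_size >= growth_factor * size_at_last_full_stats


if __name__ == "__main__":
    # 500 GB at the last full run, 620 GB now: skip the expensive column stats.
    print(needs_full_compute_stats(620, 500))   # False
    # The table has doubled since the last full run: refresh column stats too.
    print(needs_full_compute_stats(1000, 500))  # True
```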

You can update the table stats (row counts) much more cheaply by running SELECT COUNT(*) and then using ALTER TABLE to set the new row count manually.
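Here is a minimal sketch of that cheaper path. It assumes you already have some way to execute statements against Impala (impala-shell, a DB-API driver, etc.; not shown) and only builds the two statements: a SELECT COUNT(*) whose single result you read back, and an ALTER TABLE that stores that number in the `numRows` table property, which is what the manual procedure in the Impala statistics documentation sets. The helper names and the table name `sales.events` are hypothetical. Partitioned tables, where you would maintain row counts per partition, are not covered.

```python
import re

# Accept plain `table` or `db.table` names only, so the values spliced
# into the SQL text cannot carry extra statements.
_TABLE_NAME = re.compile(r"[A-Za-z_][A-Za-z0-9_]*(\.[A-Za-z_][A-Za-z0-9_]*)?")


def _check_table_name(table: str) -> None:
    if not _TABLE_NAME.fullmatch(table):
        raise ValueError(f"unexpected table name: {table!r}")


def count_rows_sql(table: str) -> str:
    """Statement whose single result value is the current row count."""
    _check_table_name(table)
    return f"SELECT COUNT(*) FROM {table}"


def set_num_rows_sql(table: str, num_rows: int) -> str:
    """Statement that records `num_rows` as the table-level row count stat."""
    _check_table_name(table)
    if num_rows < 0:
        raise ValueError("row count must be non-negative")
    return f"ALTER TABLE {table} SET TBLPROPERTIES('numRows'='{num_rows}')"


if __name__ == "__main__":
    print(count_rows_sql("sales.events"))
    # Feed the COUNT(*) result (here a made-up 123456789) into the ALTER TABLE.
    print(set_num_rows_sql("sales.events", 123456789))
```

Both statements are much cheaper than COMPUTE STATS, so this pair can run after every hourly load, while the full COMPUTE STATS runs far less often, as suggested above.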

The procedure is described in more detail here:

http://www.cloudera.com/documentation/enterprise/5-5-x/topics/impala_perf_stats.html#perf_stats

Alex