Support Questions
Find answers, ask questions, and share your expertise
Check out our newest addition to the community, the Cloudera Innovation Accelerator group hub.

Missing stats in Impala

New Contributor



I need to find the tables which don't have stats computed on them.

Basically, I have many tables which are being queried upon and have slow response because of missing statistics.
I need to find the tables with missing stats, so I can run "Compute Stats" on them.



Master Collaborator

I'm afraid there is no way to get that information with a query today. You could write a script that iterates over all databases/tables ('show databases' and then 'show tables in <tbl>') and then does a 'show column stats' and 'show table stats' to see if column stats are there.


Be mindful that these 'show' commands will cause the table metadata to be loaded completely.


Doesnt MySQL have PARAM_KEY = 'numRows'
and PARAM_VALUE = '-1' 

Master Collaborator

Yes, it is possible to query the backend Metastore DB directly, but the query might turn out to be rather elaborate because you need to cover the following stats that are interesting to Impala:

1. Table level row count

2. Partition level row counts

3. Column level stats such as NDV

New Contributor

So this begs the question how does Cloudera Manager produce the following chart:


any reasson we cant use the SQL:


select query_duration from DEFAULT.IMPALA_QUERIES where serviceName="impala" and (stats_missing=true or stats_corrupt=true)