I need to find the tables which don't have stats computed on them.
Basically, I have many tables which are being queried upon and have slow response because of missing statistics.
I need to find the tables with missing stats, so I can run "Compute Stats" on them.
I'm afraid there is no way to get that information with a query today. You could write a script that iterates over all databases/tables ('show databases' and then 'show tables in <tbl>') and then does a 'show column stats' and 'show table stats' to see if column stats are there.
Be mindful that these 'show' commands will cause the table metadata to be loaded completely.
Yes, it is possible to query the backend Metastore DB directly, but the query might turn out to be rather elaborate because you need to cover the following stats that are interesting to Impala:
1. Table level row count
2. Partition level row counts
3. Column level stats such as NDV
So this begs the question how does Cloudera Manager produce the following chart:
any reasson we cant use the SQL:
select query_duration from DEFAULT.IMPALA_QUERIES where serviceName="impala" and (stats_missing=true or stats_corrupt=true)