Reply
New Contributor
Posts: 2
Registered: ‎11-14-2018

COMPUTE STATS - Database scripts

Hi Team we have a new application installed on our Cloudera Cluster which can read from a database and import all tables into Hive/Impala without us writing any CREATE statements. Once the tables are created we need to perform COMPUTE STATS at Impala level and ANALYZE Table COMPUTE STATISTICS at hive level. Is there a specific way to script this? OR we need to put all COMPUTE STATS statements in one script and run it on Impala/hive command line? I know hive and impala do not have SYS tables where we can get list of each tables. Ideally we would want to schedule the COMPUTE STATS for both Hive and Impala post all our weekly loads are done. The script should be able to dynamically pick all NEW tables created in Hive/Impala across databases and perform the Compute stats. Appreciate any help. Regards JitenR

Expert Contributor
Posts: 108
Registered: ‎02-23-2018

Re: COMPUTE STATS - Database scripts

Hi @JitenR,

 

You will need run INVALIDATE METADATA weekly in your script and then recovery with SHOW TABLES all new entrys for execute COMPUTE STATS.

 

Regards, 

Manu.

New Contributor
Posts: 2
Registered: ‎11-14-2018

Re: COMPUTE STATS - Database scripts

Thank You Manu. I was able to write the script and automate the process.

 

Regards

Jitender Renwa

Champion
Posts: 777
Registered: ‎05-16-2016

Re: COMPUTE STATS - Database scripts

To  add to your his points .

 

If your table is partioned then you use the below - quoting it from Cloudera KB 

The COMPUTE INCREMENTAL STATS variation is a shortcut for partitioned tables that works on a subset of partitions rather than the entire table

Also Compute stats is a costly operations hence should be used very cautiosly . 

Some impala query may fail while performing compute stats . Hence chose Refresh command vs Compute stats accordingly . Hope this helps. Although you had done with your script :)