Hi Team we have a new application installed on our Cloudera Cluster which can read from a database and import all tables into Hive/Impala without us writing any CREATE statements. Once the tables are created we need to perform COMPUTE STATS at Impala level and ANALYZE Table COMPUTE STATISTICS at hive level. Is there a specific way to script this? OR we need to put all COMPUTE STATS statements in one script and run it on Impala/hive command line? I know hive and impala do not have SYS tables where we can get list of each tables. Ideally we would want to schedule the COMPUTE STATS for both Hive and Impala post all our weekly loads are done. The script should be able to dynamically pick all NEW tables created in Hive/Impala across databases and perform the Compute stats. Appreciate any help. Regards JitenR
To add to your his points .
If your table is partioned then you use the below - quoting it from Cloudera KB
The COMPUTE INCREMENTAL STATS variation is a shortcut for partitioned tables that works on a subset of partitions rather than the entire table
Also Compute stats is a costly operations hence should be used very cautiosly .
Some impala query may fail while performing compute stats . Hence chose Refresh command vs Compute stats accordingly . Hope this helps. Although you had done with your script :)