Cloudera Employee
Re: why 'show column stats <table_name>` doesn't show statistics generated by Hive 'Analyze Ta

OK, it looks like I was wrong in assuming that Hive computes column stats at the table level.

For a partitioned table, Hive's ANALYZE TABLE command will compute the column stats on a per-partition basis.

It's not clear that this approach even makes sense: how would one then aggregate the per-partition distinct-value stats into a table-level figure? Distinct-value counts are not additive across partitions, because the same value can appear in more than one partition. The combined stats would likely be wildly inaccurate, so maybe this is not a good flow anyway, even if we could make it work.
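
To illustrate the aggregation problem, here is a minimal Python sketch. The partition names and column values are made up for illustration and do not come from any real table; it only shows why per-partition distinct-value counts (NDV) cannot simply be summed or maxed to get a table-level number.

```python
# Hypothetical data: values of one column in two partitions of the same table.
partitions = {
    "day=2014-01-01": ["a", "b", "c", "a"],
    "day=2014-01-02": ["b", "c", "d"],
}

# Per-partition number of distinct values (NDV), i.e. what per-partition
# column stats would record.
ndv_per_partition = {name: len(set(values)) for name, values in partitions.items()}

# The true table-level NDV needs the values themselves, not just the counts.
true_ndv = len(set().union(*partitions.values()))

# Two naive ways to combine the per-partition counts.
sum_estimate = sum(ndv_per_partition.values())  # double-counts shared values
max_estimate = max(ndv_per_partition.values())  # misses values unique to other partitions

print(ndv_per_partition)
print(true_ndv, sum_estimate, max_estimate)
```

With this toy data the true NDV is 4, while summing gives 6 and taking the maximum gives 3. With many partitions that share most of their values, the gap can get much larger.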

That's why the stats do not show up in Impala. The flow of computing column stats in Hive and then using them in Impala will currently not work for partitioned tables.
