

Re: Why `show column stats <table_name>` doesn't show statistics generated by Hive 'Analyze Ta


OK, it looks like I was wrong to assume that Hive computes column stats at the table level.

For a partitioned table, Hive's ANALYZE TABLE command computes column stats separately for each partition.

It's not even clear that this approach makes sense: how would one combine the per-partition distinct-value stats into a table-level figure? Adding them up overcounts every value that appears in more than one partition, and taking the maximum undercounts, so the result could be wildly inaccurate. Maybe this is not a good flow anyway, even if we could make it work.
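A toy illustration of the aggregation problem, in Python with made-up data (the partition names and values are hypothetical, not from any real table). Exact per-partition distinct counts only bound the table-level count: the true value lies somewhere between the maximum and the sum of the per-partition counts, and pinning it down requires the underlying values.

```python
# Hypothetical data: values of one column, split across three partitions
# (e.g. a table partitioned by day). Not taken from any real table.
partitions = {
    "day=1": ["a", "b", "c", "d"],
    "day=2": ["b", "c", "d", "e"],
    "day=3": ["c", "d", "e", "f"],
}


def ndv(values):
    """Exact number of distinct values in a list."""
    return len(set(values))


# Per-partition distinct counts, analogous to what a per-partition
# analysis would record for this column.
per_partition = {name: ndv(vals) for name, vals in partitions.items()}

# Two naive ways to roll per-partition counts up to a table-level count.
summed = sum(per_partition.values())   # assumes partitions share no values
largest = max(per_partition.values())  # assumes every partition holds all values

# The true table-level count needs the actual values, not just the counts.
all_values = [v for vals in partitions.values() for v in vals]
true_ndv = ndv(all_values)

print(per_partition)
print(f"sum={summed} max={largest} true={true_ndv}")
```

Here each partition has 4 distinct values, so the naive roll-ups give 12 (sum) and 4 (max), while the table actually has 6 distinct values. Neither roll-up is correct, and with different data the true value could land anywhere in that range.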


That's why the stats don't show up in Impala: the flow of computing column stats in Hive and then using them in Impala currently does not work for partitioned tables.


