Archives of Support Questions (Read Only)

This is an archived board for historical reference. Information and links may no longer be available or relevant
Announcements
This board is archived and read-only for historical reference. To ask a new question, please post a new topic on the appropriate active board.

Who agreed with this solution

avatar

Ok, looks like I was wrong in assuming that Hive would compute the column stats on a table level.

For a partitioned table, Hive's ANALYZE TABLE command will compute the column stats on a per-partition basis.

It's not clear that this approach even makes sense because how will one then aggregate the different distinct-value stats across partitions? Seems like those stats would be wildly inaccurate, so maybe this is not a good flow anyway, even if we could make it work.

 

That's why the stats do not show up in Impala. The flow of computing column stats in Hive and then using them in Impala will currently not work for partitioned tables.

 

View solution in original post

Who agreed with this solution