OK, it looks like I was wrong in assuming that Hive would compute the column stats at the table level.

For a partitioned table, Hive's ANALYZE TABLE command will compute the column stats on a per-partition basis.

It's not clear that this approach even makes sense, because how would one then aggregate the distinct-value stats across partitions? A value can appear in several partitions, so summing the per-partition counts overcounts, while taking the maximum undercounts. Those table-level stats would likely be wildly inaccurate, so maybe this isn't a good flow anyway, even if we could make it work.
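
As a rough illustration of the aggregation problem, here is a small Python sketch over made-up data (the partition names and values are hypothetical, not from Hive). It compares summing and taking the max of per-partition distinct-value counts against the exact table-level count, which can only be computed from the raw values, not from the per-partition numbers.

```python
"""Why per-partition distinct-value (NDV) counts don't aggregate cleanly.

Hypothetical data: each partition holds a list of values for one column.
"""


def ndv(values):
    """Exact number of distinct values in one collection."""
    return len(set(values))


def naive_sum(partitions):
    """Sum of per-partition NDVs: overcounts values shared across partitions."""
    return sum(ndv(p) for p in partitions.values())


def naive_max(partitions):
    """Max of per-partition NDVs: undercounts values that only occur in other partitions."""
    return max(ndv(p) for p in partitions.values())


def exact_table_ndv(partitions):
    """True table-level NDV: needs the union of the raw values, not just the counts."""
    union = set()
    for values in partitions.values():
        union.update(values)
    return len(union)


# Hypothetical partitioned column, keyed by (made-up) partition name.
partitions = {
    "dt=2024-01-01": ["a", "b", "c"],
    "dt=2024-01-02": ["b", "c", "d"],
    "dt=2024-01-03": ["c", "d", "e", "f"],
}

print("sum of partition NDVs:", naive_sum(partitions))        # 10
print("max of partition NDVs:", naive_max(partitions))        # 4
print("exact table NDV:      ", exact_table_ndv(partitions))  # 6
```

Neither 10 nor 4 is the right answer (6), and if only the per-partition counts are stored there's no way to get back to 6.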

That's why the stats do not show up in Impala. The flow of computing column stats in Hive and then using them in Impala will currently not work for partitioned tables.