Support Questions

Find answers, ask questions, and share your expertise
Announcements
Celebrating as our community reaches 100,000 members! Thank you!

Who agreed with this solution

avatar

It looks like most likely a bug in the Impala planner with some of the estimated stats being calculated as negative and it triggering that error. I haven't seen exactly this symptom before, but I think it's most likely caused by https://issues.apache.org/jira/browse/IMPALA-7604.

 

This could cause it if the #rows estimate from the aggregation (GROUP BY) before the sort (ORDER BY) overflows and wraps around to being negative. For this to happen, the product of the distinct value count from the columns in the group by would have to be > 2^63. I.e. if you have GROUP BY a, b, c and each column has 10 distinct values, you would get a product of 10 * 10 * 10 = 1000.

It's possible that the stats changed somehow and before it wasn't being triggered.

CDH6.3.4 is the earliest release with a fix for it. You could work around by dropping stats on one of the tables, but that can have some pretty bad performance implications. You might also be able to work around by tweaking the query, e.g. grouping by different columns.

View solution in original post

Who agreed with this solution