Hive: row_number() not vectorised

I have a MERGE statement that I am trying to make faster. Inside the USING clause of the statement, a row_number() function is used for deduplication.
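
For illustration, a minimal sketch of the kind of statement described. The table and column names (target_table, staging_table, id, payload, updated_at) are hypothetical placeholders, and the real statement may differ:

```sql
-- Hypothetical names throughout; target_table is assumed to be a transactional (ACID) table.
MERGE INTO target_table AS t
USING (
  SELECT id, payload, updated_at
  FROM (
    SELECT id, payload, updated_at,
           -- Number the rows per key, newest first, so that rn = 1 is the row to keep.
           row_number() OVER (PARTITION BY id ORDER BY updated_at DESC) AS rn
    FROM staging_table
  ) ranked
  WHERE rn = 1
) AS s
ON t.id = s.id
WHEN MATCHED THEN UPDATE SET payload = s.payload, updated_at = s.updated_at
WHEN NOT MATCHED THEN INSERT VALUES (s.id, s.payload, s.updated_at);
```

The window function runs on the reduce side as a PTF operator, which is the stage the log lines refer to.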


In the logs I see:

INFO physical.Vectorizer (:()) - Reduce vectorized: false
INFO physical.Vectorizer (:()) - Reduce notVectorizedReason: PTF operator: ROW_NUMBER not in supported functions [avg, count, dense_rank, first_value, last_value, max, min, rank, row_number, sum]


This log message does not seem right: it reports ROW_NUMBER as missing from a list that contains row_number.
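
The same reason can usually be inspected without digging through the logs by running EXPLAIN VECTORIZATION on just the deduplication subquery. The sketch below assumes a hypothetical staging_table with columns id and updated_at, and assumes the three vectorisation properties shown are the relevant ones for this build:

```sql
-- Assumption: these properties exist and are settable in this Hive 3.1 build.
SET hive.vectorized.execution.enabled=true;
SET hive.vectorized.execution.reduce.enabled=true;
SET hive.vectorized.execution.ptf.enabled=true;

-- staging_table, id and updated_at are hypothetical placeholders.
EXPLAIN VECTORIZATION DETAIL
SELECT id,
       row_number() OVER (PARTITION BY id ORDER BY updated_at DESC) AS rn
FROM staging_table;
```

The plan output should then include a notVectorizedReason entry for any reduce stage that is not vectorised.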


For my own peace of mind I tried both uppercase and lowercase row_number, and fortunately it made no difference.


Is there anything I can do to get vectorisation working together with row_number()?

This is with Hive 3.1.0 from HDP 3.1.4.