Hive: row_number() not vectorised


I have a MERGE statement and am looking for ways to make it faster. The USING clause of the statement contains a row_number() function that is used for deduplication.
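
For context, the statement has roughly the following shape. This is a simplified sketch rather than the actual query: the table names (target_table, staging_table) and the columns (id, payload, updated_at) are placeholders made up for illustration.

```sql
-- Hypothetical names throughout; only the overall shape matters.
MERGE INTO target_table AS t
USING (
  -- Keep only the most recent row per id (the deduplication step).
  SELECT id, payload, updated_at
  FROM (
    SELECT id,
           payload,
           updated_at,
           row_number() OVER (PARTITION BY id ORDER BY updated_at DESC) AS rn
    FROM staging_table
  ) ranked
  WHERE rn = 1
) AS s
ON t.id = s.id
WHEN MATCHED THEN UPDATE SET payload = s.payload, updated_at = s.updated_at
WHEN NOT MATCHED THEN INSERT VALUES (s.id, s.payload, s.updated_at);
```

The row_number() window in the inner subquery is evaluated by a PTF operator on the reduce side, which is the operator the log output refers to.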


In the logs I see:

INFO physical.Vectorizer (:()) - Reduce vectorized: false
INFO physical.Vectorizer (:()) - Reduce notVectorizedReason: PTF operator: ROW_NUMBER not in supported functions [avg, count, dense_rank, first_value, last_value, max, min, rank, row_number, sum]


This log message looks contradictory: ROW_NUMBER is reported as unsupported, yet row_number appears in the list of supported functions.

For my own peace of mind, I tried writing row_number in both uppercase and lowercase; as expected, it made no difference.

Is there anything I can do to get vectorisation to work together with row_number()?

This is with Hive 3.1.0 from HDP 3.1.4.
