Archives of Support Questions (Read Only)

This is an archived board for historical reference. Information and links may no longer be available or relevant

What does the term 'Vectorization' mean in update/delete operations in Hive?

Expert Contributor
 
1 ACCEPTED SOLUTION

Master Guru

Can you paste the actual message? Normally, vectorization means that Hive groups rows into batches (1024 rows at a time) and runs each operation on the whole batch at once. On modern CPUs this is much more efficient than processing row by row, because it makes better use of the CPU cache and of compiler optimizations.

https://cwiki.apache.org/confluence/display/Hive/Vectorized+Query+Execution
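
To make the batching idea above concrete, here is a minimal sketch in Python. It is only a conceptual illustration, not Hive's actual code (Hive is written in Java), and the function and variable names are made up for this example. It compares a row-by-row filter-and-sum with a version that first applies the filter to a whole batch of 1024 values and then aggregates the selected rows. Plain Python will not show the speedup; the point is only the shape of the loops.

```python
BATCH_SIZE = 1024  # batch size mentioned in the answer above


def filter_sum_row_by_row(prices, quantities, min_qty):
    """Row-at-a-time: filter and aggregate are interleaved for every row."""
    total = 0
    for i in range(len(prices)):
        if quantities[i] >= min_qty:
            total += prices[i]
    return total


def filter_sum_batched(prices, quantities, min_qty, batch_size=BATCH_SIZE):
    """Batch-at-a-time: each operator runs a tight loop over one column batch."""
    total = 0
    for start in range(0, len(prices), batch_size):
        # Take one batch of each column (hypothetical column layout).
        price_batch = prices[start:start + batch_size]
        qty_batch = quantities[start:start + batch_size]

        # Filter operator: scan the whole quantity batch, keep selected positions.
        selected = [i for i, qty in enumerate(qty_batch) if qty >= min_qty]

        # Aggregate operator: sum only the selected positions of the price batch.
        total += sum(price_batch[i] for i in selected)
    return total
```

Both functions return the same result; the batched version just restructures the work so that each step touches one column of one batch in a simple inner loop, which is the kind of loop a JVM or compiler can optimize well.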

I'm not sure about updates and deletes; they might use vectorization for some functions there too.


3 REPLIES

Expert Contributor

I wanted to understand the concept of vectorization. I got it from the link above. Thanks.

Master Guru

I agree with @Benjamin Leonhardi. Please provide the log file, because vectorization applies to operations like scans, filters, aggregates, and joins.