distinct vs group by

New Contributor

Which one is optimal to use in Hive: distinct or group by? May I know how both of them are processed in the background?

3 REPLIES

Contributor

Check the explain plan of both. I believe the distinct is re-written to a group-by by the planner.
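
To make the comparison concrete, here is a minimal sketch of checking both plans. The table name `orders` and the column `customer_id` are hypothetical placeholders, so substitute your own. If the planner does rewrite distinct as a group by, you would expect both plans to show the same kind of grouping operator, but the exact output depends on your Hive version and execution engine.

```sql
-- Hypothetical table and column; replace with your own.

-- Plan for the distinct form
EXPLAIN
SELECT DISTINCT customer_id
FROM orders;

-- Plan for the equivalent group by form
EXPLAIN
SELECT customer_id
FROM orders
GROUP BY customer_id;
```

Compare the two outputs stage by stage. If the operator trees match, the choice between the two forms should not affect performance for this query.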

@Ravi teja In my experience, group by is faster than distinct. A group by amounts to partitioning rows by key and collecting the values for each key, which MR handles with ease. I would say it is better to go with group by.

Contributor

Gunther is right: the Hive planner rewrites distinct as a group by, so it doesn't matter which one you use from a performance point of view.