DISTINCT vs GROUP BY

New Contributor

Which one is better to use in Hive: DISTINCT or GROUP BY? How is each of them processed in the background?

3 REPLIES

Explorer

Check the EXPLAIN plan for both queries. I believe the planner rewrites DISTINCT as a GROUP BY.
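The rewrite is valid because a SELECT DISTINCT over some columns returns the same rows as a GROUP BY on those same columns with no aggregates. The sketch below illustrates only that result equivalence. It uses Python's built-in SQLite, not Hive, so it says nothing about Hive's actual plan; the `visits` table, its columns, and the helper `distinct_vs_group_by` are hypothetical. To see what Hive itself does, run EXPLAIN on both queries as suggested above.

```python
import sqlite3

def distinct_vs_group_by(rows):
    """Run SELECT DISTINCT and the equivalent GROUP BY on the same data (SQLite, not Hive)."""
    conn = sqlite3.connect(":memory:")
    # Hypothetical table and columns, for illustration only.
    conn.execute("CREATE TABLE visits (user_id INTEGER, page TEXT)")
    conn.executemany("INSERT INTO visits VALUES (?, ?)", rows)
    distinct_rows = conn.execute(
        "SELECT DISTINCT user_id, page FROM visits ORDER BY user_id, page"
    ).fetchall()
    # Grouping on every selected column, with no aggregate, also de-duplicates.
    grouped_rows = conn.execute(
        "SELECT user_id, page FROM visits GROUP BY user_id, page ORDER BY user_id, page"
    ).fetchall()
    conn.close()
    return distinct_rows, grouped_rows

rows = [(1, "home"), (1, "home"), (2, "cart"), (1, "cart"), (2, "cart")]
d, g = distinct_vs_group_by(rows)
print(d == g)  # True
print(d)       # [(1, 'cart'), (1, 'home'), (2, 'cart')]
```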


@Ravi teja In my experience, GROUP BY is faster than DISTINCT. GROUP BY amounts to partitioning records by key, which MapReduce (MR) handles with ease. I would go with GROUP BY.
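As a rough illustration of "partitioning records by key" in MapReduce terms, here is a toy, single-process simulation. The mapper emits the grouping columns as the key, the shuffle brings equal keys together, and the reducer either emits each key once (DISTINCT) or aggregates per key (GROUP BY with COUNT). All function names are hypothetical, and the sketch does not reproduce Hive's real operators or optimizations; it only shows that both queries rely on the same key-based grouping.

```python
from collections import defaultdict

def map_phase(records, key_columns):
    # Mapper: emit (grouping key, 1) for every input row.
    for row in records:
        key = tuple(row[i] for i in key_columns)
        yield key, 1

def shuffle_phase(pairs):
    # Shuffle/sort: collect every value emitted for the same key, ordered by key.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return sorted(groups.items(), key=lambda kv: kv[0])

def reduce_distinct(groups):
    # DISTINCT: each key reaches the reducer once, so emitting it once de-duplicates.
    return [key for key, _values in groups]

def reduce_count(groups):
    # GROUP BY ... COUNT(*): same grouping, plus an aggregate per key.
    return [(key, sum(values)) for key, values in groups]

rows = [(1, "home"), (1, "home"), (2, "cart"), (1, "cart"), (2, "cart")]
groups = shuffle_phase(map_phase(rows, key_columns=(0, 1)))
print(reduce_distinct(groups))  # [(1, 'cart'), (1, 'home'), (2, 'cart')]
print(reduce_count(groups))     # [((1, 'cart'), 1), ((1, 'home'), 2), ((2, 'cart'), 2)]
```

The only difference between the two reducers is whether anything is computed per key, which is why a planner can treat DISTINCT as a special case of GROUP BY.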

Cloudera Employee

Gunther is right: the Hive planner rewrites DISTINCT as a GROUP BY, so from a performance point of view it doesn't matter which one you use.
