Created 12-13-2017 11:40 PM
Which one is better to use in Hive: DISTINCT or GROUP BY? Can anyone explain how each of them is processed in the background?
Created 12-14-2017 10:56 PM
Check the EXPLAIN plan for both. I believe the planner rewrites DISTINCT as a GROUP BY.
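A minimal sketch of that comparison in HiveQL, in case it helps. The table `orders` and the column `customer_id` are made-up names for illustration; substitute your own. Run both statements and compare the operator trees. If both plans contain the same stages (for example a Group By Operator in each), the planner is treating the two queries alike on your Hive version.

```sql
-- Hypothetical table and column names, used only for illustration.
CREATE TABLE IF NOT EXISTS orders (customer_id INT, amount DOUBLE);

-- Plan for the DISTINCT form.
EXPLAIN
SELECT DISTINCT customer_id
FROM orders;

-- Plan for the equivalent GROUP BY form.
EXPLAIN
SELECT customer_id
FROM orders
GROUP BY customer_id;
```

The exact plan text depends on the Hive version and execution engine, so compare the two outputs against each other rather than against a fixed expected result.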
Created 12-15-2017 05:06 AM
@Ravi teja In my experience, GROUP BY is faster than DISTINCT. GROUP BY essentially partitions rows by key, which is something MapReduce handles with ease. I would recommend going with GROUP BY.
Created 12-15-2017 05:57 AM
Gunther is right: the Hive planner rewrites DISTINCT using GROUP BY, so from a performance point of view it doesn't matter which one you use.