Duplicates In Hive

How can we find duplicate rows in Hive?

Can we write the query the same way we do in SQL, or do we have to use DISTRIBUTE BY in place of GROUP BY?

Please suggest.

1 REPLY

Yes, you can do it in multiple ways, for example with GROUP BY or DISTINCT, just as you would in SQL. If you want to find duplicates on a subset of the columns (e.g. find all rows where customer_id is duplicated), I would recommend using GROUP BY.
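
As a rough sketch of the GROUP BY approach: the table name customers and the columns name and email below are made up for illustration (only customer_id comes from the example above), so adjust them to your own schema. The first query lists the customer_id values that occur more than once, the second returns the full rows behind those values using a window function, and the third finds rows that are duplicated across every column.

```sql
-- Assumed (hypothetical) table: customers(customer_id, name, email)

-- 1) Which customer_id values appear more than once, and how often
SELECT customer_id, COUNT(*) AS cnt
FROM customers
GROUP BY customer_id
HAVING COUNT(*) > 1;

-- 2) The full rows that share a customer_id with at least one other row
SELECT customer_id, name, email
FROM (
  SELECT customer_id, name, email,
         COUNT(*) OVER (PARTITION BY customer_id) AS cnt
  FROM customers
) t
WHERE cnt > 1;

-- 3) Rows that are exact duplicates across all columns
SELECT customer_id, name, email, COUNT(*) AS cnt
FROM customers
GROUP BY customer_id, name, email
HAVING COUNT(*) > 1;
```

DISTRIBUTE BY is not needed for any of these; it only controls how rows are spread across reducers and does not aggregate.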