
Duplicates In Hive


How can we find duplicate rows in Hive?


Can we write the query the same way we do in SQL, rather than using DISTRIBUTE BY in place of GROUP BY?


Please suggest


Re: Duplicates In Hive

Yes, you can do it in multiple ways. For example, you can use GROUP BY or DISTINCT. If you want to find duplicates on a subset of the columns (e.g. find all rows where customer_id is duplicated), I would recommend using GROUP BY.
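
Here is a rough sketch of the GROUP BY approach. To keep it runnable without a cluster, it uses Python's built-in sqlite3 module as a stand-in for Hive; the two query strings are plain GROUP BY / HAVING and should carry over to Hive as written, but treat that as an assumption to check on your version. The table name `orders` and the columns `order_id` and `amount` are made up for illustration; only customer_id comes from the example above.

```python
import sqlite3

# SQLite stands in for Hive so the sketch runs locally.
# The table `orders` and the columns order_id / amount are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (order_id INTEGER, customer_id INTEGER, amount REAL)"
)
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [
        (1, 100, 9.5),
        (2, 101, 20.0),
        (3, 100, 7.25),  # same customer_id as order 1
        (4, 102, 3.0),
        (4, 102, 3.0),   # exact copy of the previous row
    ],
)

# Duplicates on a subset of columns:
# customer_id values that appear in more than one row.
DUP_KEYS = """
    SELECT customer_id, COUNT(*) AS cnt
    FROM orders
    GROUP BY customer_id
    HAVING COUNT(*) > 1
    ORDER BY customer_id
"""

# Fully duplicated rows: group by every column of the table.
DUP_ROWS = """
    SELECT order_id, customer_id, amount, COUNT(*) AS cnt
    FROM orders
    GROUP BY order_id, customer_id, amount
    HAVING COUNT(*) > 1
"""

dup_keys = conn.execute(DUP_KEYS).fetchall()
dup_rows = conn.execute(DUP_ROWS).fetchall()

print(dup_keys)
print(dup_rows)
```

The first query answers "which customer_id values are repeated"; the second lists rows that are identical in every column. If you need the full rows behind a repeated customer_id, one option is to join the first result back to the table on customer_id.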