Support Questions

Duplicates In Hive




How can we find duplicate rows in Hive?


Can we write the query the same way we do in standard SQL, using GROUP BY, instead of using DISTRIBUTE BY in place of GROUP BY?


Please suggest


Re: Duplicates In Hive

Yes, you can do it in several ways, for example with GROUP BY or DISTINCT. If you want to find duplicates on a subset of the columns (e.g. all rows where customer_id is duplicated), I would recommend GROUP BY.
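A minimal sketch of both approaches, assuming a hypothetical table `orders(customer_id, product, qty)` that is not from the original thread. It uses Python's built-in sqlite3 only as a runnable stand-in for Hive; the SQL strings use plain GROUP BY / HAVING and SELECT DISTINCT, which HiveQL also supports, so the same statements should run in Hive once the table name is adjusted.

```python
import sqlite3

# Stand-in for a Hive table. "orders" and its columns are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer_id INTEGER, product TEXT, qty INTEGER)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [
        (1, "apple", 2),
        (1, "apple", 2),  # exact duplicate of the previous row
        (1, "pear", 5),   # same customer_id, but a different row
        (2, "plum", 1),
    ],
)

# Whole-row duplicates: group by every column, keep groups seen more than once.
FULL_ROW_DUPES = """
SELECT customer_id, product, qty, COUNT(*) AS cnt
FROM orders
GROUP BY customer_id, product, qty
HAVING COUNT(*) > 1
"""

# Duplicates on a subset of columns: group by the key column(s) only.
KEY_DUPES = """
SELECT customer_id, COUNT(*) AS cnt
FROM orders
GROUP BY customer_id
HAVING COUNT(*) > 1
"""

# DISTINCT approach: compare the total row count with the distinct row count.
# The subquery alias "t" is included because Hive requires one.
DISTINCT_COUNT = """
SELECT COUNT(*) FROM (SELECT DISTINCT customer_id, product, qty FROM orders) t
"""

full_row_dupes = conn.execute(FULL_ROW_DUPES).fetchall()
key_dupes = conn.execute(KEY_DUPES).fetchall()
total_rows = conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0]
distinct_rows = conn.execute(DISTINCT_COUNT).fetchone()[0]

print(full_row_dupes)              # [(1, 'apple', 2, 2)]
print(key_dupes)                   # [(1, 3)]
print(total_rows - distinct_rows)  # 1 surplus (duplicate) row
```

GROUP BY with HAVING tells you which rows or keys are duplicated and how often, while the DISTINCT comparison only tells you how many surplus rows exist. DISTRIBUTE BY only controls which reducer receives each row and performs no aggregation, so it cannot replace GROUP BY for this task.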