How to remove duplicates within a cell in Hive SQL
Labels: Apache Hive, Apache Impala
Created on ‎05-06-2021 02:01 AM - edited ‎05-06-2021 02:13 AM
I have a column in my Hive SQL table where the values in each cell are separated by commas (,). Some values in the string are duplicated, and I want to remove them. Here is an example of my data:
data:
---------------
test, test1, test,test1
---------------
rest,rest1,rest1,rest
---------------
chest,nest,lest,gest
---------------
The result should have the duplicates removed from each cell:
---------------
test,test1
---------------
rest,rest1
---------------
chest,nest,lest,gest
---------------
I want to remove duplicates. Could anyone help me with this issue?
Thank you
Created ‎05-16-2021 10:03 AM
Hello @Enigmat
Could you try DISTINCT to remove the duplicate entries? These links may help:
https://dwgeek.com/identify-and-remove-duplicate-records-from-hive-table.html/
https://stackoverflow.com/questions/43280052/how-to-delete-duplicate-records-from-hive-table
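Note that DISTINCT works on whole rows, so on its own it will not remove repeated values inside a single comma-separated cell. One possible approach in Hive is to split the string, explode it into rows, and collect the unique values back together. This is only a rough sketch: the table name `my_table` is hypothetical, and the column name `data` is taken from the example in the question.

```sql
-- Sketch only: my_table is a hypothetical table name; data is the
-- comma-separated column from the question.
SELECT
  t.data                                     AS original_value,
  -- collect_set keeps each distinct item once; concat_ws joins them again
  concat_ws(',', collect_set(trim(x.item)))  AS deduplicated_value
FROM my_table t
-- split() turns the string into an array; explode() emits one row per item
LATERAL VIEW explode(split(t.data, ',')) x AS item
-- Grouping by the original string stands in for a row key.
-- If the table has a unique key column, group by that instead.
GROUP BY t.data;
```

A few caveats: `collect_set` does not guarantee the order of the values, so a cell may come back as `test1,test` rather than `test,test1`. Grouping by `data` also collapses rows that hold exactly the same string into one output row. The `trim` call handles the stray spaces after commas shown in the sample data. `LATERAL VIEW explode` is Hive syntax, so this sketch is not expected to run unchanged on Impala.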
