Support Questions

Enigmat · 05-06-2021

I have a column in my Hive SQL table where each cell holds a string of values separated by commas (,). Some values within a string are duplicated, and I want to remove them. Here is an example of my data:

data:

---------------

test, test1, test,test1

---------------

rest,rest1,rest1,rest

---------------

chest,nest,lest,gest

---------------

The result should have the duplicates removed from each cell:

---------------

test,test1

---------------

rest,rest1

---------------

chest,nest,lest,gest

---------------

Could anyone help me remove these duplicates?

Thank you

Shifu · 05-16-2021

Hello @Enigmat

Could you try DISTINCT to remove the duplicate entries? These links cover removing duplicate records from a Hive table:

https://dwgeek.com/identify-and-remove-duplicate-records-from-hive-table.html/

https://stackoverflow.com/questions/43280052/how-to-delete-duplicate-records-from-hive-table
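
Note that DISTINCT works on whole rows, so on its own it will not remove repeated values inside a single comma-separated cell. One possible approach for that case is to split each string, explode it into one row per value, and collect the unique values back together. The query below is only a minimal sketch: the table name `my_table` and the columns `id` and `data` are assumptions for illustration (the inline CTE stands in for the real table), and it assumes each row has a key to group by.

```sql
-- Hypothetical sample table built inline; replace with your own table.
WITH my_table AS (
  SELECT 1 AS id, 'test, test1, test,test1' AS data
  UNION ALL
  SELECT 2 AS id, 'rest,rest1,rest1,rest' AS data
  UNION ALL
  SELECT 3 AS id, 'chest,nest,lest,gest' AS data
)
SELECT
  t.id,
  -- collect_set keeps one copy of each value; concat_ws joins them with commas
  concat_ws(',', collect_set(trim(v.val))) AS data_dedup
FROM my_table t
-- split turns the string into an array; explode emits one row per element
LATERAL VIEW explode(split(t.data, ',')) v AS val
GROUP BY t.id;
```

`trim` removes the stray spaces seen in values such as `test, test1`. `collect_set` does not guarantee the order of the values, so the result may come back as `test1,test` rather than `test,test1`. If the table has no key column, grouping by the original `data` column instead should also work, at the cost of merging rows whose strings are identical.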

Cloudera Community


how to remove duplicates in a cell Hive SQL
