- Subscribe to RSS Feed
- Mark Question as New
- Mark Question as Read
- Float this Question for Current User
- Bookmark
- Subscribe
- Mute
- Printer Friendly Page
Case-insensitive comparison Hive
Created on
‎06-09-2022
01:22 AM
- last edited on
‎06-09-2022
07:11 PM
by
VidyaSargur
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
HI ,
I have a requirement where I need to do case-insensitive joins across the system and I don't wish to apply upper/lower functions.
I tried setting TBLPROPERTIES('serialization.encoding'='utf8mb4_unicode_ci') at Table level but still the comparison is happening considering case sensitivity. PSB -
drop table test.caseI;
create table test.caseI
(name string, id int)
TBLPROPERTIES('serialization.encoding'='utf8mb4_unicode_ci');
insert into test.caseI values ('hj',1);
drop table test.caseI_2;
create table test.caseI_2
(name string, id int)
TBLPROPERTIES('serialization.encoding'='utf8mb4_unicode_ci');
insert into test.caseI_2 values ('HJ',1);
select * from test.caseI i
inner join test.caseI_2 i2 on i.name=i2.name;
--No Result
Tried with encoding 'SQL_Latin1_General_CP1_CI_AI' but got same result as above.
Any help would be appreciated, thanks!
Created ‎06-09-2022 03:35 AM
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Hi @DataMan-HJ , the requirement you're looking for with case-insensitive joins doesn't seem to be present in Hive and likely will not be implemented as Hive relies on Java's UTF-8 strings and the behavior which implicitly comes with it - without possibility to change the collation.
There's a good discussion on HIVE-4070 where a similar ask is raised for the LIKE operator behavior. You can review the pros and cons there. So you will likely need to go ahead to change the individual joins to use the lower/upper functions.
Best regards
Miklos
Created ‎06-09-2022 03:35 AM
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Hi @DataMan-HJ , the requirement you're looking for with case-insensitive joins doesn't seem to be present in Hive and likely will not be implemented as Hive relies on Java's UTF-8 strings and the behavior which implicitly comes with it - without possibility to change the collation.
There's a good discussion on HIVE-4070 where a similar ask is raised for the LIKE operator behavior. You can review the pros and cons there. So you will likely need to go ahead to change the individual joins to use the lower/upper functions.
Best regards
Miklos
Created ‎06-13-2022 12:17 AM
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Thanks @mszurap for your quick response. 🙂
