Support Questions

Find answers, ask questions, and share your expertise

Case-insensitive comparison Hive

avatar
New Contributor

HI ,


I have a requirement where I need to do case-insensitive joins across the system and I don't wish to apply upper/lower functions. 
I tried setting TBLPROPERTIES('serialization.encoding'='utf8mb4_unicode_ci') at Table level but still the comparison is happening considering case sensitivity.  PSB -

drop table test.caseI;
create table test.caseI
(name string, id int)
TBLPROPERTIES('serialization.encoding'='utf8mb4_unicode_ci');
insert into test.caseI values ('hj',1);

drop table test.caseI_2;
create table test.caseI_2
(name string, id int)
TBLPROPERTIES('serialization.encoding'='utf8mb4_unicode_ci');
insert into test.caseI_2 values ('HJ',1);


select * from test.caseI i
inner join test.caseI_2 i2 on i.name=i2.name;
--No Result
Tried with encoding 'SQL_Latin1_General_CP1_CI_AI' but got same result as above.

Any help would be appreciated, thanks!

1 ACCEPTED SOLUTION

avatar

Hi @DataMan-HJ , the requirement you're looking for with case-insensitive joins doesn't seem to be present in Hive and likely will not be implemented as Hive relies on Java's UTF-8 strings and the behavior which implicitly comes with it - without possibility to change the collation.

There's a good discussion on HIVE-4070 where a similar ask is raised for the LIKE operator behavior. You can review the pros and cons there. So you will likely need to go ahead to change the individual joins to use the lower/upper functions.

 

Best regards

 Miklos

 

View solution in original post

2 REPLIES 2

avatar

Hi @DataMan-HJ , the requirement you're looking for with case-insensitive joins doesn't seem to be present in Hive and likely will not be implemented as Hive relies on Java's UTF-8 strings and the behavior which implicitly comes with it - without possibility to change the collation.

There's a good discussion on HIVE-4070 where a similar ask is raised for the LIKE operator behavior. You can review the pros and cons there. So you will likely need to go ahead to change the individual joins to use the lower/upper functions.

 

Best regards

 Miklos

 

avatar
New Contributor

Thanks @mszurap for your quick response. 🙂