Created on 02-06-2018 09:31 AM - edited 09-16-2022 05:49 AM
Hi there,
Mi issue is very basic, I'm looking for the equivalent function in Hive of this SQL function =>
COLLATE SQL_Latin1_General_CP1_CI_AI
It's very important for me because I work on towns like Sao Paulo which have different writing versions. I have an old topic regarding this subject on stackoverflow, which unfortunately didn't get any interesting answer....
Could you provide me help please ?
Best regards,
Nicolas.
Created 02-11-2018 06:08 PM
You may set the character set used by Hive, for a given table, with the Table SerDe property "serialization.encoding". Take a look at the follow JIRA for an example on how to use it:
https://issues.apache.org/jira/browse/HIVE-12653
If you would like to use the MultiDelimitSerDe class, referenced in HIVE-12653, this serialization feature is available starting in CDH 5.10.
https://www.cloudera.com/documentation/enterprise/release-notes/topics/cdh_rn_fixed_in_510.html
The valid Character Sets are discussed in the following link. In particular, take a look at the "Standard charsets."
https://docs.oracle.com/javase/7/docs/api/java/nio/charset/Charset.html
Created 06-09-2022 01:19 AM
HI Nicolas,
Did you get any resolution for this ?
I also have a similar requirement where I need to do case-insensitive joins across the system and I don't wish to apply upper/lower functions.
I tried setting TBLPROPERTIES('serialization.encoding'='utf8mb4_unicode_ci') at Table level but still the comparison is happening considering case sensitivity. PSB -
drop table test.caseI;
create table test.caseI
(name string, id int)
TBLPROPERTIES('serialization.encoding'='utf8mb4_unicode_ci');
insert into test.caseI values ('hj',1);
drop table test.caseI_2;
create table test.caseI_2
(name string, id int)
TBLPROPERTIES('serialization.encoding'='utf8mb4_unicode_ci');
insert into test.caseI_2 values ('HJ',1);
select * from test.caseI i
inner join test.caseI_2 i2 on i.name=i2.name;
--No Result
Tried with encoding 'SQL_Latin1_General_CP1_CI_AI' but got same result as above.
Any help would be appreciated, thanks!