Created 03-15-2017 02:17 PM
Hi,
I am trying to set custom table properties (using the "ALTER TABLE ... SET TBLPROPERTIES ..." command) whose values contain special characters such as 'ç' or 'é'. The problem is that I get '\u00e7' and '\u00e9' back when I run a "DESCRIBE FORMATTED ..." command.
Is there a way to get the properly encoded characters back?
Here is my command:
ALTER TABLE mydatabase.mytable SET TBLPROPERTIES ('Test'='François');
Here is what I get back from a DESCRIBE FORMATTED ... command:
Test Fran\\u00e7ois
Thanks in advance for your answer!
Sylvain.
Created 03-15-2017 05:44 PM
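Custom SerDes are always a last resort. Hive expects UTF-8 data. If the encoding is something else, say ISO/IEC 8859-1, you will need to either convert the data to UTF-8 or, starting with Hive 0.14, use the feature added in https://issues.apache.org/jira/browse/HIVE-7142, which lets LazySimpleSerDe read data in a specified encoding. The value of "serialization.encoding" is a character set name rather than a language code, so for French data it would be something like ISO-8859-1, not FR. See below an example for GBK: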
CREATE TABLE person (id INT, name STRING, desc STRING)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
WITH SERDEPROPERTIES ('serialization.encoding'='GBK');
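For French data, the same idea could look roughly like the sketch below. It assumes the underlying files are encoded in ISO-8859-1; the table name person_fr and its columns are made up for illustration, so adjust them and the charset to match your actual files.

```sql
-- Hypothetical table: person_fr and its columns are illustrative only.
-- Assumes the data files are ISO-8859-1 (Latin-1) encoded text.
CREATE TABLE person_fr (id INT, name STRING)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
WITH SERDEPROPERTIES ('serialization.encoding'='ISO-8859-1');
```

As far as I know, this setting only controls how the SerDe decodes and encodes row data. It does not change how TBLPROPERTIES values are stored or displayed.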
Created 03-15-2017 06:05 PM
Your question is how to store and retrieve French characters in the table's data definition, specifically in table properties. Hive expects UTF-8 by default, both in data definitions and in stored data. I am not aware of an option to apply the encoding approach to data definitions. For stored data, you can encode/decode using a SerDe with a specified encoding, as described above by @Boris Demerov.
Created 03-16-2017 04:36 PM
Thank you, it seems like TBLPROPERTIES doesn't like French characters...