How can I handle Spanish characters in a Hive table? For example, some columns in the CSV file contain values such as ó and ñ. When I load these values into Hive, they are stored as a box character (�). Please help: I want to create a Hive table and load these values correctly.

1 ACCEPTED SOLUTION


Thanks bpreachuk,

'field.delim' = '|' did not help either, but we fixed the issue with the following CSV SerDe properties:

WITH SERDEPROPERTIES (
  "separatorChar" = ",",
  "quoteChar" = "\"",
  "escapeChar" = "\\",
  "serialization.encoding" = 'ISO-8859-1')
LOCATION '/path/'
TBLPROPERTIES (
  'store.charset' = 'ISO-8859-1',
  'retrieve.charset' = 'ISO-8859-1',
  'skip.header.line.count' = '1');
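
To illustrate why the box character can appear, here is a minimal Python sketch. It assumes the CSV file was written in ISO-8859-1, which the encoding settings above suggest but the thread does not confirm. The sample values are hypothetical.

```python
# Hypothetical sample values; the real file contents are not shown in the thread.
text = "ó,ñ"

# Assumption: the CSV was written as ISO-8859-1 (Latin-1), one byte per character.
raw = text.encode("iso-8859-1")

# Reading those bytes as UTF-8 fails for each accented character. With
# errors="replace", every invalid byte becomes U+FFFD, the replacement
# character that is usually rendered as a box or question mark.
misread = raw.decode("utf-8", errors="replace")

# Reading the bytes with the encoding they were written in restores the text.
restored = raw.decode("iso-8859-1")

print(repr(misread))
print(restored)
```

If this assumption holds, it would explain why declaring ISO-8859-1 for the table makes the accented characters display correctly.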


6 REPLIES


Thanks Jk for your immediate response.

Using ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe' WITH SERDEPROPERTIES ("serialization.encoding"='UTF-8'); solved the Spanish character issue. However, we have one more column with values such as -10,476.53. Because of the comma inside the value, the columns shift: Hive stores -10 in one column and 476.53 in the next. How can we fix this?


Hi @Sundar Lakshmanan. The input file must be using a comma as the field delimiter, so the comma inside -10,476.53 is read as a column break.

If you can change the format of the input file to use a less common character as the delimiter, such as a pipe or a tilde, that would fix the issue. With the lazy simple SerDe you can specify a different field delimiter as another SerDe property: 'field.delim' = '|'

Otherwise you could use a Pig script to pre-process the data and remove the comma from that field, or use some Hive post-processing to collapse the two fields into one in a new table.
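
The column shift and the two workarounds can be sketched in Python. This is only an illustration: the sample row is hypothetical, and it assumes the amount is wrapped in double quotes in the file, which the thread does not show.

```python
import csv
import io

# Hypothetical row; assumes the amount is quoted in the source file.
line = 'José,"-10,476.53",Peña'

# Splitting on every comma breaks the amount into two columns,
# which is the column shift described above.
naive = line.split(",")

# A quote-aware CSV parser keeps the quoted amount in a single column.
parsed = next(csv.reader([line], delimiter=",", quotechar='"'))

# Optional pre-processing: rewrite the row with a pipe delimiter so that
# a plain delimiter-based reader no longer trips over the embedded comma.
out = io.StringIO()
csv.writer(out, delimiter="|", lineterminator="\n").writerow(parsed)
piped = out.getvalue()

print(naive)
print(parsed)
print(piped, end="")
```

A quote-aware reader only helps if the field really is quoted in the file. If it is not, changing the delimiter at the source or cleaning the field beforehand is the remaining option.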

