Archives of Support Questions (Read Only)

This is an archived board for historical reference. Information and links may no longer be available or relevant.

How can I handle Spanish characters in a Hive table? For example, some columns in the CSV file contain values such as ó and ñ. When these values are loaded into Hive, they are stored as a box character (�). Please help: I want to create a Hive table and load these values correctly.
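The box is most likely U+FFFD, the Unicode replacement character, which a UTF-8 decoder substitutes for bytes it cannot decode. Hive's text tables generally read data as UTF-8 by default, so a CSV saved in ISO-8859-1 (Latin-1), which is what the accepted solution's settings suggest, would show exactly this symptom. The minimal Python sketch below reproduces it; the sample string is illustrative and not the poster's data.

```python
# Minimal sketch: reproduce the "box" symptom by decoding Latin-1 bytes as UTF-8.
# The sample text is hypothetical; the real CSV contents are not shown in the thread.
original = "ó ñ"

# Bytes as they would appear in a CSV file saved as ISO-8859-1 (Latin-1).
latin1_bytes = original.encode("latin-1")  # b'\xf3 \xf1'

# Reading those bytes as UTF-8 turns each undecodable byte into U+FFFD.
garbled = latin1_bytes.decode("utf-8", errors="replace")

# Reading them with the encoding the file was actually written in recovers the text.
recovered = latin1_bytes.decode("latin-1")

print(garbled == "\ufffd \ufffd")
print(recovered)
```

The fix is therefore to make the reader and the file agree on one encoding, either by telling Hive the file's real encoding or by converting the file to UTF-8 before loading it.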

Accepted Solution


Thanks, bpreachuk.

Setting 'field.delim' = '|' did not help either, but we eventually fixed the issue with the following CSV SerDe properties:

WITH SERDEPROPERTIES (
  "separatorChar" = ",",
  "quoteChar" = "\"",
  "escapeChar" = "\\",
  "serialization.encoding" = 'ISO-8859-1'
)
LOCATION '/path/'
TBLPROPERTIES (
  'store.charset' = 'ISO-8859-1',
  'retrieve.charset' = 'ISO-8859-1',
  'skip.header.line.count' = '1'
);
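A plausible reason these settings also cured the column shifting reported in the thread (an amount like -10,476.53 landing in two columns) is the quoteChar property: if the amount was written in double quotes in the file (an assumption, since the file itself is not shown), a quote-aware parser keeps it as one field, whereas a plain split on the delimiter does not. The sketch below uses Python's csv module purely as a stand-in for the SerDe to illustrate the difference; the sample line is hypothetical.

```python
import csv

# Hypothetical sample line; assumes the amount column is enclosed in double quotes.
line = 'José,"-10,476.53",Madrid'

# Splitting on every comma (roughly what a delimiter-only SerDe does) breaks the amount in two.
naive_fields = line.split(",")

# A quote-aware parser, similar in spirit to quoteChar in the CSV SerDe, keeps it intact.
quoted_fields = next(csv.reader([line], delimiter=",", quotechar='"'))

print(naive_fields)   # ['José', '"-10', '476.53"', 'Madrid']
print(quoted_fields)  # ['José', '-10,476.53', 'Madrid']
```

If the amount is not quoted in the file, no parser can distinguish the thousands separator from a field delimiter, and one of the pre-processing approaches in the thread is needed instead.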


Replies


Thanks, Jk, for your quick response.

Using ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe' WITH SERDEPROPERTIES ("serialization.encoding"='UTF-8'); solved the Spanish character issue. However, we have another column with values like -10,476.53. Because of this column the data shifts across columns: the value is stored in Hive as -10 in one column and 476.53 in the next. Please advise.


Hi @Sundar Lakshmanan. The input file presumably uses a comma as the field delimiter, so the comma inside -10,476.53 is read as a field separator.

If you can change the format of the input file to use a less common delimiter such as a pipe (|) or a tilde (~), that would fix the issue. With LazySimpleSerDe you can specify a different field delimiter as another SerDe property: 'field.delim' = '|'

Otherwise, you could use a Pig script to pre-process the data and remove the comma from that field, or use some Hive post-processing to collapse the two fields into one in a new table.
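If you control how the file is produced, this suggestion amounts to writing the same rows with a pipe delimiter and then declaring 'field.delim' = '|' on the table. The minimal Python sketch below assumes the source CSV quotes the amount field (otherwise the reader cannot tell a thousands separator from a delimiter); to_pipe_delimited is a hypothetical helper, not part of any Hive tooling.

```python
import csv
import io


def to_pipe_delimited(csv_text):
    """Re-write comma-separated, quote-aware CSV text as pipe-delimited text.

    Hypothetical helper: assumes fields containing commas are double-quoted
    in the input, so csv.reader can keep them whole.
    """
    reader = csv.reader(io.StringIO(csv_text))
    out = io.StringIO()
    # QUOTE_MINIMAL (the default) only quotes a field if it contains '|', '"',
    # or a line break, so a comma inside the amount is written as-is.
    writer = csv.writer(out, delimiter="|", lineterminator="\n")
    for row in reader:
        writer.writerow(row)
    return out.getvalue()


print(to_pipe_delimited('José,"-10,476.53",Madrid\n'), end="")  # José|-10,476.53|Madrid
```

For a real file you would read it with open(path, encoding="iso-8859-1", newline="") (or whatever encoding it was saved in) and could write the result as UTF-8 at the same time. One caveat: LazySimpleSerDe does not interpret quotes, so this output is only safe if no field itself contains a pipe.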
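The reply suggests Pig for the pre-processing; the sketch below swaps in plain Python to show the same idea of removing the comma from that field. It assumes the amount is unquoted, that it is the only column that can contain commas, and that every line should have a fixed number of columns. EXPECTED_COLUMNS, AMOUNT_INDEX, and fix_line are hypothetical names chosen for illustration.

```python
# Hypothetical layout: name, amount, city. Adjust to the real file.
EXPECTED_COLUMNS = 3
AMOUNT_INDEX = 1


def fix_line(line):
    """Re-join an amount that was split at its thousands separators.

    Assumes the amount is the only field that can contain commas, so any
    surplus fields on a line must be pieces of the amount.
    """
    fields = line.rstrip("\r\n").split(",")
    extra = len(fields) - EXPECTED_COLUMNS
    if extra > 0:
        end = AMOUNT_INDEX + extra + 1
        # Concatenating the pieces without commas drops the thousands separators.
        amount = "".join(fields[AMOUNT_INDEX:end])
        fields = fields[:AMOUNT_INDEX] + [amount] + fields[end:]
    return ",".join(fields)


print(fix_line("José,-10,476.53,Madrid"))  # José,-10476.53,Madrid
print(fix_line("Ana,250.00,Sevilla"))      # Ana,250.00,Sevilla
```

Dropping the separator also leaves the amount in a form Hive can cast to a numeric type. The Hive post-processing alternative would do the equivalent with a concatenation of the two split columns when inserting into a new table.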


