How can I handle Spanish characters in a Hive table? For example, some columns in my CSV file contain values such as ó and ñ. When I load these values into Hive, they are stored as replacement characters (�). Please help me with this; I want to create a Hive table and load these values correctly.
- Labels: Apache Hive
Created 09-07-2016 11:38 AM
Created 09-07-2016 11:46 AM
Does this help? (For me it did not help; maybe I missed something.)
Created 09-07-2016 01:33 PM
Thanks Jk for your immediate response.
Using "ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe' WITH SERDEPROPERTIES ('serialization.encoding'='UTF-8');" solved the Spanish-character issue. But we have one more column with values like -10,476.53, and because of the comma in that value we get column jumping: -10 is stored in one column and 476.53 in the next. Please advise.
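For reference, a minimal sketch of the kind of DDL described above, assuming LazySimpleSerDe with UTF-8 encoding; the table name, columns, and path are hypothetical placeholders:

```sql
-- Minimal sketch: LazySimpleSerDe with UTF-8 encoding (hypothetical table/columns/path).
CREATE EXTERNAL TABLE sales_es (
  descripcion STRING,
  importe     STRING
)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
WITH SERDEPROPERTIES (
  'field.delim' = ',',
  'serialization.encoding' = 'UTF-8'
)
STORED AS TEXTFILE
LOCATION '/path/to/csv/';
```

Note that LazySimpleSerDe does not understand quoted fields, so a comma inside a value like -10,476.53 is still treated as a delimiter, which is exactly the column-jumping problem described above.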
Created 09-07-2016 02:12 PM
Hi @Sundar Lakshmanan. Your input file must be using a comma as the field delimiter.
If you can change the format of the input file to use a non-standard character such as a pipe or a tilde, that would fix the issue. With LazySimpleSerDe you can specify a different field delimiter as another SerDe property: 'field.delim' = '|'
Otherwise you could use a Pig script to pre-process the data and remove the comma from that field, or use some Hive post-processing to collapse the two fields back into one in a new table.
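A minimal sketch of the pipe-delimited variant suggested here; it assumes the file has been re-exported with | as the delimiter, and the table name and columns are hypothetical:

```sql
-- Hypothetical pipe-delimited table; assumes the source file now uses '|' as the delimiter.
CREATE EXTERNAL TABLE sales_es_pipe (
  descripcion STRING,
  importe     STRING
)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
WITH SERDEPROPERTIES (
  'field.delim' = '|',
  'serialization.encoding' = 'UTF-8'
)
STORED AS TEXTFILE
LOCATION '/path/to/pipe_delimited/';
```

With a pipe delimiter, the comma inside -10,476.53 is ordinary data and no longer splits the column.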
Created 09-07-2016 03:07 PM
Thanks bpreachuk,
'field.delim' = '|' did not help me either, but we fixed the issue with the following CSV SerDe properties:
WITH SERDEPROPERTIES ( "separatorChar" = ",", "quoteChar" = "\"", "escapeChar" = "\\", "serialization.encoding"='ISO-8859-1') LOCATION '/path/' TBLPROPERTIES ( 'store.charset'='ISO-8859-1', 'retrieve.charset'='ISO-8859-1', 'skip.header.line.count'='1');
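Pieced together as a full statement, that fix would look roughly like the sketch below. The SerDe class, table name, and columns are assumptions (only the property clauses appear in the post); separatorChar, quoteChar, and escapeChar are properties of OpenCSVSerde, which also reads every column as STRING:

```sql
-- Hypothetical reconstruction around the SERDEPROPERTIES/TBLPROPERTIES shown above.
CREATE EXTERNAL TABLE sales_es_csv (
  descripcion STRING,
  importe     STRING
)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
WITH SERDEPROPERTIES (
  "separatorChar" = ",",
  "quoteChar" = "\"",
  "escapeChar" = "\\",
  "serialization.encoding" = 'ISO-8859-1'
)
LOCATION '/path/'
TBLPROPERTIES (
  'store.charset' = 'ISO-8859-1',
  'retrieve.charset' = 'ISO-8859-1',
  'skip.header.line.count' = '1'
);
```

Because this SerDe honors quoteChar, a value such as "-10,476.53" stays in a single column, assuming the field is quoted in the file.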
Created 07-05-2018 04:02 AM
Well, this is my first visit to your blog! Your blog provided us with valuable information; you have done a marvelous job. I think this is an informative post, and it is very useful and knowledgeable. I really enjoyed reading it. Big fan, thank you!
