Created 09-28-2016 07:17 PM
Does Hive support UTF-8 encoding by default? If it does not, how do I make the entire Hive database support UTF-8 encoding? I am getting an issue while transferring SQL Server tables to Hive: I am seeing corrupted strings. I know that I can alter a Hive table by setting serde.encoding to UTF-8, but is there a way to set the entire Hive database to UTF-8? Any help would be appreciated.
Thanks
Created 09-29-2016 03:12 AM
@Michael Young Any thoughts?
Created 09-29-2016 04:07 AM
Did you try the following format while creating the Hive table?
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe' WITH SERDEPROPERTIES ('serialization.encoding'='UTF-8');
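For instance, a complete CREATE TABLE statement using that clause could look like the sketch below (the table and column names are just placeholders):

-- hypothetical table using LazySimpleSerDe with an explicit UTF-8 encoding
CREATE TABLE my_table (
  id INT,
  name STRING
)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
WITH SERDEPROPERTIES ('serialization.encoding'='UTF-8')
STORED AS TEXTFILE;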
Also please see: https://community.hortonworks.com/questions/54162/why-hive-is-not-able-to-store-special-characters-l...
Created 09-29-2016 03:41 PM
Hive does support UTF-8 encoding of data. As @jk has shown, you can create the table using the LazySimpleSerDe. You can read more about Hive's UTF-8 support here:
You can use Unicode strings in data/comments, but cannot use them for database/table/column names. You can use UTF-8 encoding for Hive data. However, other encodings are not supported (HIVE-7142 introduced encoding support for LazySimpleSerDe; however, the implementation is not complete and does not address all cases).
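To make that limitation concrete, here is a minimal sketch (table, comment, and data are hypothetical): Unicode is fine in comments and data values, but not in identifiers.

-- Unicode is allowed in data and in comments (hypothetical example):
CREATE TABLE users (name STRING COMMENT 'nombre de usuario, e.g. José')
  STORED AS TEXTFILE;
-- ...but a non-ASCII database/table/column name would be rejected.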
Created 09-30-2016 03:43 AM
Hive's default encoding is UTF-8, so setting serialization.encoding to UTF-8 on a file that is already UTF-8 is unnecessary. However, if you are running into trouble, there is a high probability that your input file uses another character set. In that case, set 'serialization.encoding' to the encoding of the input file. A quick search shows that the default charset of SQL Server is ISO-8859-1 (alias Latin1), so you can try 'serialization.encoding'='ISO-8859-1'. For examples, see my recent article on Hive charsets.
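If the table already exists, one way to apply that setting is a sketch like the following (the table name is hypothetical):

-- point the existing table's SerDe at the Latin1 encoding of the source file
ALTER TABLE imported_table
  SET SERDEPROPERTIES ('serialization.encoding'='ISO-8859-1');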
Created 05-10-2018 03:29 AM
Not working fine for