UTF-8 hive

Contributor

Does Hive support UTF-8 encoding by default? If it does not, how do I make the entire Hive database support UTF-8? I am seeing corrupted strings when transferring SQL Server tables to Hive. I know that I can alter a Hive table to set the SerDe property serialization.encoding to UTF-8, but is there a way to set the entire Hive database to UTF-8? Any help would be appreciated.

Thanks

Contributor

@Michael Young Any thoughts?

@Praneender Vuppala

Did you try the following row format when creating the Hive table?

ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe' WITH SERDEPROPERTIES ('serialization.encoding'='UTF-8');

Also please see: https://community.hortonworks.com/questions/54162/why-hive-is-not-able-to-store-special-characters-l...
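A complete table definition using this SerDe might look like the minimal sketch below. The table name, columns, and the comma delimiter ('field.delim') are hypothetical placeholders chosen for illustration, not part of the original answer; only the SerDe class and the 'serialization.encoding' property come from the statement above.

```sql
-- Hypothetical table and columns; adjust to match the imported SQL Server table.
CREATE TABLE sqlserver_import (
  id   INT,
  name STRING
)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
WITH SERDEPROPERTIES (
  'field.delim'            = ',',     -- assumption: comma-separated text files
  'serialization.encoding' = 'UTF-8'  -- charset used to decode/encode the text
)
STORED AS TEXTFILE;
```

Because ROW FORMAT SERDE replaces ROW FORMAT DELIMITED, the field delimiter is passed as a SerDe property here rather than with FIELDS TERMINATED BY.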

Super Guru

@Praneender Vuppala

Hive does support UTF-8 encoding of data. As @jk has shown, you can create the table using the LazySimpleSerDe. You can read more about Hive's UTF-8 support here:

Hive User FAQ

You can use Unicode strings in data and comments, but not in database, table, or column names.

You can use UTF-8 encoding for Hive data. However, other encodings are not supported (HIVE-7142 introduced encoding support for LazySimpleSerDe, but the implementation is not complete and does not address all cases).

Master Guru

Hive's default encoding is UTF-8, so setting serialization.encoding to UTF-8 for files that are already UTF-8 is unnecessary. If you are still seeing corrupted strings, your input files are most likely in a different character set. In that case, set 'serialization.encoding' to the encoding of the input files. A quick search shows that the default charset of SQL Server is ISO-8859-1 (alias latin1), so you can try 'serialization.encoding'='ISO-8859-1'. For examples, see my recent article on Hive charsets.
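For a table that already exists (for example, one created by the import), the property can be changed in place instead of recreating the table. A minimal sketch, assuming a hypothetical text table named sqlserver_import that uses LazySimpleSerDe and whose files are ISO-8859-1:

```sql
-- Hypothetical table name; replace with the table that shows corrupted strings.
-- Tells LazySimpleSerDe to decode the underlying text files as ISO-8859-1
-- instead of the default UTF-8. Only table metadata changes; the files are untouched.
ALTER TABLE sqlserver_import
  SET SERDEPROPERTIES ('serialization.encoding' = 'ISO-8859-1');

-- Spot-check a few rows to confirm the characters now display correctly.
SELECT * FROM sqlserver_import LIMIT 10;
```

As far as I know, this is a per-table SerDe setting rather than a database-wide one, so it has to be applied to each affected table.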

New Contributor

Not working fine for