
French Bilingual support for Hive

Contributor

A customer is facing issues with the French character set when data is loaded into Hive.

Records are getting split when French characters are encountered.

From the blogs I have checked, the only recommendation I can find is to implement a custom SerDe.

Are there any options for handling French characters in Hive after the data has been loaded?

Or is it recommended to pre-process French characters prior to loading?

3 REPLIES

Re: French Bilingual support for Hive

New Contributor

Custom SerDes are always a last resort. What is the encoding of the data itself? Hive expects UTF-8 data. If the encoding is, say, ISO/IEC 8859-1, you will need to either convert the data or try the feature added in https://issues.apache.org/jira/browse/HIVE-7142
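
For the "convert the data" option, here is a minimal sketch of a pre-load conversion step, assuming the source files really are ISO/IEC 8859-1 (verify this first; the thread does not confirm it). The function name transcode_to_utf8 and the sample record are hypothetical and only for illustration.

```python
import codecs
import io


def transcode_to_utf8(src, dst, source_encoding="iso-8859-1", chunk_size=64 * 1024):
    """Copy a binary stream to another, re-encoding its text as UTF-8.

    source_encoding is an assumption about the input; change it to match
    the real encoding of the files.
    """
    # An incremental decoder keeps multi-byte sequences intact even when
    # they straddle a chunk boundary (relevant for encodings other than
    # single-byte ISO-8859-1).
    decoder = codecs.getincrementaldecoder(source_encoding)(errors="strict")
    while True:
        chunk = src.read(chunk_size)
        if not chunk:
            break
        dst.write(decoder.decode(chunk).encode("utf-8"))
    # Flush anything the decoder is still holding.
    dst.write(decoder.decode(b"", final=True).encode("utf-8"))


if __name__ == "__main__":
    # Hypothetical sample record containing French characters.
    raw = "Montréal,Québec\n".encode("iso-8859-1")
    out = io.BytesIO()
    transcode_to_utf8(io.BytesIO(raw), out)
    print(out.getvalue().decode("utf-8"), end="")
```

In practice src and dst would be files opened in binary mode ("rb" and "wb"), and the converted copy is what gets loaded into Hive.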

Re: French Bilingual support for Hive

Thank you Carter.

Another thing to check is your locale, since it has been known to cause problems:

https://issues.apache.org/jira/browse/HIVE-2859

https://issues.apache.org/jira/browse/HIVE-3245

On Linux, for instance, run:

$ echo $LANG

and, if it is not already a UTF-8 locale, set it to one, for example:

$ export LANG=en_US.UTF-8
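
The same check can be scripted. The sketch below assumes the usual POSIX locale name layout, language_TERRITORY.codeset with an optional @modifier, and reports whether the codeset part is UTF-8. The helper names locale_codeset and is_utf8_locale are hypothetical. It inspects only LANG; LC_ALL and LC_CTYPE, if set, take precedence over LANG and are not examined here.

```python
import os


def locale_codeset(lang):
    """Return the codeset part of a POSIX locale name, or None if absent."""
    name = lang.split("@", 1)[0]  # drop an optional @modifier
    if "." not in name:
        return None
    return name.split(".", 1)[1]


def is_utf8_locale(lang):
    """True if the locale name carries a UTF-8 codeset (UTF-8, utf8, ...)."""
    codeset = locale_codeset(lang)
    if codeset is None:
        return False
    return codeset.lower().replace("-", "") == "utf8"


if __name__ == "__main__":
    lang = os.environ.get("LANG", "")
    print("LANG=%r, UTF-8 codeset: %s" % (lang, is_utf8_locale(lang)))
```

A name with no ".codeset" part, such as "C" or a bare "UTF-8", is reported as not UTF-8 by this check.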

Let us know if this helps.

Re: French Bilingual support for Hive

Mentor

@pbalasundaram are you still having issues with this? Can you accept the best answer or provide your own solution?