Support Questions


Which languages are supported (multi-language / Unicode support)?

Expert Contributor

Hi,

I am looking for an answer to one of the RFP questions:

The customer's core systems are Unicode-based and support multiple languages, even though English is their corporate language. Can you let me know which languages are supported?

I am not really sure of the answer. Can you please help?

Thanks,

Sujitha

1 ACCEPTED SOLUTION

Hive, Pig (by means of PigStorage), and Spark all support UTF-8. UTF-8 itself can encode every Unicode character, but it is not easy to say which languages are completely supported by these tools, because some rarely used CJK characters (for example, those found in historical texts) lie outside the so-called Basic Multilingual Plane (BMP) and are not well supported in practice. It is therefore better to list the languages you plan to use and ask whether they are supported. In summary, if a language's alphabet is completely included in the BMP, then it is completely supported.
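As a rough way to check sample text against the BMP rule above, here is a minimal Python sketch. It is not part of Hive, Pig, or Spark; the function name `non_bmp_chars` and the sample strings are made up for illustration. It only reports which characters have code points above U+FFFF, it does not test how any particular tool handles them.

```python
def non_bmp_chars(text):
    """Return the characters in `text` whose code point is above U+FFFF,
    i.e. outside the Basic Multilingual Plane (hypothetical helper)."""
    return [ch for ch in text if ord(ch) > 0xFFFF]


# Illustrative samples only; replace with text from the languages you plan to use.
samples = {
    "English": "hello",
    "Hindi": "नमस्ते",
    "Japanese": "日本語",
    "Rare CJK (Extension B)": "\U00020000",
}

for label, text in samples.items():
    outside = [f"U+{ord(ch):04X}" for ch in non_bmp_chars(text)]
    utf8_len = len(text.encode("utf-8"))  # number of bytes when stored as UTF-8
    print(f"{label}: {utf8_len} UTF-8 bytes, non-BMP characters: {outside}")
```

If the list of non-BMP characters is empty for representative text in a language, that text falls entirely inside the BMP. Characters that do show up in the list are the ones worth testing explicitly in the target tool.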

Edit: For a fun read (over the weekend?), see this: Would UTF-8 be able to support the inclusion of a vast alien language with millions of new character...


3 REPLIES

Guru

If the question is about data in Hive, then by default Hive data is UTF-8, so languages that are supported by UTF-8 will work out of the box. The same applies to HDFS.

Expert Contributor

Hi @Ravi Mutyala,

Thanks for the response. From this I understand that the whole Hadoop ecosystem uses UTF-8. Is that correct? Can you confirm?

Thanks,

Sujitha

