- Subscribe to RSS Feed
- Mark Question as New
- Mark Question as Read
- Float this Question for Current User
- Bookmark
- Subscribe
- Mute
- Printer Friendly Page
the multiple languages supported including which languages are supported?
Created ‎06-30-2016 08:03 PM
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Hi,
I am looking for answer for one of the RFP questions:
The customers core systems are Unicode based and support multiple languages even though English is their corporate language. Can you let me know what are the multiple languages supported including which languages are supported.
Not really sure with the answer. Can you please help.
Thanks,
Sujitha
Created ‎07-01-2016 08:55 AM
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Hive, Pig (by means of PigStorage), and Spark all support UTF-8. However, it's not easy to say which languages are completely supported by UTF-8, because, for example some rarely used CJK characters (like in historical texts) outside of the so-called Basic Multilingual Plane (BMP) are not well supported in practice. Therefore, it's better to list up the languages you plan to use, and ask are they supported. In summary, if a language alphabet is completely included in BMP then it's completely supported.
Edit: For a cool reading (over the weekend?) see this: Would UTF-8 be able to support the inclusion of a vast alien language with millions of new character...
Created ‎06-30-2016 08:50 PM
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
If the question is data in hive, then by default hive data is UTF8, so languages that are supported with UTF8 will work out of the box. Same with HDFS.
Created ‎07-01-2016 12:40 AM
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Hi @Ravi Mutyala,
Thanks for the response. With this I understand that the whole hadoop ecosystem uses UTF-8. Is that correct? Can you confirm.
Thanks,
Sujitha
Created ‎07-01-2016 08:55 AM
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Hive, Pig (by means of PigStorage), and Spark all support UTF-8. However, it's not easy to say which languages are completely supported by UTF-8, because, for example some rarely used CJK characters (like in historical texts) outside of the so-called Basic Multilingual Plane (BMP) are not well supported in practice. Therefore, it's better to list up the languages you plan to use, and ask are they supported. In summary, if a language alphabet is completely included in BMP then it's completely supported.
Edit: For a cool reading (over the weekend?) see this: Would UTF-8 be able to support the inclusion of a vast alien language with millions of new character...
