
Impala and unicode

Explorer

I would like to use Impala in an organisation with data kept in Hebrew.
I read that Impala has some limitations when dealing with Unicode characters.
Does the limitation apply only to string comparison and string functions, or also to storing and selecting data?
Is there a way around it?
Thanks!

4 REPLIES

Re: Impala and unicode

Contributor

Impala treats all string data as byte arrays and does nothing special if the data is Unicode. Impala can select, store, and compare strings for equality, so depending on your use case this might be sufficient.

Re: Impala and unicode

Explorer
As long as I can compare and use string functions (even using only UTF-8) it is certainly enough.
Thanks!

Re: Impala and unicode

New Contributor

The string functions don't work correctly on Unicode data; they operate on bytes, and comparison is byte-for-byte only. Take the following function as an example:

substr("áele", 1, 1) returns � because it returns only the first byte of the 2-byte character "á".

Other functions behave the same way: length("áele") returns 5, not 4, because it counts bytes rather than characters.
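To make the byte-versus-character difference concrete, here is a minimal Python sketch that mimics byte-oriented string functions on UTF-8 data. It is not Impala code: `byte_length` and `byte_substr` are hypothetical helpers written for illustration, and the sketch assumes the data is UTF-8 with a precomposed "á" (U+00E1).

```python
# Illustration only: emulates byte-oriented string functions on UTF-8 data.
# byte_length and byte_substr are hypothetical helpers, not Impala functions.

def byte_length(s: str) -> int:
    """Count UTF-8 bytes instead of characters."""
    return len(s.encode("utf-8"))


def byte_substr(s: str, start: int, length: int) -> str:
    """Take `length` bytes starting at 1-based byte position `start`.

    Bytes that do not form a complete UTF-8 sequence are shown as U+FFFD,
    which is how a client typically renders a broken character.
    """
    raw = s.encode("utf-8")
    chunk = raw[start - 1 : start - 1 + length]
    return chunk.decode("utf-8", errors="replace")


text = "\u00e1ele"  # "áele" with a precomposed á, which is 2 bytes in UTF-8

print(len(text))                # characters: 4
print(byte_length(text))        # bytes: 5
print(byte_substr(text, 1, 1))  # half of "á", rendered as the replacement character
print(byte_substr(text, 1, 2))  # both bytes of "á", so the character is intact

hebrew = "\u05e9\u05dc\u05d5\u05dd"  # "שלום": each Hebrew letter is 2 bytes in UTF-8
print(len(hebrew), byte_length(hebrew))  # 4 characters, 8 bytes
```

For Hebrew text this means every letter takes two bytes, so any byte-based offset or length has to be doubled to land on a character boundary.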

 

This isn't to bash Impala, but to make sure no one is misled by this thread into thinking that the string functions will work for Unicode string manipulation.

Re: Impala and unicode

Master Collaborator