Explorer
Posts: 7
Registered: ‎06-22-2014

Impala and unicode

I would like to use Impala in an organisation with data kept in Hebrew.
I read that Impala has some limitations when dealing with Unicode characters.
Does the limitation apply only to string comparison and string functions, or also to storing and selecting?
Is there a way around it?
Thanks!

Cloudera Employee
Posts: 27
Registered: ‎09-27-2013

Re: Impala and unicode

Impala treats all string data as byte arrays and does nothing special if the data is Unicode. Impala can select, store, compare for equality, etc., so depending on your use case, this might be sufficient.

Explorer
Posts: 7
Registered: ‎06-22-2014

Re: Impala and unicode

As long as I can compare and use string functions (even using only UTF-8) it is certainly enough.
Thanks!

New Contributor
Posts: 3
Registered: ‎12-09-2014

Re: Impala and unicode

The string functions don't work on Unicode data; strings can only be compared byte-for-byte. Take the following function call as an example:

substr("áele", 1, 1) returns � because it returns only the first byte of the 2-byte character "á".

The same is true for other functions such as length: length("áele") returns 5, which is the byte count rather than the 4 characters.
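To make the byte-versus-character difference concrete, here is a small Python sketch that mimics the behaviour described above. It is not Impala code: the helpers byte_length and byte_substr are hypothetical names made up for this illustration, and the sketch assumes the data is UTF-8 encoded with "á" stored as the single precomposed code point U+00E1.

```python
# Illustration only: emulates byte-oriented string functions in Python.
# byte_length and byte_substr are hypothetical helpers, not Impala APIs.

# "á" written as an escape (U+00E1) so the example does not depend on
# how this page or file is encoded.
text = "\u00e1ele"


def byte_length(s: str) -> int:
    """Length of the string counted in UTF-8 bytes, not characters."""
    return len(s.encode("utf-8"))


def byte_substr(s: str, start: int, length: int) -> str:
    """1-based substring taken over the UTF-8 bytes of s.

    Bytes that no longer form a valid UTF-8 sequence are shown as
    U+FFFD, the replacement character.
    """
    raw = s.encode("utf-8")[start - 1 : start - 1 + length]
    return raw.decode("utf-8", errors="replace")


print(len(text))                      # 4 characters
print(byte_length(text))              # 5 bytes, because "á" takes 2 bytes
print(repr(byte_substr(text, 1, 1)))  # '\ufffd': only half of "á"
print(byte_substr(text, 1, 2))        # á: both bytes of the character
```

Under these assumptions, a byte-oriented substring only gives a readable result when the start and length happen to fall on character boundaries, which for Hebrew text in UTF-8 means working in multiples of 2 bytes per letter.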

 

This isn't to bash Impala, but to make sure no one is misled by this thread into thinking that the string functions will work for manipulating Unicode strings.

Cloudera Employee
Posts: 307
Registered: ‎10-16-2013

Re: Impala and unicode