- Subscribe to RSS Feed
- Mark Question as New
- Mark Question as Read
- Float this Question for Current User
- Bookmark
- Subscribe
- Mute
- Printer Friendly Page
How can I remove accents from a string in Impala?
- Labels:
-
Apache Impala
Created on 08-29-2018 01:00 PM - edited 09-16-2022 06:38 AM
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
For example:
Assume I have the following string: "Gánémílózú"
I need the query to return "Ganemilozu" (without the accents)
I've tried to use the translate function, as follows:
SELECT translate('Gánémílózú', 'áéíóú', 'aeiou') FROM (SELECT 1) as dual;
But it returns "Gaenaomalaza". Apparently I didn't understand the translate function.
Can anyone explain why doesn't my approach work or lead me to a possible solution to remove the accents?
Thanks in advance!
EDIT:
I found that the problem lies not in the translate function but in the handling of accents by Impala.
The translation:
SELECT translate('Gánémílózú', 'Gnmlz', '12345') FROM (SELECT 1) as dual;
Works perfectly fine by returning 1á2é3í4ó5ú
Does anyone know how does Impala handle characters with accents?
Created 08-30-2018 08:38 AM
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
You can do a replace a character at a time with something like:
SELECT regexp_replace(regexp_replace('Gánémílózú', 'á', 'a'), 'é', 'e');
but that is pretty ugly (I could only bear to do 2 characters).
-Andrew
Created 08-30-2018 09:09 AM
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Thanks for clarifying that translate only works on ASCII characters.
I found a similar solution to the one you proposed, using the replace function instead of the regexp replace. However, as you mention, the solution is really not scalable and it is very tedious to implement and maintain.
Created 08-30-2018 09:36 AM
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
