Created 03-24-2022 01:20 AM
From the docs at https://docs.cloudera.com/documentation/enterprise/6/6.3/topics/impala_string_functions.html#string_functions__regexp_extract
"This example shows how a pattern string starting with .*? matches the shortest possible portion of the source string, returning the rightmost set of lowercase letters."
select regexp_extract('AbcdBCdefGHI','.*?([[:lower:]]+)',1);
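This returns def.
Every other regex implementation I've worked with would return bcd.
I can't make sense of the docs either. They say the pattern matches "the shortest possible portion" of the string while "returning the rightmost" set of lowercase letters, but in a left-to-right search a lazy .*? consumes as little as possible, so the capture group should pick up the leftmost run of lowercase letters, not the rightmost.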
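For comparison, here is a minimal sketch of how a conventional backtracking engine handles the same pattern. It uses Python's re module, which is my choice for illustration and not what Impala uses internally. Because Python does not support POSIX classes like [[:lower:]], it substitutes [a-z] as an ASCII stand-in. The names SOURCE, LAZY_PATTERN, GREEDY_PATTERN and extract_group are hypothetical helpers for this example.

```python
import re
from typing import Optional

# Same source string as the Impala example above.
SOURCE = "AbcdBCdefGHI"

# Python's re module has no POSIX classes such as [[:lower:]],
# so [a-z] is used as an ASCII stand-in (an assumption for this sketch).
LAZY_PATTERN = r".*?([a-z]+)"
GREEDY_PATTERN = r".*([a-z]+)"


def extract_group(pattern: str, text: str, group: int = 1) -> Optional[str]:
    """Return the given capture group of the first match, or None if nothing matches."""
    match = re.search(pattern, text)
    return match.group(group) if match else None


# Lazy .*? consumes as little as possible, so the search settles on the
# first (leftmost) run of lowercase letters.
print(extract_group(LAZY_PATTERN, SOURCE))    # bcd

# Greedy .* consumes as much as possible and then backtracks, so the group
# only gets the last lowercase letter before the trailing capitals.
print(extract_group(GREEDY_PATTERN, SOURCE))  # f
```

Under these assumptions, the lazy form yields bcd and the greedy form yields f. Neither one yields def.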
Created 03-24-2022 02:06 AM
@cjard,
I believe this is a bug. Check this out: https://issues.apache.org/jira/browse/IMPALA-2917
I agree that the documentation needs to be fixed as well.
André
Created 03-24-2022 10:01 AM
I've advised the documentation team of the confusing wording for you. Thank you for letting us know.
Created on 03-24-2022 11:32 AM - edited 03-24-2022 11:33 AM
Thanks all!
I'm curious: does this mean the documentation will be substantially rewritten to explain how the .*? implementation is atypical, or will the bug be fixed so the documentation only needs a small correction?
Created 03-24-2022 11:34 AM
That will be up to the documentation team. I alerted them to your concerns so they can look into it.
Created 03-24-2022 11:35 AM
Fingers crossed this pushes the reported bug up the review list a bit, then! 🙂 Thanks for the excellent help and attention.