07-05-2018 06:02 AM - edited 07-05-2018 06:05 AM
I've found that the regexp_extract and regexp_replace functions behave differently depending on whether the same query also contains a comparison, and on which database is used.
Consider the following script:
create schema test;
create table test.a (text string);
insert into test.a values ("a");
The following query behaves inconsistently with the Impala documentation:
select regexp_extract(text, '\w', 0), regexp_extract(text, '\\w', 0), text from test.a;
According to the Impala documentation, a double backslash should be used as the regex escape character inside a string literal. However, that form doesn't work here (see Col2 in the above result), whereas the single-backslash form does.
If we add an unrelated comparison to the query, this behaviour changes:
select regexp_extract(text, '\w', 0), regexp_extract(text, '\\w', 0), text, text = "a" from test.a;
Now a double backslash is required for the regex to function correctly. The result is the same if one uses another table column in the comparison or moves the comparison into the WHERE clause.
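One way to picture the two results is as a difference in how many times the string literal gets unescaped before it reaches the regex engine. The Python sketch below is only a model of that idea, not Impala's actual code: unescape_once and extract are hypothetical helpers, Python's re module stands in for Impala's regex engine, and the rule that an unknown escape such as \w collapses to plain w is an assumption about how the SQL literal is parsed.

```python
import re

def unescape_once(literal: str) -> str:
    """Collapse one layer of backslash escapes (assumed rule, not Impala's code):
    a backslash followed by any character yields just that character."""
    out = []
    i = 0
    while i < len(literal):
        if literal[i] == "\\" and i + 1 < len(literal):
            out.append(literal[i + 1])  # drop the backslash, keep the next char
            i += 2
        else:
            out.append(literal[i])
            i += 1
    return "".join(out)

def extract(text: str, pattern: str) -> str:
    """Return the whole match (group 0), or an empty string when nothing matches."""
    m = re.search(pattern, text)
    return m.group(0) if m else ""

single = r"\w"    # what is typed as '\w' in the query
double = r"\\w"   # what is typed as '\\w' in the query

# Literal unescaped once before the regex is compiled (the documented behaviour):
# only the double-backslash form matches "a".
documented = (extract("a", unescape_once(single)), extract("a", unescape_once(double)))

# Literal passed to the regex engine untouched: only the single-backslash form matches.
raw = (extract("a", single), extract("a", double))

print("unescaped once:", documented)
print("passed raw:    ", raw)
```

If this model is anywhere near the truth, the first query would correspond to the "passed raw" case and the query with the comparison to the "unescaped once" case, but that is a guess rather than something confirmed here.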
This strange behavior is only present when running queries over JDBC/ODBC against a non-default database. Hue and Impala-Shell work as expected, and JDBC/ODBC queries also work as expected when executed on tables in the default database.
I've tested this on CDH5.15.0 and 5.13.1 with JDBC-22.214.171.1244 and ODBC v126.96.36.1994 (32bit) drivers.
Is this a bug or am I missing something? Anyone else experiencing the same issue?