New Contributor
Posts: 1
Registered: ‎08-22-2017

JDBC/ODBC: regexp_* functions' behavior depends on unrelated comparisons and database used

[ Edited ]

I've found that the regexp_extract and regexp_replace functions behave differently, depending on comparisons done in the same query or the database used.


Consider the following script:

create schema test;
create table test.a (text string);
insert into test.a values ("a");


The following query behaves inconsistently with the Impala documentation:

select regexp_extract(text, '\w', 0), regexp_extract(text, '\\w', 0), text from test.a;



According to the Impala documentation, double backslashes should be used as a regex escape character. However, it doesn't work here (see Col2 in the above result). Instead, it does work when using a single backslash.


If we add an unrelated comparison to the query, this behaviour changes:

select regexp_extract(text, '\w', 0), regexp_extract(text, '\\w', 0), text, text = "a" from test.a;



Now, a double backslash is required for the regex to function correctly. The result is identical, if one uses another table column in the comparison or puts the comparison into the where-clause.


This strange behavior is only present when running queries over JDBC/ODBC on a non-default database. Hue and Impala-Shell work as expected. And JDBC/ODBC-queries work as expected when executed on tables in the default database.


I've tested this on CDH5.15.0 and 5.13.1 with JDBC- and ODBC v2.5.37.1014 (32bit) drivers.


Is this a bug or am I missing something? Anyone else experiencing the same issue?