Reply
Highlighted
New Contributor
Posts: 1
Registered: ‎08-22-2017

JDBC/ODBC: regexp_* functions' behavior depends on unrelated comparisons and database used

[ Edited ]

I've found that the regexp_extract and regexp_replace functions behave differently, depending on comparisons done in the same query or the database used.

 

Consider the following script:

create schema test;
create table test.a (text string);
insert into test.a values ("a");

 

The following query behaves inconsistently with the Impala documentation:

select regexp_extract(text, '\w', 0), regexp_extract(text, '\\w', 0), text from test.a;

image.png

 

According to the Impala documentation, double backslashes should be used as a regex escape character. However, it doesn't work here (see Col2 in the above result). Instead, it does work when using a single backslash.

 

If we add an unrelated comparison to the query, this behaviour changes:

select regexp_extract(text, '\w', 0), regexp_extract(text, '\\w', 0), text, text = "a" from test.a;

image.png

 

Now, a double backslash is required for the regex to function correctly. The result is identical, if one uses another table column in the comparison or puts the comparison into the where-clause.

 

This strange behavior is only present when running queries over JDBC/ODBC on a non-default database. Hue and Impala-Shell work as expected. And JDBC/ODBC-queries work as expected when executed on tables in the default database.

 

I've tested this on CDH5.15.0 and 5.13.1 with JDBC-2.6.3.1004 and ODBC v2.5.37.1014 (32bit) drivers.

 

Is this a bug or am I missing something? Anyone else experiencing the same issue?

Announcements