Created 04-17-2023 01:33 AM
I ran into an issue last week, which seems to be an issue related to the Cloudera ODBC Driver for Impala.
First I thought it was an issue with pyodbc, but after submitting the issue on their github we figured out that it was a driver issue. For reference: pyodbc github issue 1197
Simply put, executing query1 and query2 should produce the same results, but it doesn't:
query1 = """SELECT
col1, col2 as col2_orig,
(CASE WHEN col2 IS NOT NULL THEN col2
ELSE 0 END) AS col2_1,
(CASE WHEN col2 IS NULL THEN 0
ELSE col2 END) AS col2_2
FROM sandbox.jk_test"""
query2 = f"""WITH wrapped_table AS ({query1}) SELECT * FROM wrapped_table"""
It results in two different sets:
['col1', 'col2_orig', 'col2_1', 'col2_2']
['1 ', '1 ', '1 ', '1 ']
['2 ', 'None ', 'None ', '0 ']
['col1', 'col2_orig', 'col2_1', 'col2_2']
['1 ', '1 ', '1 ', '1 ']
['2 ', 'None ', '0 ', '0 ']
The 2nd result is the expected one.
* See the github issue for the code used to run these queries.
Running the queries with CData's driver solves the issue, as does connecting using a JDBC connection.
My Environment:
Created 10-11-2023 08:42 AM
Hi @JKarount ,
To close the loop, this has been resolved in the latest Cloudera Impala ODBC driver 2.7.0, see the Resolved Issues section in the Release Notes:
https://docs.cloudera.com/documentation/other/connectors/impala-odbc/2-7-0/Release-Notes-Impala-ODBC...
"[IMP-946][02795738] The connector does not generate the last COALESCE
parameter from the ELSE expression in the CASE statement."
You can download it from:
https://www.cloudera.com/downloads/connectors/impala/odbc/2-7-0.html
I hope this will help you to implement your usecases as expected.
Best regards
Miklos Szurap
Customer Operations Engineer, Cloudera
Created 04-18-2023 01:06 AM
@JKarount, Welcome to our community! To help you get the best possible answer, I have tagged in our Impala experts @Chella @mszurap @ChethanYM who may be able to assist you further.
Please feel free to provide any additional information or details about your query, and we hope that you will find a satisfactory solution to your question.
Regards,
Vidya Sargur,Created 04-19-2023 01:06 AM
Hi @JKarount , thank you for the detailed description and reproduction steps. To be able to help you on this, kindly open a formal support case through our Cloudera Support portal.
Please also reproduce this with the latest ODBC driver version (2.6.17 as of now) with driver TRACE logs enabled. Please see the ODBC driver install guide
(in Windows go to the ODBC datasource configuration, and enable it under Logging Options)
The support team will be able to triage this further with our driver team.
Thank you
Miklos Szurap
Customer Operations Engineer, Cloudera
Created 04-19-2023 06:24 AM
@mszurap Thanks for your reply, I forgot to update the original post, but I have already tried updating to 2.6.17.
I will open a support case, and paste the trace as requested.
Created 04-19-2023 06:54 AM
Sorry to post two messages so close to one-another, but I am not allowed to open a case.
@mszurap, can you tell me what you would advise?
I have checked the log myself, and have found that the query is converted in the following way:
(CASE WHEN col2 IS NOT NULL THEN col2 ELSE 0 END) AS col2_1
becomes:
COALESCE(`jk_test`.`col2`) AS `col2_1`
but it should have become:
COALESCE(`jk_test`.`col2`, 0) AS `col2_1`
I don't see any way to attach a file to this thread, but I will paste the important snippets here:
line 2636:
Apr 19 15:28:46.485 INFO 15044 Simba::ODBC::StatementState::InternalPrepare: Preparing query: SELECT col1, col2 as col2_orig, (CASE WHEN col2 IS NOT NULL THEN col2 ELSE 0 END) AS col2_1, (CASE WHEN col2 IS NULL THEN 0 ELSE col2 END) AS col2_2 FROM sandbox.jk_test
line 2767:
Apr 19 15:28:47.016 TRACE 15044 Simba::ImpalaODBC::ImpalaSQLGenerator::GenerateAEValueExpr: Generated SQL: COALESCE(`jk_test`.`col2`) AS `col2_1`
line 2849:
Apr 19 15:28:47.016 TRACE 15044 Simba::ImpalaODBC::ImpalaSQLGenerator::GenerateOneStatement: Generated SQL: SELECT `jk_test`.`col1`, `jk_test`.`col2` AS `col2_orig` , COALESCE(`jk_test`.`col2`) AS `col2_1` , (CASE WHEN (`jk_test`.`col2` IS NULL) THEN 0 ELSE `jk_test`.`col2` END) AS `col2_2` FROM `sandbox`.`jk_test` `jk_test`
line 2858-2861:
Apr 19 15:28:47.016 DEBUG 15044 Simba::ImpalaODBC::ImpalaTCLIServiceThreadSafeClient::ExecuteStatement: TExecuteStatementReq
sessionHandle.sessionId.guid =
statement = SELECT `jk_test`.`col1`, `jk_test`.`col2` AS `col2_orig` , COALESCE(`jk_test`.`col2`) AS `col2_1` , (CASE WHEN (`jk_test`.`col2` IS NULL) THEN 0 ELSE `jk_test`.`col2` END) AS `col2_2` FROM `sandbox`.`jk_test` `jk_test`
runAsync = true
Of course, I'm guessing other lines might be important for debugging, I am confident that there is an issue in the driver when generating the query, probably to optimize it... Because when using the CTE version, no query is generated whatsoever, and the raw one is executed:
lines 3587-3590:
Apr 19 15:28:48.365 DEBUG 15044 Simba::ImpalaODBC::ImpalaTCLIServiceThreadSafeClient::ExecuteStatement: TExecuteStatementReq
sessionHandle.sessionId.guid =
statement = WITH wrapped_table AS (SELECT col1, col2 as col2_orig, (CASE WHEN col2 IS NOT NULL THEN col2 ELSE 0 END) AS col2_1, (CASE WHEN col2 IS NULL THEN 0 ELSE col2 END) AS col2_2 FROM gda_sandbox.jk_test) SELECT * FROM wrapped_table
runAsync = true
Please let me know if I can do anything else, or if there is any other way for a non-paying user to submit bug-reports.
Thanks!
Created 04-19-2023 08:03 AM
Hi @JKarount ,
I see, we will take up the bug report internally as it may affect other customers too, in any case thanks for your diagnostics and troubleshooting.
In my opinion to achieve the same functionality and most likely it will pass through the ODBC driver as-is, please try to use the Impala's NVL or NVL2 function (or even ZEROIFNULL if it's a numeric column):
https://impala.apache.org/docs/build/html/topics/impala_conditional_functions.html#conditional_funct...
SELECT col1, col2 as col2_orig, NVL2(col2, col2, 0) as col2_1 FROM sandbox.jk_test
or
SELECT col1, col2 as col2_orig, NVL(col2, 0) as col2_1 FROM sandbox.jk_test
Hope that helps.
Best regards
Miklos
Created 04-20-2023 07:16 AM
I now see I didn't see this reply as I was busy troubleshooting and replying about the advanced option check mark.. Sorry 😅
I was not aware that NVL functions were available in Impala, thanks for the proposal. In my case I could also reverse the CASE WHEN condition (to check for IS NULL instead of IS NOT NULL, which also generated the correct result.
Nevertheless, although for me the impact is quite low, I'm not sure this is the case for all possible usages, and that's why I wanted to make the issue known. Thanks for picking it up, and hopefully it get's solved in a coming update!
Thanks for your assistance,
Jeremy
Created 04-20-2023 07:44 AM
Hi Jeremy,
Sure, you're welcome. We also thank you for the report, I've forwarded it to our driver team.
Best regards
Miklos
Created 04-19-2023 08:14 AM
And another update from my side:
In the advanced settings I can select "Use Native Query".
With this option "on", the results are as expected.
For now that will be my workaround, but I expect I am "loosing" on some query optimizations..
I'll gladly help if more info is needed!
Created 10-11-2023 08:42 AM
Hi @JKarount ,
To close the loop, this has been resolved in the latest Cloudera Impala ODBC driver 2.7.0, see the Resolved Issues section in the Release Notes:
https://docs.cloudera.com/documentation/other/connectors/impala-odbc/2-7-0/Release-Notes-Impala-ODBC...
"[IMP-946][02795738] The connector does not generate the last COALESCE
parameter from the ELSE expression in the CASE statement."
You can download it from:
https://www.cloudera.com/downloads/connectors/impala/odbc/2-7-0.html
I hope this will help you to implement your usecases as expected.
Best regards
Miklos Szurap
Customer Operations Engineer, Cloudera