Support Questions

Find answers, ask questions, and share your expertise
Announcements
Celebrating as our community reaches 100,000 members! Thank you!

straight SELECT and SELECT via CTE produce different results

avatar
Explorer

I ran into an issue last week, which seems to be an issue related to the Cloudera ODBC Driver for Impala.

 

First I thought it was an issue with pyodbc, but after submitting the issue on their github we figured out that it was a driver issue. For reference: pyodbc github issue 1197

 

Simply put, executing query1 and query2 should produce the same results, but it doesn't:

 

query1 = """SELECT
    col1, col2 as col2_orig, 
    (CASE WHEN col2 IS NOT NULL THEN col2
        ELSE 0 END) AS col2_1,
    (CASE WHEN col2 IS NULL THEN 0
        ELSE col2 END) AS col2_2
    FROM  sandbox.jk_test"""

query2 = f"""WITH wrapped_table AS ({query1}) SELECT * FROM wrapped_table"""

 

It results in two different sets:

 

['col1', 'col2_orig', 'col2_1', 'col2_2']
['1   ', '1        ', '1     ', '1     ']
['2   ', 'None     ', 'None  ', '0     ']

['col1', 'col2_orig', 'col2_1', 'col2_2']
['1   ', '1        ', '1     ', '1     ']
['2   ', 'None     ', '0     ', '0     ']

 

The 2nd result is the expected one.

* See the github issue for the code used to run these queries.

 

 

Running the queries with CData's driver solves the issue, as does connecting using a JDBC connection.

My Environment:

  • Python: Python 3.10.8 (tags/v3.10.8:aaaf517, Oct 11 2022, 16:50:30) [MSC v.1933 64 bit (AMD64)]
  • OS: Windows 10 Enterprise 20H2 64-bit
  • driver: Cloudera ODBC Driver for Impala 2.06.16.1022
  • Windows Script Host Version 5.812 (for vbs script)
1 ACCEPTED SOLUTION

avatar

Hi @JKarount ,

To close the loop, this has been resolved in the latest Cloudera Impala ODBC driver 2.7.0, see the Resolved Issues section in the Release Notes:
https://docs.cloudera.com/documentation/other/connectors/impala-odbc/2-7-0/Release-Notes-Impala-ODBC... 

"[IMP-946][02795738] The connector does not generate the last COALESCE
parameter from the ELSE expression in the CASE statement."

You can download it from:

https://www.cloudera.com/downloads/connectors/impala/odbc/2-7-0.html 

I hope this will help you to implement your usecases as expected.

Best regards

 Miklos Szurap

Customer Operations Engineer, Cloudera

View solution in original post

10 REPLIES 10

avatar
Community Manager

@JKarount, Welcome to our community! To help you get the best possible answer, I have tagged in our Impala experts @Chella @mszurap @ChethanYM who may be able to assist you further.

Please feel free to provide any additional information or details about your query, and we hope that you will find a satisfactory solution to your question.



Regards,

Vidya Sargur,
Community Manager


Was your question answered? Make sure to mark the answer as the accepted solution.
If you find a reply useful, say thanks by clicking on the thumbs up button.
Learn more about the Cloudera Community:

avatar

Hi @JKarount , thank you for the detailed description and reproduction steps. To be able to help you on this, kindly open a formal support case through our Cloudera Support portal.

Please also reproduce this with the latest ODBC driver version (2.6.17 as of now) with driver TRACE logs enabled. Please see the ODBC driver install guide

https://downloads.cloudera.com/connectors/ClouderaImpala_ODBC_2.6.17.1026/Cloudera-ODBC-Connector-fo...

(in Windows go to the ODBC datasource configuration, and enable it under Logging Options)

 

The support team will be able to triage this further with our driver team.

 

Thank you

 Miklos Szurap

Customer Operations Engineer, Cloudera

avatar
Explorer

@mszurap Thanks for your reply, I forgot to update the original post, but I have already tried updating to 2.6.17.

I will open a support case, and paste the trace as requested.

avatar
Explorer

Sorry to post two messages so close to one-another, but I am not allowed to open a case.

@mszurap, can you tell me what you would advise?

 

I have checked the log myself, and have found that the query is converted in the following way:

(CASE WHEN col2 IS NOT NULL THEN col2 ELSE 0 END) AS col2_1

 becomes:

COALESCE(`jk_test`.`col2`) AS `col2_1`

but it should have become:

 

COALESCE(`jk_test`.`col2`, 0) AS `col2_1`

 

 

 

I don't see any way to attach a file to this thread, but I will paste the important snippets here:

line 2636:
Apr 19 15:28:46.485 INFO  15044 Simba::ODBC::StatementState::InternalPrepare: Preparing query: SELECT col1, col2 as col2_orig, (CASE WHEN col2 IS NOT NULL THEN col2     ELSE 0 END) AS col2_1, (CASE WHEN col2 IS NULL THEN 0     ELSE col2 END) AS col2_2 FROM  sandbox.jk_test

line 2767:
Apr 19 15:28:47.016 TRACE 15044 Simba::ImpalaODBC::ImpalaSQLGenerator::GenerateAEValueExpr: Generated SQL: COALESCE(`jk_test`.`col2`) AS `col2_1` 

line 2849:
Apr 19 15:28:47.016 TRACE 15044 Simba::ImpalaODBC::ImpalaSQLGenerator::GenerateOneStatement: Generated SQL: SELECT `jk_test`.`col1`, `jk_test`.`col2` AS `col2_orig` , COALESCE(`jk_test`.`col2`) AS `col2_1` , (CASE WHEN (`jk_test`.`col2` IS NULL) THEN 0 ELSE `jk_test`.`col2` END) AS `col2_2`  FROM `sandbox`.`jk_test` `jk_test` 


line 2858-2861:
Apr 19 15:28:47.016 DEBUG 15044 Simba::ImpalaODBC::ImpalaTCLIServiceThreadSafeClient::ExecuteStatement: TExecuteStatementReq
    sessionHandle.sessionId.guid = 
    statement = SELECT `jk_test`.`col1`, `jk_test`.`col2` AS `col2_orig` , COALESCE(`jk_test`.`col2`) AS `col2_1` , (CASE WHEN (`jk_test`.`col2` IS NULL) THEN 0 ELSE `jk_test`.`col2` END) AS `col2_2`  FROM `sandbox`.`jk_test` `jk_test` 
    runAsync = true

Of course, I'm guessing other lines might be important for debugging, I am confident that there is an issue in the driver when generating the query, probably to optimize it... Because when using the CTE version, no query is generated whatsoever, and the raw one is executed:

lines 3587-3590:
Apr 19 15:28:48.365 DEBUG 15044 Simba::ImpalaODBC::ImpalaTCLIServiceThreadSafeClient::ExecuteStatement: TExecuteStatementReq
    sessionHandle.sessionId.guid = 
    statement = WITH wrapped_table AS (SELECT col1, col2 as col2_orig, (CASE WHEN col2 IS NOT NULL THEN col2     ELSE 0 END) AS col2_1, (CASE WHEN col2 IS NULL THEN 0     ELSE col2 END) AS col2_2 FROM  gda_sandbox.jk_test) SELECT * FROM wrapped_table
    runAsync = true

 

Please let me know if I can do anything else, or if there is any other way for a non-paying user to submit bug-reports.

 

Thanks!

avatar

Hi @JKarount ,

I see, we will take up the bug report internally as it may affect other customers too, in any case thanks for your diagnostics and troubleshooting.

In my opinion to achieve the same functionality and most likely it will pass through the ODBC driver as-is, please try to use the Impala's NVL or NVL2 function (or even ZEROIFNULL if it's a numeric column):
https://impala.apache.org/docs/build/html/topics/impala_conditional_functions.html#conditional_funct...

SELECT col1, col2 as col2_orig, NVL2(col2, col2, 0) as col2_1 FROM sandbox.jk_test

or

SELECT col1, col2 as col2_orig, NVL(col2, 0) as col2_1 FROM sandbox.jk_test

 Hope that helps.

 

Best regards

 Miklos

avatar
Explorer

I now see I didn't see this reply as I was busy troubleshooting and replying about the advanced option check mark.. Sorry 😅

 

I was not aware that NVL functions were available in Impala, thanks for the proposal. In my case I could also reverse the CASE WHEN condition (to check for IS NULL instead of IS NOT NULL, which also generated the correct result.

 

Nevertheless, although for me the impact is quite low, I'm not sure this is the case for all possible usages, and that's why I wanted to make the issue known. Thanks for picking it up, and hopefully it get's solved in a coming update!

 

Thanks for your assistance,

Jeremy

avatar

Hi Jeremy,

Sure, you're welcome. We also thank you for the report, I've forwarded it to our driver team.

Best regards

 Miklos

avatar
Explorer

And another update from my side:

 

In the advanced settings I can select "Use Native Query".

With this option "on", the results are as expected.

 

For now that will be my workaround, but I expect I am "loosing" on some query optimizations..

 

I'll gladly help if more info is needed!

 

avatar

Hi @JKarount ,

To close the loop, this has been resolved in the latest Cloudera Impala ODBC driver 2.7.0, see the Resolved Issues section in the Release Notes:
https://docs.cloudera.com/documentation/other/connectors/impala-odbc/2-7-0/Release-Notes-Impala-ODBC... 

"[IMP-946][02795738] The connector does not generate the last COALESCE
parameter from the ELSE expression in the CASE statement."

You can download it from:

https://www.cloudera.com/downloads/connectors/impala/odbc/2-7-0.html 

I hope this will help you to implement your usecases as expected.

Best regards

 Miklos Szurap

Customer Operations Engineer, Cloudera