I ran into an issue last week, which seems to be an issue related to the Cloudera ODBC Driver for Impala.
First I thought it was an issue with pyodbc, but after submitting the issue on their github we figured out that it was a driver issue. For reference: pyodbc github issue 1197
Simply put, executing query1 and query2 should produce the same results, but it doesn't:
query1 = """SELECT
col1, col2 as col2_orig,
(CASE WHEN col2 IS NOT NULL THEN col2
ELSE 0 END) AS col2_1,
(CASE WHEN col2 IS NULL THEN 0
ELSE col2 END) AS col2_2
FROM sandbox.jk_test"""
query2 = f"""WITH wrapped_table AS ({query1}) SELECT * FROM wrapped_table"""
It results in two different sets:
['col1', 'col2_orig', 'col2_1', 'col2_2']
['1 ', '1 ', '1 ', '1 ']
['2 ', 'None ', 'None ', '0 ']
['col1', 'col2_orig', 'col2_1', 'col2_2']
['1 ', '1 ', '1 ', '1 ']
['2 ', 'None ', '0 ', '0 ']
The 2nd result is the expected one.
* See the github issue for the code used to run these queries.
Running the queries with CData's driver solves the issue, as does connecting using a JDBC connection.
My Environment:
- Python: Python 3.10.8 (tags/v3.10.8:aaaf517, Oct 11 2022, 16:50:30) [MSC v.1933 64 bit (AMD64)]
- OS: Windows 10 Enterprise 20H2 64-bit
- driver: Cloudera ODBC Driver for Impala 2.06.16.1022
- Windows Script Host Version 5.812 (for vbs script)