Support Questions

Find answers, ask questions, and share your expertise

Impala ODBC driver and python query with params

avatar
Explorer

Hi folks,

 

I tried to submit a query with params via ODBC to Impala. Thi is my code:

 

 

 

import pyodbc

# connect
crsr = pyodbc.connect('DSN=impala', autocommit=True).cursor()
param1 = 1000000000
param2 = '2023-01-01'

# long query
query = """
select *
from target_test.test d 
left outer join 
  (select cli, fsc, nom_rag_soc, dat_ini_val, dat_ccz_inl, dat_csz_efv, dta_cessata, sta, prg_var, cod_cau_csz_dl, rn 
  from 
    (select 
      c2.cod_cli as cli, 
      trim(c2.cod_fsc) as fsc,
      trim(c2.nom_rag_soc) as nom_rag_soc,
      c2.dat_ini_val,
      dat_ccz_inl,
      dat_csz_efv, 
      (case when c2.dat_csz_efv= '1900-01-01 00:00:00' then '2999-01-01 00:00:00' else c2.dat_csz_efv end) as dta_cessata,
      sta,
      prg_var,
      cod_cau_csz_dl,
      row_number() over (partition by cod_fsc order by (case when c2.dat_csz_efv='1900-01-01 00:00:00' then '2999-01-01 00:00:00' else c2.dat_csz_efv end) desc) as rn
    from target_test.test2 c2 
    where  
      c2.sta='c' 
      and c2.dat_ini_val = (select max(c1.dat_ini_val) from target_test.test3 c1 where c1.sta='c' and c2.cod_cli=c1.cod_cli and c2.cod_fsc=c1.cod_fsc) 
      and prg_var = (select max(prg_var) from target_test.v_padln_t_cliente xc where c2.cod_cli = xc.cod_cli and sta ='c' and xc.dat_ini_val = c2.dat_ini_val) 
    ) a 
    where rn=1 
  ) k 
on trim(d.cod_fsc)=k.fsc
  where 
    d.prg_rig < ?
    and d.dat_cre_flu >= ?
    and d.tip_flu = 'f24';
"""

try:
  res = crsr.execute(query, param1, param2).fetchall
  print(res)
except pyodbc.Error as e:
  e.args = (e.args[1],)
  raise

 

 

 

But this is the output:

disoardi_0-1727356242556.png

I installed this driver: ClouderaImpalaODBC-2.7.1.1009-1.x86_64.rpm

If I try whit simple query, without subqueries, it works.

 

Any suggestions?

1 ACCEPTED SOLUTION

avatar
Expert Contributor

Hi-

If you are using the latest ODBC and seeing this issue, then this could be hitting an ODBC driver 'corner case' bug. Please open a ticket with Cloudera Support, and we will take it further to the Driver Team.

View solution in original post

10 REPLIES 10

avatar
Community Manager

@disoardi, Welcome to our community! To help you get the best possible answer, I have tagged our Impala experts @ChethanYM @SVB @mszurap  who may be able to assist you further.

Please feel free to provide any additional information or details about your query, and we hope that you will find a satisfactory solution to your question.



Regards,

Vidya Sargur,
Community Manager


Was your question answered? Make sure to mark the answer as the accepted solution.
If you find a reply useful, say thanks by clicking on the thumbs up button.
Learn more about the Cloudera Community:

avatar
Explorer

Hi @VidyaSargur and thanks for your suggestion.

@ChethanYM, @SVB or @mszurap any ideas about this issue?

 

Thanks in advance.

avatar
Rising Star

@disoardi Parametric queries do only work if the "UseNativeQuery" option is set to 0, this is the default (but you might have it set to 1 in the DSN configuration).

Yo could try connecting with:

crsr = pyodbc.connect('DSN=impala;UseNativeQuery=0', autocommit=True).cursor()

See page 84 of the "Cloudera ODBC Connector for Apache Impala Installation and Configuration Guide" for the full description of this option. 

avatar
Explorer

I tried but the problem remains the same.

Thanks for the tip

avatar
Rising Star

The error message shows that Impala gets the query with question marks in it, which is not good, as Impala itself doesn't supports prepared statements or query parameters. All of this functionality should be done in the ODBC driver.

You've written that a simple query without subquery works. Does the simple query works with or without parameter substitution?

Since the whole prepared statement/query substitution is don by the ODBC driver, and not by Impala, you would get no performance gains from using it. So I believe this only useful if you are porting some existing code/queries to use Impala.
You can just use a python f-string or the .format() function to do the parameter substitution by yourself in your code, it won't hurt performance.

avatar
Explorer

The simple query, without subqueries, with params works.

For example:

SELECT * FROM target_test.test2 where id = ?;

 About the support of params in the ODBC driver for impala I found this: https://community.cloudera.com/t5/Support-Questions/Unable-to-pass-parameter-to-Impala-from-SSRS/m-p...

Quickly the ODBC driver for impala supports the standard SQLBindParameter() API to subsitute ? in the query text with parameters.

but this seems not to work for complex queries.

 

avatar
Rising Star

I would try if replacing the sub-queries with 'WITH' statements would help. Maybe the query is just too complex for this query-rewrite/parameter substitution engine n the ODBC driver.

If that not helps, there are some logging options for the driver, I would use those to see if they give any useful information what is happening inside the driver.

avatar
Explorer

I tried but not works. The complet test below:

import pyodbc

# connect
crsr = pyodbc.connect('DSN=impala1;', autocommit=True).cursor()

# unnamed parameters
params = ('C', 1000000000, '2023-01-01')

# short query
query = """SELECT *
        FROM test1.test1 d 
        INNER JOIN 
            (SELECT * 
            FROM test1.test2 c1
            WHERE dat_ini_val = (SELECT max(c2.dat_ini_val) 
                                FROM test1.test2 c2
                                WHERE c2.sta = ? 
                                AND c2.cod_cli=c1.cod_cli 
                                AND c1.cod_fsc=c2.cod_fsc)) c 
        ON d.cod_cli = c.cod_cli
        WHERE cast(D.cod_fsc as string) <> ''  
        AND d.prg_rig < ? AND d.dat_cre_flu >= ? AND d.TIP_FLU = 'F24';"""

try:
    res = crsr.execute(query, params).fetchall() #<-- this works
    print('Query simple DONE!!!') 
except pyodbc.Error as e:
    e.args = (e.args[1],)
    print(e)

# long query
query = """SELECT *
FROM test1.test1 d 
LEFT OUTER JOIN 
    (SELECT cli, fsc, nom_rag_soc, dat_ini_val, dat_ccz_inl, Dat_csz_efv, sta, prg_var, COD_CAU_CSZ_DL
     FROM 
        (SELECT c2.cod_cli AS cli, 
                TRIM(CAST(c2.cod_fsc AS STRING)) AS fsc,
                TRIM(CAST(c2.nom_rag_soc AS STRING)) AS nom_rag_soc, 
                c2.dat_ini_val, 
                c2.dat_ccz_inl, 
                c2.Dat_csz_efv,
                c2.sta, 
                c2.prg_var, 
                c2.COD_CAU_CSZ_DL
                ,(CASE WHEN c2.Dat_csz_efv = '1900-01-01 00:00:00' THEN '2999-01-01' ELSE c2.Dat_csz_efv END) AS dta_cessata
                ,ROW_NUMBER () OVER (PARTITION BY cod_fsc ORDER BY (case when c2.Dat_csz_efv='1900-01-01 00:00:00' then '2999-01-01' ELSE c2.Dat_csz_efv END) desc) rn
         FROM test1.test2 c2 
         WHERE c2.sta = ?
        ) A
    ) k 
ON TRIM(CAST(d.COD_fsc AS STRING)) = k.fsc
WHERE d.prg_rig < ? 
  AND d.dat_cre_flu >= ? 
  AND d.TIP_FLU = 'F24';"""


try:
    res = crsr.execute(query, params).fetchall() #<-- this does not work
    print('Query complex DONE!!!')
except pyodbc.Error as e:
    e.args = (e.args[1],)
    print(e)


query = """
WITH prova as (SELECT c2.cod_cli AS cli, 
                TRIM(CAST(c2.cod_fsc AS STRING)) AS fsc,
                TRIM(CAST(c2.nom_rag_soc AS STRING)) AS nom_rag_soc, 
                c2.dat_ini_val, 
                c2.dat_ccz_inl, 
                c2.Dat_csz_efv,
                c2.sta, 
                c2.prg_var, 
                c2.COD_CAU_CSZ_DL
                ,(CASE WHEN c2.Dat_csz_efv = '1900-01-01 00:00:00' THEN '2999-01-01' ELSE c2.Dat_csz_efv END) AS dta_cessata
                ,ROW_NUMBER () OVER (PARTITION BY cod_fsc ORDER BY (case when c2.Dat_csz_efv='1900-01-01 00:00:00' then '2999-01-01' ELSE c2.Dat_csz_efv END) desc) rn
         FROM test1.test2 c2 
         WHERE c2.sta = 'C'
)

SELECT *
FROM test1.test1 d 
LEFT OUTER JOIN 
    (SELECT cli, fsc, nom_rag_soc, dat_ini_val, dat_ccz_inl, Dat_csz_efv, sta, prg_var, COD_CAU_CSZ_DL
     FROM prova
    ) k 
ON TRIM(CAST(d.COD_fsc AS STRING)) = k.fsc
WHERE d.prg_rig < ? 
  AND d.dat_cre_flu >= ? 
  AND d.TIP_FLU = 'F24';
"""

params = (1000000000, '2023-01-01')

try:
    res = crsr.execute(query, params).fetchall() #<-- this does not work
    print('Query with DONE!!!')
except pyodbc.Error as e:
    e.args = (e.args[1],)
    print(e)

 

 

the simple query, with parameters, works, others do not

 

 

avatar
Expert Contributor

Hi-

If you are using the latest ODBC and seeing this issue, then this could be hitting an ODBC driver 'corner case' bug. Please open a ticket with Cloudera Support, and we will take it further to the Driver Team.