About zegab

zegab · ‎10-15-2024

Hi @AKO ,

Impala has variable substitution like this:

    [hostname.local.net:21000] default> SET VAR:query=SELECT 1+2;
    Variable QUERY set to SELECT 1+2
    [hostname.local.net:21000] default> ${VAR:query};
    Query: SELECT 1+2
    Query submitted at: 2024-10-15 15:54:29 (Coordinator: https://hostname.local.net:25000)
    Query progress can be monitored at: https://hostname.local.net:25000/query_plan?query_id=nnnn
    +-------+
    | 1 + 2 |
    +-------+
    | 3     |
    +-------+
    Fetched 1 row(s) in 1.15s

See official Impala docs at: https://impala.apache.org/docs/build/html/topics/impala_shell_running_commands.html

This is a feature of impala-shell, and not of Impala itself, so depending on what you call "Impala Query Manager", your experience might be different.

If you want a solution that is more database independent, then I recommend using a view or a CTE (WITH statement) instead:

    WITH sub_query AS (
      SELECT 1+2
    )
    SELECT * FROM sub_query;
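To illustrate the "database independent" point, here is a minimal sketch that runs the same CTE text against an in-memory SQLite database from Python. SQLite is only a stand-in engine chosen so the example runs anywhere; it is not part of the original question, and the helper name run_cte is made up for this example.

```python
import sqlite3

# The CTE from the post, unchanged. SQLite is an assumed stand-in engine.
CTE_QUERY = """
WITH sub_query AS (
    SELECT 1+2
)
SELECT * FROM sub_query;
"""


def run_cte(query: str = CTE_QUERY):
    """Run the query on a throwaway in-memory database and return all rows."""
    conn = sqlite3.connect(":memory:")
    try:
        return conn.execute(query).fetchall()
    finally:
        conn.close()


rows = run_cte()
print(rows)
```

Because the statement is plain SQL, no shell-level variable handling is involved, so it does not depend on impala-shell.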

zegab · ‎10-15-2024

Hi @mrblack , how do you know that Impala performs a full table scan?

zegab · ‎10-15-2024

In your where clause:

    r.key='street' AND r.value='abc' AND r.key='phone' AND r.value='123'

you are using the "AND" operator between all the conditions. That selects only a row/record where all of these conditions are true at the same time, but no such record can exist, because r.key cannot be both 'street' and 'phone' in the same row. I think that's why you are getting empty results. You should use "OR" between conditions that apply to different rows, like:

    (r.key='street' AND r.value='abc') OR (r.key='phone' AND r.value='123')
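The difference can be reproduced with a small sketch. It uses an in-memory SQLite table as a stand-in for the unnested map rows; the table layout, the id column, and the sample data are assumptions made up for this illustration, not the original poster's schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical layout: one row per (id, key, value) map entry.
conn.execute('CREATE TABLE r (id INTEGER, "key" TEXT, "value" TEXT)')
conn.executemany(
    "INSERT INTO r VALUES (?, ?, ?)",
    [
        (1, "street", "abc"),
        (1, "phone", "123"),
        (2, "street", "xyz"),
    ],
)

# All four conditions joined with AND: a single row can never satisfy them.
and_rows = conn.execute(
    """
    SELECT id, "key", "value" FROM r
    WHERE r."key"='street' AND r."value"='abc'
      AND r."key"='phone' AND r."value"='123'
    """
).fetchall()

# Each key/value pair grouped, with OR between the groups.
or_rows = conn.execute(
    """
    SELECT id, "key", "value" FROM r
    WHERE (r."key"='street' AND r."value"='abc')
       OR (r."key"='phone' AND r."value"='123')
    ORDER BY "key"
    """
).fetchall()

conn.close()
print(and_rows)
print(or_rows)
```

With this sample data the AND version returns no rows, while the OR version returns the two matching entries of id 1.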

zegab · ‎10-15-2024

@Kjarzyna wrote: "Yes I saw the documentation, but i didn’t find solution there. In documentation you usually add just one map field and value into where clause"

Hi @Kjarzyna , if you add just a single map key or value to the where clause, does your query work?

zegab · ‎10-02-2024

I would try replacing the sub-queries with 'WITH' statements to see if that helps. Maybe the query is just too complex for the query-rewrite/parameter substitution engine in the ODBC driver. If that does not help, the driver has some logging options; I would use those to see whether they give any useful information about what is happening inside the driver.

zegab · ‎10-01-2024

Hi @evanle96 !

"Could this issue be related to HDFS failover and the HA configuration being affected by the deleted directories?"

No, I don't think so. Deleting directories should not affect NN failover or HA configuration, unless there is something fundamentally wrong with your setup or hardware. Could you elaborate a bit more on what happened here?

"How can I validate whether the problem is related to HDFS HA and failover?"

What you mention in your last question should be a good start: trigger a manual failover and check whether basic reads and writes from the CLI work.

"Is there a way to force Sqoop/Oozie to properly use the active NameNode instead of the standby?"

HDFS clients in general should have a list of all NameNodes available to them. If the client gets the above error when connecting, it should try to connect to the next available NN. If that's not happening, there is likely some issue with the client's configuration (core-site.xml, hdfs-site.xml). It is possible that the client only knows about one NN (which is the standby), or that the config is outdated and points to an old, decommissioned host, or that it cannot connect due to network issues. Your logs should tell whether the job is actually trying to fail over to the other NN, so a bit more context around the error message (more logs) would be useful to see what's going on exactly.

"I have checked the HA configuration, and failover seems to be functioning as the standby takes over when the active NameNode is restarted. However, the error persists when trying to read or write to HDFS."

Do you mean that the Sqoop job fails, or that you cannot read/write with simple HDFS CLI commands, no matter which NN is the active one?
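One way to check whether a client configuration knows about more than one NameNode is to read the HA properties out of hdfs-site.xml. The sketch below parses an inline sample instead of a real file; the nameservice name, NameNode IDs, and host names in the sample are invented for illustration, and the helper names are hypothetical.

```python
import xml.etree.ElementTree as ET

# Invented sample; a real check would read the client's own hdfs-site.xml.
SAMPLE_HDFS_SITE = """
<configuration>
  <property><name>dfs.nameservices</name><value>nameservice1</value></property>
  <property><name>dfs.ha.namenodes.nameservice1</name><value>nn1,nn2</value></property>
  <property><name>dfs.namenode.rpc-address.nameservice1.nn1</name><value>nn1.example.com:8020</value></property>
  <property><name>dfs.namenode.rpc-address.nameservice1.nn2</name><value>nn2.example.com:8020</value></property>
</configuration>
"""


def load_properties(xml_text):
    """Return the <property> name/value pairs of a Hadoop config as a dict."""
    root = ET.fromstring(xml_text.strip())
    return {p.findtext("name"): p.findtext("value") for p in root.iter("property")}


def namenode_addresses(props):
    """Map '<nameservice>.<namenode id>' to its configured RPC address (or None)."""
    result = {}
    for ns in filter(None, (s.strip() for s in props.get("dfs.nameservices", "").split(","))):
        ids = props.get(f"dfs.ha.namenodes.{ns}", "")
        for nn in filter(None, (s.strip() for s in ids.split(","))):
            result[f"{ns}.{nn}"] = props.get(f"dfs.namenode.rpc-address.{ns}.{nn}")
    return result


addrs = namenode_addresses(load_properties(SAMPLE_HDFS_SITE))
for name, address in addrs.items():
    print(name, "->", address)
```

If this lists only one NameNode, shows None for an address, or names a host that no longer exists, the client configuration is a likely suspect.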

zegab · ‎10-01-2024

The error message shows that Impala gets the query with question marks in it, which is not good, as Impala itself doesn't support prepared statements or query parameters. All of this functionality should be handled in the ODBC driver. You've written that a simple query without a subquery works. Does the simple query work with or without parameter substitution? Since the whole prepared statement/query substitution is done by the ODBC driver, and not by Impala, you would get no performance gains from using it. So I believe it is only useful if you are porting some existing code/queries to use Impala. You can just use a Python f-string or the .format() function to do the parameter substitution yourself in your code; it won't hurt performance.
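A minimal sketch of doing the substitution yourself with an f-string follows. The table and column names are made up, and the helper sql_literal is hypothetical; it deliberately accepts only integers and a narrow set of characters in strings, since plain string formatting offers no protection against SQL injection on its own.

```python
import re

# Conservative whitelist for string values (an assumption for this sketch).
_SAFE_TEXT = re.compile(r"[A-Za-z0-9_ .-]*")


def sql_literal(value):
    """Render an int or a whitelisted string as a SQL literal."""
    if isinstance(value, bool):
        raise TypeError("bool is not supported in this sketch")
    if isinstance(value, int):
        return str(value)
    if isinstance(value, str):
        if not _SAFE_TEXT.fullmatch(value):
            raise ValueError(f"unsafe characters in value: {value!r}")
        return f"'{value}'"
    raise TypeError(f"unsupported type: {type(value).__name__}")


def build_query(min_id, name):
    """Build the final query text; my_table, id and name are hypothetical."""
    return (
        "SELECT * FROM my_table "
        f"WHERE id >= {sql_literal(min_id)} AND name = {sql_literal(name)}"
    )


query = build_query(10, "abc")
print(query)
```

The resulting string contains no question marks, so it can be sent to Impala as-is through any client.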

zegab · ‎09-30-2024

@disoardi Parametric queries only work if the "UseNativeQuery" option is set to 0. This is the default, but you might have it set to 1 in the DSN configuration. You could try connecting with:

    crsr = pyodbc.connect('DSN=impala;UseNativeQuery=0', autocommit=True).cursor()

See page 84 of the "Cloudera ODBC Connector for Apache Impala Installation and Configuration Guide" for the full description of this option.
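Put together, a parametric query over such a connection might look like the sketch below. It assumes a DSN named "impala" exists and that pyodbc is installed; the helper names are hypothetical, and pyodbc is imported lazily so the connection-string helper can be used without the driver present.

```python
def build_connection_string(dsn="impala", use_native_query=0):
    """Build the connection string used in the post above."""
    return f"DSN={dsn};UseNativeQuery={use_native_query}"


def fetch_with_params(sql, params, dsn="impala"):
    """Run a query with '?' placeholders; the driver does the substitution."""
    import pyodbc  # imported here so the module loads without pyodbc installed

    conn = pyodbc.connect(build_connection_string(dsn), autocommit=True)
    try:
        cursor = conn.cursor()
        cursor.execute(sql, params)
        return cursor.fetchall()
    finally:
        conn.close()


conn_str = build_connection_string()
print(conn_str)
```

A call such as fetch_with_params("SELECT ? + ?", (1, 2)) would then rely on the driver to replace the question marks before the query reaches Impala.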

Last Visited	‎12-16-2024 10:35 AM

Member Since	‎12-10-2015 08:14 AM
Posts	27
Kudos received	6

Cloudera Community

Re: SQL SELECT into Variable

Re: impala forces full table scan

Re: Impala multiple Key values in where clause wit...

Re: Impala ODBC driver and python query with param...

Re: Operation category READ is not supported in st...
