Member since: 12-10-2015
Posts: 27
Kudos Received: 5
Solutions: 4

My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
|  | 34 | 10-15-2024 09:03 AM |
|  | 2418 | 02-27-2018 07:07 AM |
|  | 2497 | 02-13-2018 09:00 AM |
|  | 1353 | 11-21-2016 02:53 PM |
10-15-2024
09:03 AM
Hi @AKO, Impala has variable substitution like this:

```
[hostname.local.net:21000] default> SET VAR:query=SELECT 1+2;
Variable QUERY set to SELECT 1+2
[hostname.local.net:21000] default> ${VAR:query};
Query: SELECT 1+2
Query submitted at: 2024-10-15 15:54:29 (Coordinator: https://hostname.local.net:25000)
Query progress can be monitored at: https://hostname.local.net:25000/query_plan?query_id=nnnn
+-------+
| 1 + 2 |
+-------+
| 3     |
+-------+
Fetched 1 row(s) in 1.15s
```

See the official Impala docs at: https://impala.apache.org/docs/build/html/topics/impala_shell_running_commands.html

This is a feature of impala-shell, not of Impala itself, so depending on what you call "Impala Query Manager", your experience might be different. If you want a solution that is more database independent, then I recommend using a view or a CTE (WITH statement) instead:

```sql
WITH sub_query AS (
  SELECT 1+2
)
SELECT * FROM sub_query;
```
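A minimal sketch of driving the same substitution from outside the shell, assuming impala-shell is on the PATH: the variable is passed with impala-shell's --var option (documented on the page linked above) and referenced as ${VAR:...} in the query. The host name is a placeholder.

```python
import subprocess

# Hypothetical coordinator host; replace with your own.
result = subprocess.run(
    [
        "impala-shell",
        "-i", "hostname.local.net:21000",
        "--var=query=SELECT 1+2",   # define the substitution variable
        "-q", "${VAR:query};",      # reference it in the query to run
    ],
    capture_output=True,
    text=True,
)
print(result.stdout)
```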
10-15-2024
08:33 AM
Hi @mrblack, how do you know that Impala performs a full table scan?
10-15-2024
07:32 AM
In your WHERE clause:

r.key='street' AND r.value='abc' AND r.key='phone' AND r.value='123'

you are using the AND operator between all the conditions. That would select a row/record where all of these conditions are true at the same time, but there is no such record; I think that is why you are getting empty results. You should use OR between conditions that apply to different rows, like:

(r.key='street' AND r.value='abc') OR (r.key='phone' AND r.value='123')
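A tiny Python illustration of the boolean logic, with the map entries modeled as hypothetical key/value rows:

```python
# Each row carries one key/value pair, so no single row can have
# key == 'street' and key == 'phone' at the same time.
rows = [
    {"key": "street", "value": "abc"},
    {"key": "phone", "value": "123"},
]

and_match = [
    r for r in rows
    if r["key"] == "street" and r["value"] == "abc"
    and r["key"] == "phone" and r["value"] == "123"
]
or_match = [
    r for r in rows
    if (r["key"] == "street" and r["value"] == "abc")
    or (r["key"] == "phone" and r["value"] == "123")
]

print(and_match)  # [] -> always empty, like the original query
print(or_match)   # both rows -> what the OR version selects
```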
10-15-2024
06:19 AM
> @Kjarzyna wrote: Yes, I saw the documentation, but I didn't find a solution there. In the documentation you usually add just one map field and value to the where clause.

Hi @Kjarzyna, if you add just one single map key or value to the WHERE clause, does your query work?
10-02-2024
02:43 AM
I would try replacing the sub-queries with WITH statements to see if that helps. Maybe the query is just too complex for the query-rewrite/parameter-substitution engine in the ODBC driver. If that does not help, there are some logging options for the driver; I would use those to see if they give any useful information about what is happening inside the driver.
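A hedged sketch of the suggested rewrite, sent through pyodbc; the DSN, table, and column names are made up for illustration, and the ? parameter marker is still handled by the driver:

```python
import pyodbc

# Hypothetical DSN; adjust to your own configuration.
cursor = pyodbc.connect("DSN=impala", autocommit=True).cursor()

# Original shape: a filter expressed as a nested sub-query.
subquery_form = """
    SELECT c.name
    FROM customers c
    WHERE c.id IN (SELECT o.customer_id FROM orders o WHERE o.amount > ?)
"""

# Rewritten shape: the same filter hoisted into a WITH clause.
with_form = """
    WITH big_orders AS (
        SELECT o.customer_id FROM orders o WHERE o.amount > ?
    )
    SELECT c.name
    FROM customers c
    WHERE c.id IN (SELECT customer_id FROM big_orders)
"""

cursor.execute(with_form, (100,))
print(cursor.fetchall())
```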
10-01-2024
01:25 PM
1 Kudo
Hi @evanle96!

> Could this issue be related to HDFS failover and the HA configuration being affected by the deleted directories?

No, I don't think so. Deleting directories should not affect NN failover or the HA configuration, unless something is fundamentally wrong with your setup or hardware. Could you elaborate a bit more on what happened here?

> How can I validate whether the problem is related to HDFS HA and failover?

What you mention in your last question, triggering a manual failover and checking whether basic reads and writes from the CLI work, would be a good start.

> Is there a way to force Sqoop/Oozie to properly use the active NameNode instead of the standby?

HDFS clients in general should have a list of all NameNodes available to them. If the client gets the above error when connecting, it should try to connect to the next available NN. If that's not happening, there is likely some issue with the client's configuration (core-site.xml, hdfs-site.xml): it may only know about one NN (which is the standby), or the config is outdated and points to an old, decommissioned host, or it cannot connect due to network issues. Your logs should tell more about whether the job is actually trying to fail over to the other NN, so a bit more context around the error message (more logs) would be useful to see what's going on exactly.

> I have checked the HA configuration, and failover seems to be functioning as the standby takes over when the active NameNode is restarted. However, the error persists when trying to read or write to HDFS.

Do you mean the Sqoop job fails, or that you cannot read/write with simple HDFS CLI commands no matter which NN is active?
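A rough sketch of those basic CLI checks wrapped in Python; the NameNode service IDs ("nn1", "nn2") and the test path are placeholders, so use the IDs from dfs.ha.namenodes.&lt;nameservice&gt; in your hdfs-site.xml:

```python
import subprocess

# Report which NameNode is active and which is standby.
for nn in ("nn1", "nn2"):
    state = subprocess.run(
        ["hdfs", "haadmin", "-getServiceState", nn],
        capture_output=True, text=True,
    )
    print(nn, (state.stdout or state.stderr).strip())

# Simple read/write smoke test through the client configuration.
subprocess.run(["hdfs", "dfs", "-touchz", "/tmp/ha_smoke_test"], check=True)
subprocess.run(["hdfs", "dfs", "-cat", "/tmp/ha_smoke_test"], check=True)
subprocess.run(["hdfs", "dfs", "-rm", "/tmp/ha_smoke_test"], check=True)
```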
10-01-2024
03:29 AM
1 Kudo
The error message shows that Impala receives the query with question marks in it, which is not good, as Impala itself doesn't support prepared statements or query parameters. All of this functionality should be handled by the ODBC driver.

You've written that a simple query without a subquery works. Does the simple query work with or without parameter substitution?

Since the whole prepared statement/query substitution is done by the ODBC driver, and not by Impala, you get no performance gains from using it, so I believe it is only useful if you are porting some existing code/queries to Impala. You can just use a Python f-string or the .format() function to do the parameter substitution yourself in your code; it won't hurt performance.
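A minimal sketch of doing the substitution in your own code instead of relying on the driver's parameter rewriting; the DSN, table, and column names are hypothetical, and user-supplied values should be validated or escaped before being formatted into the query text:

```python
import pyodbc

# Hypothetical DSN name; adjust to your own configuration.
cursor = pyodbc.connect("DSN=impala", autocommit=True).cursor()

min_amount = 100  # value you would otherwise bind as a parameter

# Build the final SQL text yourself with .format() (an f-string works too).
query = """
    WITH big_orders AS (
        SELECT order_id, amount
        FROM orders
        WHERE amount > {min_amount}
    )
    SELECT count(*) FROM big_orders
""".format(min_amount=min_amount)

cursor.execute(query)
print(cursor.fetchone())
```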
10-01-2024
03:13 AM
1 Kudo
@Devesh If you are invoking this Java code from the CLI, you have to add the various Hadoop config files to the classpath, otherwise they won't be loaded:

- core-site.xml
- hdfs-site.xml
- yarn-site.xml

Gateway (GW) roles will just make sure these files are available on the host, but you have to make sure your code is able to access them.
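A minimal launcher sketch, assuming the Hadoop client is installed on the host: `hadoop classpath` prints a classpath that already includes the configuration directory containing core-site.xml, hdfs-site.xml, and yarn-site.xml. The jar name and main class are placeholders.

```python
import subprocess

# Ask the Hadoop client for its classpath (includes the conf directory).
hadoop_cp = subprocess.run(
    ["hadoop", "classpath"],
    capture_output=True, text=True, check=True,
).stdout.strip()

# Launch the application with the Hadoop classpath appended.
subprocess.run(
    ["java", "-cp", f"my-app.jar:{hadoop_cp}", "com.example.MyMainClass"],
    check=True,
)
```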
09-30-2024
04:51 PM
1 Kudo
@disoardi Parametric queries only work if the "UseNativeQuery" option is set to 0, which is the default (but you might have it set to 1 in the DSN configuration). You could try connecting with:

```python
crsr = pyodbc.connect('DSN=impala;UseNativeQuery=0', autocommit=True).cursor()
```

See page 84 of the "Cloudera ODBC Connector for Apache Impala Installation and Configuration Guide" for the full description of this option.
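For completeness, a small end-to-end sketch of a parametric query through that connection; the table and column names are hypothetical and assume the DSN above is configured:

```python
import pyodbc

# UseNativeQuery=0 lets the driver rewrite the ? markers client-side,
# since Impala itself has no prepared-statement support.
crsr = pyodbc.connect('DSN=impala;UseNativeQuery=0', autocommit=True).cursor()

crsr.execute("SELECT name FROM customers WHERE id = ?", (42,))
for row in crsr.fetchall():
    print(row.name)
```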
02-27-2018
07:07 AM
Hi csguna, Navigator audit only covers the Hadoop master roles, and the hdfs shell commands behave like any regular HDFS client from the NameNode's perspective. On the NameNode side, where the HDFS audit logs are generated, it is not possible to determine why a client wants to read a file. The only thing the NameNode knows, and can log, is that a client/user wants to open and read a file; there is no information about what the client will actually do with the data. The client could save the data to a local disk, send it to a network service, simply display the contents of the file, or run an ordinary ETL job and write the results back to HDFS, etc. That is why an "open" operation is logged for both 'hadoop fs -cat size.log' and 'hadoop fs -get size.log'.

Therefore this is not currently possible with Navigator Audit alone, as the knowledge of what the client will do with the data read from HDFS is missing. There are usually ways on the OS level to audit what users/processes do (like the Linux audit framework), and those can be used to audit file access at the OS level. It might be possible to combine audit data from the OS and Navigator to pinpoint the operations you mentioned, but I do not know of any automated way to do that.