Member since: 12-10-2015
Posts: 27
Kudos Received: 5
Solutions: 4

My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
|  | 34 | 10-15-2024 09:03 AM |
|  | 2418 | 02-27-2018 07:07 AM |
|  | 2497 | 02-13-2018 09:00 AM |
|  | 1353 | 11-21-2016 02:53 PM |
10-15-2024
09:03 AM
Hi @AKO, Impala has variable substitution like this:

```
[hostname.local.net:21000] default> SET VAR:query=SELECT 1+2;
Variable QUERY set to SELECT 1+2
[hostname.local.net:21000] default> ${VAR:query};
Query: SELECT 1+2
Query submitted at: 2024-10-15 15:54:29 (Coordinator: https://hostname.local.net:25000)
Query progress can be monitored at: https://hostname.local.net:25000/query_plan?query_id=nnnn
+-------+
| 1 + 2 |
+-------+
| 3     |
+-------+
Fetched 1 row(s) in 1.15s
```

See the official Impala docs at: https://impala.apache.org/docs/build/html/topics/impala_shell_running_commands.html

This is a feature of impala-shell, not of Impala itself, so depending on what you call "Impala Query Manager", your experience might be different. If you want a solution that is more database independent, then I recommend using a view or a CTE (WITH statement) instead:

```sql
WITH sub_query AS (
  SELECT 1+2
)
SELECT * FROM sub_query;
```
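A minimal sketch of driving the same substitution from outside the shell, assuming impala-shell is on the PATH: the variable is passed with impala-shell's --var option (documented on the page linked above) and referenced as ${VAR:...} in the query. The host name is a placeholder.

```python
import subprocess

# Hypothetical coordinator host; replace with your own.
result = subprocess.run(
    [
        "impala-shell",
        "-i", "hostname.local.net:21000",
        "--var=query=SELECT 1+2",   # define the substitution variable
        "-q", "${VAR:query};",      # reference it in the query to run
    ],
    capture_output=True,
    text=True,
)
print(result.stdout)
```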
10-15-2024
08:33 AM
Hi @mrblack, how do you know that Impala performs a full table scan?
10-15-2024
07:32 AM
In your WHERE clause:

r.key='street' AND r.value='abc' AND r.key='phone' AND r.value='123'

you are using the AND operator between all the conditions. That would select a row/record where all of these conditions are true at the same time, but there is no such record; I think that is why you are getting empty results. You should use OR between conditions that apply to different rows, like:

(r.key='street' AND r.value='abc') OR (r.key='phone' AND r.value='123')
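A tiny Python illustration of the boolean logic, with the map entries modeled as hypothetical key/value rows:

```python
# Each row carries one key/value pair, so no single row can have
# key == 'street' and key == 'phone' at the same time.
rows = [
    {"key": "street", "value": "abc"},
    {"key": "phone", "value": "123"},
]

and_match = [
    r for r in rows
    if r["key"] == "street" and r["value"] == "abc"
    and r["key"] == "phone" and r["value"] == "123"
]
or_match = [
    r for r in rows
    if (r["key"] == "street" and r["value"] == "abc")
    or (r["key"] == "phone" and r["value"] == "123")
]

print(and_match)  # [] -> always empty, like the original query
print(or_match)   # both rows -> what the OR version selects
```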
10-15-2024
06:19 AM
> @Kjarzyna wrote: Yes, I saw the documentation, but I didn't find a solution there. In the documentation you usually add just one map field and value to the where clause.

Hi @Kjarzyna, if you add just one single map key or value to the WHERE clause, does your query work?
10-02-2024
02:43 AM
I would try replacing the sub-queries with WITH statements to see if that helps. Maybe the query is just too complex for the query-rewrite/parameter-substitution engine in the ODBC driver. If that does not help, there are some logging options for the driver; I would use those to see if they give any useful information about what is happening inside the driver.
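A hedged sketch of the suggested rewrite, sent through pyodbc; the DSN, table, and column names are made up for illustration, and the ? parameter marker is still handled by the driver:

```python
import pyodbc

# Hypothetical DSN; adjust to your own configuration.
cursor = pyodbc.connect("DSN=impala", autocommit=True).cursor()

# Original shape: a filter expressed as a nested sub-query.
subquery_form = """
    SELECT c.name
    FROM customers c
    WHERE c.id IN (SELECT o.customer_id FROM orders o WHERE o.amount > ?)
"""

# Rewritten shape: the same filter hoisted into a WITH clause.
with_form = """
    WITH big_orders AS (
        SELECT o.customer_id FROM orders o WHERE o.amount > ?
    )
    SELECT c.name
    FROM customers c
    WHERE c.id IN (SELECT customer_id FROM big_orders)
"""

cursor.execute(with_form, (100,))
print(cursor.fetchall())
```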
10-01-2024
01:25 PM
1 Kudo
Hi @evanle96!

> Could this issue be related to HDFS failover and the HA configuration being affected by the deleted directories?

No, I don't think so. Deleting directories should not affect NN failover or the HA configuration, unless something is fundamentally wrong with your setup or hardware. Could you elaborate a bit more on what happened here?

> How can I validate whether the problem is related to HDFS HA and failover?

What you mention in your last question, triggering a manual failover and checking whether basic reads and writes from the CLI work, would be a good start.

> Is there a way to force Sqoop/Oozie to properly use the active NameNode instead of the standby?

HDFS clients in general should have a list of all NameNodes available to them. If the client gets the above error when connecting, it should try to connect to the next available NN. If that's not happening, there is likely some issue with the client's configuration (core-site.xml, hdfs-site.xml): it may only know about one NN (which is the standby), or the config is outdated and points to an old, decommissioned host, or it cannot connect due to network issues. Your logs should tell more about whether the job is actually trying to fail over to the other NN, so a bit more context around the error message (more logs) would be useful to see what's going on exactly.

> I have checked the HA configuration, and failover seems to be functioning as the standby takes over when the active NameNode is restarted. However, the error persists when trying to read or write to HDFS.

Do you mean the Sqoop job fails, or that you cannot read/write with simple HDFS CLI commands no matter which NN is active?
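A rough sketch of those basic CLI checks wrapped in Python; the NameNode service IDs ("nn1", "nn2") and the test path are placeholders, so use the IDs from dfs.ha.namenodes.&lt;nameservice&gt; in your hdfs-site.xml:

```python
import subprocess

# Report which NameNode is active and which is standby.
for nn in ("nn1", "nn2"):
    state = subprocess.run(
        ["hdfs", "haadmin", "-getServiceState", nn],
        capture_output=True, text=True,
    )
    print(nn, (state.stdout or state.stderr).strip())

# Simple read/write smoke test through the client configuration.
subprocess.run(["hdfs", "dfs", "-touchz", "/tmp/ha_smoke_test"], check=True)
subprocess.run(["hdfs", "dfs", "-cat", "/tmp/ha_smoke_test"], check=True)
subprocess.run(["hdfs", "dfs", "-rm", "/tmp/ha_smoke_test"], check=True)
```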
10-01-2024
03:29 AM
1 Kudo
The error message shows that Impala receives the query with question marks in it, which is not good, as Impala itself doesn't support prepared statements or query parameters. All of this functionality should be handled by the ODBC driver.

You've written that a simple query without a subquery works. Does the simple query work with or without parameter substitution?

Since the whole prepared statement/query substitution is done by the ODBC driver, and not by Impala, you get no performance gains from using it, so I believe it is only useful if you are porting some existing code/queries to Impala. You can just use a Python f-string or the .format() function to do the parameter substitution yourself in your code; it won't hurt performance.
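A minimal sketch of doing the substitution in your own code instead of relying on the driver's parameter rewriting; the DSN, table, and column names are hypothetical, and user-supplied values should be validated or escaped before being formatted into the query text:

```python
import pyodbc

# Hypothetical DSN name; adjust to your own configuration.
cursor = pyodbc.connect("DSN=impala", autocommit=True).cursor()

min_amount = 100  # value you would otherwise bind as a parameter

# Build the final SQL text yourself with .format() (an f-string works too).
query = """
    WITH big_orders AS (
        SELECT order_id, amount
        FROM orders
        WHERE amount > {min_amount}
    )
    SELECT count(*) FROM big_orders
""".format(min_amount=min_amount)

cursor.execute(query)
print(cursor.fetchone())
```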
10-01-2024
03:13 AM
1 Kudo
@Devesh If you are invoking this Java code from the CLI, you have to add the various Hadoop config files to the classpath, otherwise they won't be loaded:

- core-site.xml
- hdfs-site.xml
- yarn-site.xml

Gateway (GW) roles will just make sure these files are available on the host, but you have to make sure your code is able to access them.
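A minimal launcher sketch, assuming the Hadoop client is installed on the host: `hadoop classpath` prints a classpath that already includes the configuration directory containing core-site.xml, hdfs-site.xml, and yarn-site.xml. The jar name and main class are placeholders.

```python
import subprocess

# Ask the Hadoop client for its classpath (includes the conf directory).
hadoop_cp = subprocess.run(
    ["hadoop", "classpath"],
    capture_output=True, text=True, check=True,
).stdout.strip()

# Launch the application with the Hadoop classpath appended.
subprocess.run(
    ["java", "-cp", f"my-app.jar:{hadoop_cp}", "com.example.MyMainClass"],
    check=True,
)
```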
09-30-2024
04:51 PM
1 Kudo
@disoardi Parametric queries only work if the "UseNativeQuery" option is set to 0, which is the default (but you might have it set to 1 in the DSN configuration). You could try connecting with:

```python
crsr = pyodbc.connect('DSN=impala;UseNativeQuery=0', autocommit=True).cursor()
```

See page 84 of the "Cloudera ODBC Connector for Apache Impala Installation and Configuration Guide" for the full description of this option.
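For completeness, a small end-to-end sketch of a parametric query through that connection; the table and column names are hypothetical and assume the DSN above is configured:

```python
import pyodbc

# UseNativeQuery=0 lets the driver rewrite the ? markers client-side,
# since Impala itself has no prepared-statement support.
crsr = pyodbc.connect('DSN=impala;UseNativeQuery=0', autocommit=True).cursor()

crsr.execute("SELECT name FROM customers WHERE id = ?", (42,))
for row in crsr.fetchall():
    print(row.name)
```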
02-27-2018
07:07 AM
Hi csguna, Navigator audit only covers the Hadoop master roles, and the hdfs shell commands behave like any regular HDFS client from the NameNode's perspective. On the NameNode side, where the HDFS audit logs are generated, it is not possible to determine why a client wants to read a file. The only thing the NameNode knows, and can log, is that a client/user wants to open and read a file; there is no information about what the client will actually do with the data. The client could save the data to a local disk, send it to a network service, simply display the contents of the file, or run an ordinary ETL job and write the results back to HDFS, etc. That is why an "open" operation is logged for both 'hadoop fs -cat size.log' and 'hadoop fs -get size.log'.

Therefore this is not currently possible with Navigator Audit alone, as the knowledge of what the client will do with the data read from HDFS is missing. There are usually ways on the OS level to audit what users/processes do (like the Linux audit framework), and those can be used to audit file access at the OS level. It might be possible to combine audit data from the OS and Navigator to pinpoint the operations you mentioned, but I do not know of any automated way to do that.