Member since: 10-10-2017
Posts: 181
Kudos Received: 5
Solutions: 8
My Accepted Solutions
| Title | Views | Posted |
| --- | --- | --- |
| | 3608 | 08-23-2021 05:48 PM |
| | 9427 | 08-19-2021 06:47 PM |
| | 1053 | 07-30-2021 12:21 AM |
| | 2191 | 02-07-2020 04:13 AM |
| | 2237 | 01-09-2020 03:22 PM |
10-04-2021
07:58 PM
Could you provide more details about your OS and Impala ODBC Driver version? I tested Impala ODBC Driver 2.6.13 on Linux, and the driver doesn't renew the ticket for each DSN-less connection on my host.
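For reference, a DSN-less connection string for a Kerberized setup looks roughly like this (a sketch; the host, realm, and driver name are placeholders, and the driver name must match your odbcinst.ini entry):

```
# Placeholder values throughout; AuthMech=1 selects Kerberos authentication
Driver=Cloudera ODBC Driver for Impala;Host=impalad1.example.com;Port=21050;AuthMech=1;KrbRealm=EXAMPLE.COM;KrbFQDN=impalad1.example.com;KrbServiceName=impala
```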
08-29-2021
08:42 PM
In my opinion, you can start by checking the cached file metadata in Impala. For example, you can run "show table stats" to find out the number of files and the total file size under this table. You don't need to run "compute stats" in advance, because Impala gets this information from the NameNode when it loads the metadata for this table. Then you can use "hdfs dfs -ls -R" to list all the files under the table directory. By comparing the output from Impala and HDFS, you can confirm whether the file metadata in Impala is stale or whether some partition is missing in Impala. The findings in this step will determine what you should investigate in the next step.
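For example (a sketch; the table name and HDFS path are hypothetical, so adjust them for your environment):

```
# In impala-shell: file count and total size per partition, as cached by Impala
impala-shell -q "SHOW TABLE STATS mydb.mytable"

# From HDFS directly: the files actually present under the table directory
hdfs dfs -ls -R /user/hive/warehouse/mydb.db/mytable
```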
08-25-2021
10:14 PM
This issue sounds like Impala had stale metadata. But after Impala starts up it has no cached metadata, so it loads metadata from HMS and the HDFS NameNode either immediately (if load_catalog_in_background is true) or when a query runs, and the metadata should therefore be up to date. Is it possible that there is some special workflow in your maintenance routine, so that the tables are updated after Impala loads the metadata? You can check the timestamps of the files in the tables, as shown below.
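For example (the path and table name are hypothetical), compare the file modification times against when Impala loaded the table's metadata:

```
# Modification times of the table's data files
hdfs dfs -ls -R /user/hive/warehouse/mydb.db/mytable

# If the files changed after Impala loaded the metadata, refresh the table
impala-shell -q "REFRESH mydb.mytable"
```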
08-23-2021
05:48 PM
2 Kudos
There is a calculation in The Impala Cookbook to estimate the heap memory usage for metadata:

- num of tables * 5KB + num of partitions * 2KB + num of files * 750B + num of file blocks * 300B + sum(incremental col stats per table)
- Incremental stats: for each table, num columns * num partitions * 400B

Usually, insufficient catalog heap memory is caused by a large number of small files and/or partitions. For example, with a 128MB block size and replication factor 3, a single 512MB file occupies 4 blocks and needs 750B + 4 * 3 * 300B = 4350B. But if we split this file into 128 files of 4MB each (one block apiece), they will use 128 * 750B + 128 * 3 * 300B = 211200B, nearly 50 times as much!
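You can sanity-check the arithmetic with a quick shell calculation (assuming the 128MB block size and 3 replicas above):

```
# one 512MB file: 1 file entry + 4 blocks * 3 replicas
echo $((750 + 4 * 3 * 300))             # 4350 bytes

# 128 files of 4MB: 128 file entries + 128 blocks * 3 replicas
echo $((128 * 750 + 128 * 3 * 300))     # 211200 bytes
```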
08-19-2021
06:47 PM
This is a bug fixed by HIVE-15642. You can work around this issue by setting hive.metastore.dml.events to false.
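For example, a sketch of applying the workaround per session (the JDBC URL is a placeholder; depending on your version, the setting may need to go into hive-site.xml on HiveServer2 and the Hive Metastore to take effect permanently):

```
# Disable DML event notifications for this beeline session (URL is an example)
beeline -u "jdbc:hive2://hs2host.example.com:10000" \
        --hiveconf hive.metastore.dml.events=false
```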
08-10-2021
03:21 AM
You need a UDF to parse the XML string and extract the values. As far as I know, Impala doesn't have a native function to do it.
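One option (a sketch: the HDFS jar path, function name, and column/table names are hypothetical, and you would need to copy the hive-exec jar to that path first) is to register one of Hive's built-in XPath UDF classes as a Java UDF in Impala:

```
# Register Hive's xpath_string UDF (from hive-exec.jar) as an Impala Java UDF
impala-shell -q "CREATE FUNCTION my_xpath_string(string, string) RETURNS string \
  LOCATION '/user/impala/udfs/hive-exec.jar' \
  SYMBOL='org.apache.hadoop.hive.ql.udf.xml.UDFXPathString'"

# Use it to extract a value from an XML string column
impala-shell -q "SELECT my_xpath_string(xml_col, '/root/item/text()') FROM mytable"
```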
07-30-2021
12:21 AM
Calculations on constants are processed in the query compilation (planning) phase. If you have a look at the query plans below, you can see that both "months_add(now(), -1)" and "months_add('2020-01-01', -1)" have already been evaluated into TIMESTAMP literals, so Impala doesn't need to repeat the evaluation for each row.

```
[localhost:21000] default> explain select * from mytable where tran_timestamp > months_add(now(), -1);
Query: explain select * from mytable where tran_timestamp > months_add(now(), -1)
+------------------------------------------------------------------------------------+
| Explain String                                                                      |
+------------------------------------------------------------------------------------+
| Max Per-Host Resource Reservation: Memory=0B Threads=2                              |
| Per-Host Resource Estimates: Memory=10MB                                            |
| WARNING: The following tables are missing relevant table and/or column statistics. |
| default.mytable                                                                     |
|                                                                                     |
| PLAN-ROOT SINK                                                                      |
| |                                                                                   |
| 01:EXCHANGE [UNPARTITIONED]                                                         |
| |                                                                                   |
| 00:SCAN HDFS [default.mytable]                                                      |
|    HDFS partitions=1/1 files=0 size=0B                                              |
|    predicates: tran_timestamp > TIMESTAMP '2021-06-30 17:08:33.432148000'           |
|    row-size=16B cardinality=0                                                       |
+------------------------------------------------------------------------------------+
Fetched 13 row(s) in 0.99s
```

```
[localhost:21000] default> explain select * from mytable where tran_timestamp > months_add('2020-01-01', -1);
Query: explain select * from mytable where tran_timestamp > months_add('2020-01-01', -1)
+------------------------------------------------------------------------------------+
| Explain String                                                                      |
+------------------------------------------------------------------------------------+
| Max Per-Host Resource Reservation: Memory=0B Threads=2                              |
| Per-Host Resource Estimates: Memory=10MB                                            |
| WARNING: The following tables are missing relevant table and/or column statistics. |
| default.mytable                                                                     |
|                                                                                     |
| PLAN-ROOT SINK                                                                      |
| |                                                                                   |
| 01:EXCHANGE [UNPARTITIONED]                                                         |
| |                                                                                   |
| 00:SCAN HDFS [default.mytable]                                                      |
|    HDFS partitions=1/1 files=0 size=0B                                              |
|    predicates: tran_timestamp > TIMESTAMP '2019-12-01 00:00:00'                     |
|    row-size=16B cardinality=0                                                       |
+------------------------------------------------------------------------------------+
Fetched 13 row(s) in 0.02s
```
02-07-2020
04:13 AM
If a Sqoop job failed/crashed in the middle of importing a table, the table is only partially imported. When you run this job again, it will start from zero, so you need to clear the partially imported data first. Alternatively, if you know which rows have not been imported yet, you can use a WHERE clause when you restart the job to import only the remaining rows.
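For example, a sketch of restarting with a WHERE clause (the connection string, key column, and boundary value are hypothetical; this assumes id is a monotonically increasing key and rows up to 1000000 were already imported):

```
# Re-import only the rows that were not loaded before the crash
sqoop import \
  --connect jdbc:mysql://dbhost.example.com/mydb \
  --username myuser -P \
  --table mytable \
  --where "id > 1000000" \
  --target-dir /user/hive/warehouse/mytable \
  --append
```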
01-09-2020
03:22 PM
1 Kudo
Impala shares metadata and data with Hive, so you can use Sqoop to import the tables from your database into Hive. Don't forget to run 'invalidate metadata' in Impala after the ingestion is done; otherwise you won't see the imported tables in Impala.
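A minimal sketch (the connection string, database, and table names are hypothetical):

```
# Import a table into Hive with Sqoop
sqoop import \
  --connect jdbc:mysql://dbhost.example.com/mydb \
  --username myuser -P \
  --table mytable \
  --hive-import --hive-database mydb

# Then make the new table visible to Impala
impala-shell -q "INVALIDATE METADATA mydb.mytable"
```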
01-09-2020
03:08 PM
Sorry for the delay. Here is an example HAProxy configuration: https://docs.cloudera.com/documentation/enterprise/6/6.3/topics/impala_proxy.html#tut_proxy. Please check whether you set balance to source for the ODBC proxy. I once saw a client (I can't remember whether it was an ODBC or JDBC client) open a new connection and then reuse an existing session. If balance is not set to source, HAProxy may forward the new connection to a different impala daemon, so the client hits 'Invalid query handle'.

You can enable debug-level logging for the Impala ODBC Driver and, after the issue occurs, trace the operations in the driver logs. If the client opened a new connection just before the 'Invalid query handle' error, the load balancer may have forwarded that connection to the wrong impala daemon. In that case, look up the impalad logs by timestamp to figure out which impala daemon the client connected to, and check whether that daemon is the coordinator of the query (this can be found in the query profile). Otherwise, trace the query id in the impalad coordinator logs to find out why it returned 'Invalid query handle' to the client.
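For reference, a minimal sketch of the relevant HAProxy section (hostnames and ports are examples; see the linked doc for a complete configuration):

```
# Front end for ODBC/JDBC clients; 'balance source' keeps each client
# pinned to the same impalad so session state isn't lost mid-query.
listen impala-jdbc-odbc
    bind 0.0.0.0:21051
    mode tcp
    option tcplog
    balance source
    server impalad1 host1.example.com:21050 check
    server impalad2 host2.example.com:21050 check
```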