Member since: 10-10-2017
Posts: 181
Kudos Received: 3
Solutions: 8
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 1007 | 08-23-2021 05:48 PM |
| | 4876 | 08-19-2021 06:47 PM |
| | 346 | 07-30-2021 12:21 AM |
| | 1141 | 02-07-2020 04:13 AM |
| | 1187 | 01-09-2020 03:22 PM |
10-04-2021
07:58 PM
Could you provide more details about your OS and Impala ODBC Driver version? I tested Impala ODBC Driver 2.6.13 on Linux. The driver doesn't renew the ticket for each DSN-less connection on my host.
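To check the ticket behavior on your side, you can watch the Kerberos credential cache while the application opens DSN-less connections. A quick sketch (the principal is a placeholder):

```
# Get a fresh TGT first for a clean baseline (placeholder principal)
kinit user@EXAMPLE.COM

# After each DSN-less connection, check whether the ticket timestamps changed
klist
```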
08-29-2021
08:42 PM
In my opinion, you can start by checking the cached file metadata in Impala. For example, you can run "show table stats" to find out the number of files and the total file size under the table. You don't need to run "compute stats" in advance because Impala gets this information from the NameNode when it loads the table's metadata. Then you can use "hdfs dfs -ls -R" to list all the files under the table directory. By comparing the output from Impala and HDFS, you can confirm whether the file metadata in Impala is stale, or whether some partition is missing in Impala. The findings in this step will determine what you should investigate next.
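For example (database, table, and warehouse path are placeholders):

```
# In Impala: file counts and sizes per partition from the cached metadata
impala-shell -q "show table stats mydb.mytable"

# On HDFS: the actual files under the table directory
hdfs dfs -ls -R /user/hive/warehouse/mydb.db/mytable
```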
08-25-2021
10:14 PM
This issue sounds like Impala had stale metadata. But after Impala starts up, it doesn't have any cached metadata, so it loads metadata from HMS and the HDFS NameNode either immediately (if load_catalog_in_background is true) or when a query is run. So the metadata should be up to date. Is it possible that there is some special workflow in your maintenance routine, so the tables are updated after Impala loads the metadata? You can check the timestamps of the files in the tables.
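For example, if files were added behind Impala's back after the metadata was loaded, a refresh makes them visible again (names and paths are placeholders):

```
# Check when the files were last modified
hdfs dfs -ls -R /user/hive/warehouse/mydb.db/mytable

# If they are newer than the last metadata load, refresh the table in Impala
impala-shell -q "refresh mydb.mytable"
```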
08-23-2021
05:48 PM
1 Kudo
There is a calculation in The Impala Cookbook to estimate the heap memory usage for metadata:

- num of tables * 5KB + num of partitions * 2KB + num of files * 750B + num of file blocks * 300B + sum(incremental col stats per table)
- Incremental stats: for each table, num columns * num partitions * 400B

Usually, insufficient catalog heap memory is caused by a large number of small files and/or partitions. For example, a single 512MB file (four 128MB blocks, three replicas each) needs 750B + 4 * 3 * 300B = 4,350B. But if we split this file into 128 4MB files, these files will use 128 * 750B + 128 * 3 * 300B = 211,200B, nearly 50 times more!
08-19-2021
06:47 PM
This is a bug fixed by HIVE-15642. You can work around this issue by setting hive.metastore.dml.events to false.
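If you manage the configuration through hive-site.xml (in CM, the corresponding safety valve), a minimal sketch of the property would be:

```xml
<!-- Workaround for HIVE-15642: stop firing DML events from the metastore -->
<property>
  <name>hive.metastore.dml.events</name>
  <value>false</value>
</property>
```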
08-10-2021
03:21 AM
You need a UDF to parse the XML string and extract the values. As far as I know, Impala doesn't have a native function to do it.
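One possible workaround, if Hive is available to you, is to do the extraction in Hive with its built-in xpath UDFs and query the result from Impala afterwards. A sketch (table and column names are hypothetical):

```sql
-- Hive (not Impala): pull a value out of an XML string column
SELECT xpath_string(xml_payload, '/order/customer/id') AS customer_id
FROM xml_events;
```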
07-30-2021
12:21 AM
Calculations on constants are evaluated during query compilation, not at run time. If you look at the query plans below, you can see that both "months_add(now(), -1)" and "months_add('2020-01-01', -1)" are already evaluated in the predicates, so Impala doesn't need to repeat the evaluation for each row.

```
[localhost:21000] default> explain select * from mytable where tran_timestamp > months_add(now(), -1);
Query: explain select * from mytable where tran_timestamp > months_add(now(), -1)
+------------------------------------------------------------------------------------+
| Explain String                                                                      |
+------------------------------------------------------------------------------------+
| Max Per-Host Resource Reservation: Memory=0B Threads=2                              |
| Per-Host Resource Estimates: Memory=10MB                                            |
| WARNING: The following tables are missing relevant table and/or column statistics. |
| default.mytable                                                                     |
|                                                                                     |
| PLAN-ROOT SINK                                                                      |
| |                                                                                   |
| 01:EXCHANGE [UNPARTITIONED]                                                         |
| |                                                                                   |
| 00:SCAN HDFS [default.mytable]                                                      |
|    HDFS partitions=1/1 files=0 size=0B                                              |
|    predicates: tran_timestamp > TIMESTAMP '2021-06-30 17:08:33.432148000'           |
|    row-size=16B cardinality=0                                                       |
+------------------------------------------------------------------------------------+
Fetched 13 row(s) in 0.99s

[localhost:21000] default> explain select * from mytable where tran_timestamp > months_add('2020-01-01', -1);
Query: explain select * from mytable where tran_timestamp > months_add('2020-01-01', -1)
+------------------------------------------------------------------------------------+
| Explain String                                                                      |
+------------------------------------------------------------------------------------+
| Max Per-Host Resource Reservation: Memory=0B Threads=2                              |
| Per-Host Resource Estimates: Memory=10MB                                            |
| WARNING: The following tables are missing relevant table and/or column statistics. |
| default.mytable                                                                     |
|                                                                                     |
| PLAN-ROOT SINK                                                                      |
| |                                                                                   |
| 01:EXCHANGE [UNPARTITIONED]                                                         |
| |                                                                                   |
| 00:SCAN HDFS [default.mytable]                                                      |
|    HDFS partitions=1/1 files=0 size=0B                                              |
|    predicates: tran_timestamp > TIMESTAMP '2019-12-01 00:00:00'                     |
|    row-size=16B cardinality=0                                                       |
+------------------------------------------------------------------------------------+
Fetched 13 row(s) in 0.02s
```
02-07-2020
04:13 AM
If a Sqoop job fails or crashes in the middle of importing a table, the table is only partially imported. When you run the job again, it starts from scratch, so you need to clear the partially imported data first. Alternatively, if you know which rows have not been imported yet, you can add a WHERE condition when you restart the job to import only the remaining rows.
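For example, with Sqoop's --where option (connection string, table, and column names are placeholders):

```
# Re-import only the rows that were not loaded before the crash
sqoop import \
  --connect jdbc:mysql://dbhost.example.com/sales \
  --username etl_user -P \
  --table orders \
  --where "order_id > 1000000" \
  --target-dir /user/etl/orders_rest
```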
01-09-2020
03:22 PM
1 Kudo
Impala shares metadata and data with Hive, so you can use Sqoop to import the tables from your database into Hive. Don't forget to run 'invalidate metadata' in Impala after the ingestion is done; otherwise you won't see the imported tables.
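A minimal sketch (connection details and names are placeholders):

```
# Import a table straight into Hive
sqoop import \
  --connect jdbc:mysql://dbhost.example.com/sales \
  --username etl_user -P \
  --table customers \
  --hive-import --hive-database default

# Make the new table visible to Impala
impala-shell -q "invalidate metadata default.customers"
```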
01-09-2020
03:08 PM
Sorry for the delay. Here is an example of an HAProxy configuration: https://docs.cloudera.com/documentation/enterprise/6/6.3/topics/impala_proxy.html#tut_proxy. Please check whether you set balance to source for the ODBC proxy. I have seen some clients (I can't remember whether it was an ODBC or JDBC client) open a new connection and then reuse an existing session. If balance is not set to source, HAProxy can forward the new connection to another Impala daemon, so the client hits 'Invalid query handle'. You can enable debug-level logging for the Impala ODBC Driver. After the issue occurs, you can trace the operations in the ODBC driver logs. If the client opened a connection shortly before "Invalid query handle", the LB may have forwarded that connection to the wrong Impala daemon. You can then look up the impalad logs by timestamp to figure out which Impala daemon the client connected to and check whether that daemon is the coordinator of the query (which can be found in the query profile). Otherwise, you need to trace the query id in the coordinator's impalad logs to find out why it returned "Invalid query handle" to the client.
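The relevant part of the HAProxy configuration would look roughly like this (host names are placeholders; see the Cloudera doc above for a full example):

```
# Client connections must stick to one coordinator, hence balance source
listen impala-odbc
    bind :21050
    mode tcp
    balance source
    server impalad1 impalad01.example.com:21050 check
    server impalad2 impalad02.example.com:21050 check
```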
01-01-2020
05:41 PM
If the query didn't fail immediately after the error "[Cloudera][ODBC] (11180) SQLGetInfo property not found", we can ignore that message, or you can try the UseNativeQuery option to avoid this kind of message. From my point of view, "Invalid query handle" was the most likely cause. If I were you, I would check whether the query or session timed out or was cancelled. If there is a load balancer, I'd also check whether the LB sent a request to the wrong Impala daemon by chance.
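If you want to try UseNativeQuery, it can be appended to the connection string; a sketch with placeholder host settings:

```
Driver=Cloudera ODBC Driver for Impala;Host=impalad01.example.com;Port=21050;UseNativeQuery=1
```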
12-18-2019
04:05 PM
It is a common error message when the client can't receive an expected Kerberos/SSL handshake packet from the peer. I'd suggest you check the TCP connectivity between the client and the server first. It can be done easily by running "telnet <server_host> <server_port>". If that looks good, please double-check the authentication settings on both sides. If this issue happens intermittently, it might be caused by load on the server.
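For example (host and port are placeholders):

```
# Basic TCP reachability
telnet impalad01.example.com 21050

# If SSL is enabled, this also exercises the SSL handshake itself
openssl s_client -connect impalad01.example.com:21050
```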
12-16-2019
12:49 PM
The error was "Password verification failed". You set SSLKeyStore to a truststore.jks file but it should be a keystore.jks file. Please change this property and try again.
12-15-2019
08:00 PM
You mentioned that you could connect to Impala without a user and password, so I suppose you didn't change the authentication mechanism but only enabled SSL for the Impala service. Then what you need to do is add SSL, SSLKeyStore and SSLKeyStorePwd to your JDBC connection string:

```
jdbc:impala://Impala_Daemon_Host:21050/default;SSL=1;SSLKeyStore=<keystore.jks>;SSLKeyStorePwd=<keystore_pwd>
```

You don't need UID and PWD unless you have to set AuthMech to 3, which is usually required when Impala uses LDAP authentication.
11-28-2019
04:23 PM
Hi Miguel, First of all, you are missing the ORDER BY clause for the rank() function. If you add the same ORDER BY clause to both the rank() and sum(count(0)) functions, I think the answer is YES. You can run explain to review the query plan. I did it on my cluster, and what I see is that both rank() and sum() end up in a single ANALYTIC node, which means the analytic functions do reuse the partitions:

```
| 03:ANALYTIC                                                  |
| |  functions: rank(), sum(count(*))                          |
| |  partition by: x, y                                        |
| |  order by: x ASC, y ASC                                    |
| |  window: RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW |
```

But if you only add the ORDER BY clause to the rank() function, you'll see two ANALYTIC nodes:

```
| 04:ANALYTIC                                                  |
| |  functions: sum(count(*))                                  |
| |  partition by: x, y                                        |
| |                                                            |
| 03:ANALYTIC                                                  |
| |  functions: rank()                                         |
| |  partition by: x, y                                        |
| |  order by: x ASC, y ASC                                    |
| |  window: RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW |
```
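For reference, the single-ANALYTIC-node plan above corresponds to a query shape like this (table and column names are hypothetical):

```sql
-- Both analytic functions share the same PARTITION BY and ORDER BY,
-- so the planner can compute them in one ANALYTIC node.
SELECT x, y,
       rank()        OVER (PARTITION BY x, y ORDER BY x, y) AS rnk,
       sum(count(*)) OVER (PARTITION BY x, y ORDER BY x, y) AS cnt
FROM t
GROUP BY x, y;
```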
Regards, Robbie
11-27-2019
05:09 PM
Hi Amn, The message was "1 errors(s) seen monitoring completed queries", and it was reported by the CM agent, so you should be able to find the detailed error messages in cloudera-scm-agent.log*. The ShortCircuitCache message was from the HDFS API; you can ignore it as it's only a warning. The HDFS API should retry and bypass the ShortCircuitCache if it fails. Regards, Robbie
11-24-2019
03:58 PM
Hi Alexis, I'm not sure if I get your point correctly. I guess you want to test Impala query performance. Usually we can redirect the output to /dev/null or a local file, but the network and disk IO throughput will affect the test results, so you want the coordinator to execute the query without sending the results back to the client. Is that correct? As far as I know, there isn't such a feature in the Impala coordinator to discard rows when they are ready to be fetched by the client. Instead, you can try "CREATE TABLE new_table AS <original_select_query>". This does the same work as the original SELECT query but saves the result into a new HDFS table instead of sending it back to the client. In this way, you can get a very close performance assessment of the SELECT query. Regards, Robbie
11-21-2019
06:51 PM
Hi Amn, Your issue is different from the one addressed in the KB article. The CM agent periodically collects archived query profiles from the Impala Daemon and saves them to the CM database. According to your logs, the file /var/log/impalad/profiles/impala_profile_log_1.1-1574308887326 was gone when the CM agent tried to open it:

```
[21/Nov/2019 17:39:58 +0000] 5415 Monitor-HostMonitor throttling_logger ERROR (2 skipped) Failed to collect NTP metrics
Traceback (most recent call last):
  File "/usr/lib64/cmf/agent/build/env/lib/python2.7/site-packages/cmf-5.16.2-py2.7.egg/cmf/monitor/host/ntp_monitor.py", line 48, in collect
    self.collect_ntpd()
  ...
  File "/usr/lib64/cmf/agent/build/env/lib/python2.7/site-packages/ClusterStatsLogStreaming-UNKNOWN-py2.7.egg/clusterstats/log/streaming/file.py", line 41, in open_log_file
    handle = open(self.__path)
IOError: [Errno 2] No such file or directory: '/var/log/impalad/profiles/impala_profile_log_1.1-1574308887326'
```

I suggest you investigate why the file was gone. How often do you see this kind of error?
11-21-2019
06:31 PM
I'd suggest you use a backslash to escape special characters. Please try this query:

```sql
select * from hi_problem_view
where state like 'Won\'t fix'
```
11-19-2019
06:27 PM
If you have a support contract with Cloudera, you can contact us via the details at the following link under ‘How can I contact support?’ and we will be happy to assist: https://my.cloudera.com/faq.html#support If you do not yet have a support contract with Cloudera, you will need to speak with a sales representative first in order to contract with support, via the following link: https://www.cloudera.com/contact-sales.html
11-15-2019
04:07 PM
Usually, I would inspect the profiles of the failed queries to get a rough idea, then determine which logs I should check next. If you have a contract with Cloudera, it's worth filing a ticket; Cloudera Support will help you troubleshoot step by step.
11-15-2019
01:41 PM
Several reasons can cause this error. Check your ODBC settings and the application/ODBC driver versions. For example, if one side is kerberized/SSL-enabled but the other side isn't, or a 32-bit application runs against a 64-bit ODBC driver, you can see this kind of error. If everything looks good, you need to check the Impala ODBC Driver logs for more details.
11-14-2019
03:51 PM
You are welcome. The query was cancelled due to some exception, but there are no details of the exception in your query info. You can download the full text query profile from CM. If you still can't see the details in the query profile, you need to grep the query id 3348f74b129b0dae:1666447600000000 in the Impala INFO log files on p1i-hdp-srv11.lnt.com. You should be able to see which query fragment instance hit the exception. Then you can grep that instance id in the Impala INFO log files on the host where the instance was running to look for the cause.
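For example (the log paths may differ on your hosts, and the instance id is a placeholder for whatever the first grep reports):

```
# On the coordinator (p1i-hdp-srv11.lnt.com): find the failing fragment instance
grep '3348f74b129b0dae:1666447600000000' /var/log/impalad/impalad.INFO*

# On the host running that instance: search for the instance id reported above
grep '<instance_id>' /var/log/impalad/impalad.INFO*
```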
11-14-2019
01:26 PM
The first error means the connect() call failed because the peer Impala daemon didn't accept the connection in time. You can look at the CM charts to check the CPU usage and the number of threads in the Impala daemon on the peer node around that time. Similar to "Resource temporarily unavailable", this error could also be related to CPU load or thread resource limits. The second error means the connection was lost; you can review the Impala daemon logs on p1i-hdp-srv07.lnt.com to look for the reason.
11-13-2019
06:26 PM
Actually, I can see two different types of error in your error messages. "Connection refused" usually means port 22000 was not open on the peer node, so I'd check whether the Impala daemon on the peer node (<node_address>) was stopped at that time. The "Resource temporarily unavailable" error was most likely related to thread resource limits: the Impala daemon couldn't create a thread due to insufficient resources and threw this error. I suggest having a look at IMPALA-5605, which should be helpful. [1] https://github.com/apache/impala/blob/53ef115e8e5cac231ef948f8670106c348d197fe/be/src/util/thread.cc#L319
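A few quick checks for process/thread limits on the affected host (the impalad pid is a placeholder):

```
# Max processes for the user running impalad (threads count against this)
ulimit -u

# System-wide limits
cat /proc/sys/kernel/threads-max
cat /proc/sys/kernel/pid_max

# Limits of the running impalad process
cat /proc/<impalad_pid>/limits
```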
06-05-2019
02:17 AM
The "num_threads_per_disk" is the number of IO threads per disk. By default, there should be 1 IO thread per rotational disk or 8 IO threads per solid disk. According to the code, we only need to consider setting num_threads_per_disk where the CPU doesn't support SSE4.2 (to be honest I have no idea about SSE4.2). The "num_threads_per_core" is the number of scanner threads. From the point of my view, we only need to increase it when we see the IO threads are waiting, for example, RowBatchQueuePutWaitTime is high while RowBatchQueueGetWaitTime and NetworkSendTime are low. Please correct me if I'm wrong.
05-09-2019
09:49 PM
FYI, you can create a trigger in CM:

1) Go to CM -> Clusters -> Impala service -> Status;
2) Click the "Create Trigger" button;
3) Enter the name of the new trigger;
4) Enter "impala_catalogserver_jvm_heap_current_usage_bytes" in the Metric box;
5) Choose the METRIC CONDITIONS, ACTION, Metric Evaluation Window, etc.;
6) Click the "Create Trigger" button to save it.

Then the action will be triggered whenever the condition is met. For example, you can get an alert if the current JVM heap usage is higher than the threshold. I hope this helps you monitor the memory usage.