Member since: 10-10-2017
Posts: 144
Kudos Received: 1
Solutions: 4

My Accepted Solutions
Title | Views | Posted
---|---|---
 | 250 | 02-07-2020 04:13 AM
 | 296 | 01-09-2020 03:22 PM
 | 1395 | 12-18-2019 04:05 PM
 | 515 | 11-21-2019 06:31 PM
02-07-2020
04:13 AM
If a sqoop job fails or crashes in the middle of importing a table, the table is only partially imported. When you run the job again, it starts from zero, so you need to clear the partially imported data first. Alternatively, if you know which rows have not been imported yet, you can use the --where option when you restart the job to import only the remaining rows.
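For example, a minimal sketch (the database, table, key column, and boundary value are all hypothetical):

# Re-import only the rows that were not loaded before the crash.
# Assumes table 'orders' has a monotonically increasing key 'id' and
# rows up to id 1000000 were already imported successfully.
sqoop import \
  --connect jdbc:mysql://db_host/sales \
  --username myuser -P \
  --table orders \
  --where "id > 1000000" \
  --target-dir /user/hive/warehouse/orders \
  --append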
01-09-2020
03:22 PM
1 Kudo
Impala shares metadata and data with Hive. You can use sqoop to import the tables from your database into Hive. Don't forget to run 'invalidate metadata' in Impala after the ingestion is done; otherwise, you can't see the imported tables.
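For instance, a rough sketch (database, table, and host names are hypothetical):

# Import a table from MySQL directly into Hive.
sqoop import \
  --connect jdbc:mysql://db_host/sales \
  --username myuser -P \
  --table orders \
  --hive-import \
  --hive-table default.orders

# Make the new table visible to Impala.
impala-shell -i impalad_host -q "INVALIDATE METADATA default.orders;"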
01-09-2020
03:08 PM
Sorry for the delay. Here is an example of an HAProxy configuration: https://docs.cloudera.com/documentation/enterprise/6/6.3/topics/impala_proxy.html#tut_proxy. Please check whether you set balance to source for the ODBC proxy. I have seen a client (I can't remember whether it was an ODBC or JDBC client) open a new connection and then reuse an existing session. If balance is not set to source, HAProxy can forward that connection to another impala daemon, and the client then hits 'Invalid query handle'. You can enable debug-level logging for the Impala ODBC Driver and, after the issue occurs, trace the operations in the ODBC driver logs. If the client opened a connection just before the "Invalid query handle" error, the load balancer may have forwarded it to the wrong impala daemon. Look up the impalad logs by timestamp to figure out which impala daemon the client connected to and check whether that daemon is the coordinator of the query (you can find the coordinator in the query profile). Otherwise, trace the query id in the coordinator's impalad logs to find out why it returned "Invalid query handle" to the client.
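As a rough sketch of the relevant section, following the pattern in the linked doc (host names are placeholders):

# Sticky load balancing for JDBC/ODBC connections (port 21050), so a
# given client always reaches the same impala daemon.
listen impalajdbc :21050
    mode tcp
    option tcplog
    balance source
    server impalad1 impala-host-1.example.com:21050 check
    server impalad2 impala-host-2.example.com:21050 check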
01-01-2020
05:41 PM
If the query didn't fail immediately after the error "[Cloudera][ODBC] (11180) SQLGetInfo property not found", you can ignore this error message, or you can try the UseNativeQuery option to avoid this kind of message. From my point of view, "Invalid query handle" was most likely the cause. If I were you, I would check whether the query or session timed out or was cancelled. If there is a load balancer, I'd also check whether the LB sent the request to the wrong impala daemon by chance.
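For reference, a sketch of how UseNativeQuery might be set in an odbc.ini DSN entry (the DSN name, host, and driver path are placeholders; adjust them to your install):

[Impala-DSN]
Driver=/opt/cloudera/impalaodbc/lib/64/libclouderaimpalaodbc64.so
Host=impalad_host
Port=21050
# Pass SQL through to Impala without driver-side transformation,
# which avoids some driver-generated informational messages.
UseNativeQuery=1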
12-18-2019
04:05 PM
It is a common error message when the client couldn't receive an expected Kerberos/SSL handshake packet from the peer. I'd suggest you check the TCP connectivity between the client and the server, which can be done easily by running the command "telnet <server_host> <server_port>". If that looks good, please double-check the authentication settings on both sides. If the issue happens intermittently, it might be caused by load on the server.
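For example (the host and port are placeholders; the openssl check only applies if the server port is TLS-enabled):

# Check basic TCP connectivity to the server port.
telnet impalad_host 21050

# If SSL is enabled on the port, also verify the TLS handshake.
openssl s_client -connect impalad_host:21050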
12-16-2019
12:49 PM
The error was "Password verification failed". You set SSLKeyStore to a truststore.jks file but it should be a keystore.jks file. Please change this property and try again.
12-15-2019
08:00 PM
You mentioned that you could connect to Impala without a user and password. I suppose you didn't change the authentication mechanism but only enabled SSL for the Impala service. In that case, what you need to do is add SSL, SSLKeyStore and SSLKeyStorePwd to your JDBC connection string:

jdbc:impala://Impala_Daemon_Host:21050/default;SSL=1;SSLKeyStore=<keystore.jks>;SSLKeyStorePwd=<keystore_pwd>

You don't need UID and PWD unless you have to set AuthMech to 3, which is usually required when Impala uses LDAP authentication.
11-28-2019
04:23 PM
Hi Miguel, First of all, you are missing the ORDER BY clause for the rank() function. If you add the same ORDER BY clause to both the rank() and sum(count(0)) functions, I think the answer is YES. You can run EXPLAIN to review the query plan. I did it on my cluster, and what I see is that both rank() and sum() end up in a single ANALYTIC node, which means the analytic functions do reuse the partitions:

| 03:ANALYTIC |
| |  functions: rank(), sum(count(*)) |
| |  partition by: x, y |
| |  order by: x ASC, y ASC |
| |  window: RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW |

But if you only add the ORDER BY clause to the rank() function, you'll see two ANALYTIC nodes:

| 04:ANALYTIC |
| |  functions: sum(count(*)) |
| |  partition by: x, y |
| |
| 03:ANALYTIC |
| |  functions: rank() |
| |  partition by: x, y |
| |  order by: x ASC, y ASC |
| |  window: RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW |

Regards, Robbie
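P.S. To illustrate, here is a hypothetical query shape that would produce the single ANALYTIC node above (the table and columns are made up):

-- Both analytic functions share the same PARTITION BY and ORDER BY,
-- so Impala can evaluate them in one ANALYTIC node.
select x, y,
       rank() over (partition by x, y order by x, y) as rnk,
       sum(count(*)) over (partition by x, y order by x, y) as total
from t
group by x, y;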
11-27-2019
05:09 PM
Hi Amn, The message was "1 errors(s) seen monitoring completed queries", and it was reported by the CM agent. You should be able to find detailed error messages in cloudera-scm-agent.log*. The ShortCircuitCache message was from the HDFS API; you can ignore it as it's a warning message. The HDFS API should retry and bypass ShortCircuitCache if ShortCircuitCache fails. Regards, Robbie
11-24-2019
03:58 PM
Hi Alexis, I'm not sure I understand your point correctly. I guess you want to test Impala query performance. Usually, we can redirect the output to /dev/null or a local file, but then the network and disk IO throughput will affect the test results, so you want the coordinator to execute the query without sending the result back to the client. Is that correct? As far as I know, there isn't such a feature in the Impala coordinator to discard the rows when they are ready to be fetched by the client. You can try the query "CREATE TABLE new_table AS original_select_query". It does the same work as the original SELECT query but saves the result in a new HDFS table instead of sending it back to the client. In this way, you can get a very close performance assessment of a SELECT query. Regards, Robbie
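P.S. A sketch of what I mean (the table name and the inner query are placeholders; substitute your original SELECT):

-- Runs the same plan as the SELECT, but writes the result set to an
-- HDFS table instead of streaming rows back to the client.
create table perf_test_result stored as parquet as
select c1, c2, count(*) from big_table group by c1, c2;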
11-21-2019
06:51 PM
Hi Amn, Your issue is different from the one addressed in the KB article. The CM agent collects archived query profiles from the Impala Daemon and periodically saves them to the CM database. According to your logs, the file /var/log/impalad/profiles/impala_profile_log_1.1-1574308887326 was gone when the CM agent tried to open it:

[21/Nov/2019 17:39:58 +0000] 5415 Monitor-HostMonitor throttling_logger ERROR (2 skipped) Failed to collect NTP metrics
Traceback (most recent call last):
  File "/usr/lib64/cmf/agent/build/env/lib/python2.7/site-packages/cmf-5.16.2-py2.7.egg/cmf/monitor/host/ntp_monitor.py", line 48, in collect
    self.collect_ntpd()
  ...
  File "/usr/lib64/cmf/agent/build/env/lib/python2.7/site-packages/ClusterStatsLogStreaming-UNKNOWN-py2.7.egg/clusterstats/log/streaming/file.py", line 41, in open_log_file
    handle = open(self.__path)
IOError: [Errno 2] No such file or directory: '/var/log/impalad/profiles/impala_profile_log_1.1-1574308887326'

I suggest you investigate why the file was gone. How often do you see this kind of error?
11-21-2019
06:31 PM
I'd suggest you use a backslash to escape special characters. Please try this query:

select * from hi_problem_view
where state like 'Won\'t fix'
11-19-2019
06:27 PM
If you have a support contract with Cloudera, you can contact us via the details at the following link under ‘How can I contact support?’ and we will be happy to assist: https://my.cloudera.com/faq.html#support If you do not yet have a support contract with Cloudera, you will first need to speak with a sales representative, who you can reach via the following link: https://www.cloudera.com/contact-sales.html
11-15-2019
04:07 PM
Usually, I would inspect the profiles of the failed queries to get a rough idea, then determine which logs I should check next. If you have a contract with Cloudera, it's worth filing a ticket; Cloudera Support will help you troubleshoot step by step.
11-15-2019
01:41 PM
Several things can cause this error. Check your ODBC settings and the application/ODBC driver versions. For example, if one side is kerberized/SSL-enabled but the other side isn't, or a 32-bit application runs against a 64-bit ODBC driver, you can see this kind of error. If all of those look good, you need to check the Impala ODBC Driver logs for more details.
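For reference, a sketch of how driver logging might be enabled on Linux (the log path is hypothetical; on Windows the equivalent settings live in the driver's DSN configuration):

# In cloudera.impalaodbc.ini, under the [Driver] section:
LogLevel=6
LogPath=/tmp/impala_odbc_logs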
11-14-2019
03:51 PM
You are welcome. The query was cancelled due to some exception, but there are no details of the exception in your query info. You can download the text query profile from CM. If you still can't see the details in the query profile, you need to grep the query id 3348f74b129b0dae:1666447600000000 in the impala INFO log files on p1i-hdp-srv11.lnt.com. You should be able to see which query instance hit the exception. Then you can grep that instance id in the impala INFO log files on the host where the instance was running to look for the cause.
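For example (assuming the default log directory /var/log/impalad; <instance_id> is a placeholder for the instance id you find in the first step):

# Find the exception on the coordinator host (p1i-hdp-srv11.lnt.com).
grep '3348f74b129b0dae:1666447600000000' /var/log/impalad/impalad*INFO*

# Then, on the host running the failed instance, search for its instance id.
grep '<instance_id>' /var/log/impalad/impalad*INFO*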
11-14-2019
01:26 PM
The first error means the connect() call failed because the peer impala daemon didn't accept the connection in time. You can have a look at the charts in CM to check the CPU usage and the number of threads in the impala daemon on the peer node at that time. Similar to "Resource temporarily unavailable", this error could also be related to CPU load or thread resource limits. The second error means the connection was lost; you can review the impala daemon logs on p1i-hdp-srv07.lnt.com to look for the reason.
11-13-2019
06:26 PM
Actually, I can see two different types of error in your error messages. "Connection refused" usually means that port 22000 was not open on the peer node. I'd check whether the impala daemon on the peer node (<node_address>) was stopped at that time. The "Resource temporarily unavailable" error was most likely related to thread resource limits: the impala daemon couldn't create a thread due to insufficient resources, so it threw this error. I suggest having a look at IMPALA-5605, which should be helpful. [1] https://github.com/apache/impala/blob/53ef115e8e5cac231ef948f8670106c348d197fe/be/src/util/thread.cc#L319
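To inspect the thread-related limits on the peer node, something like this might help (standard Linux paths; verify on your OS):

# System-wide thread limit.
cat /proc/sys/kernel/threads-max

# Per-user limit on processes/threads (run as the impala user).
ulimit -u

# Number of threads currently used by the impalad process.
ps -o nlwp= -p "$(pgrep -x impalad | head -1)"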
06-05-2019
02:17 AM
The "num_threads_per_disk" is the number of IO threads per disk. By default, there should be 1 IO thread per rotational disk or 8 IO threads per solid disk. According to the code, we only need to consider setting num_threads_per_disk where the CPU doesn't support SSE4.2 (to be honest I have no idea about SSE4.2). The "num_threads_per_core" is the number of scanner threads. From the point of my view, we only need to increase it when we see the IO threads are waiting, for example, RowBatchQueuePutWaitTime is high while RowBatchQueueGetWaitTime and NetworkSendTime are low. Please correct me if I'm wrong.
05-09-2019
09:49 PM
FYI, you can create a trigger in CM:
1) Go to CM -> Clusters -> Impala service -> Status;
2) Click the button "Create Trigger";
3) Enter the name of the new trigger;
4) Enter "impala_catalogserver_jvm_heap_current_usage_bytes" in the Metric box;
5) Choose METRIC CONDITIONS, ACTION, Metric Evaluation Window, etc.;
6) Click the button "Create Trigger" to save it.
Then the action will be triggered if the condition is met. For example, you can see an alert if the current JVM usage is higher than the threshold. I hope it can help you monitor the memory usage.
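For reference, the resulting trigger expression might look something like this (the threshold and health action are made-up examples; CM builds the expression for you from the boxes above):

IF (SELECT impala_catalogserver_jvm_heap_current_usage_bytes
    WHERE roleName=$ROLENAME
    AND last(impala_catalogserver_jvm_heap_current_usage_bytes) > 8000000000)
DO health:concerning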