Member since: 07-29-2015
Posts: 535
Kudos Received: 141
Solutions: 103

My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
|  | 7638 | 12-18-2020 01:46 PM |
|  | 4994 | 12-16-2020 12:11 PM |
|  | 3804 | 12-07-2020 01:47 PM |
|  | 2479 | 12-07-2020 09:21 AM |
|  | 1621 | 10-14-2020 11:15 AM |
12-13-2017
02:59 PM
1 Kudo
I'd suggest looking at the Impala > Queries page in Cloudera Manager to see which queries failed. You can filter the queries by selecting "Failed or Cancelled" in the drop-down next to "Search" or, equivalently, use this filter in the search box: query_state = EXCEPTION. The alert doesn't distinguish between causes of failure, so it could be something innocent (a user developing a query and hitting a lot of syntax errors) or it could be a sign of a bigger problem.
12-08-2017
12:03 PM
Hi @Plop564

> If an error like that occurs, could the corresponding impalad crash? If an impalad crashes because of a UDF, is restarting it enough to get back to good health?

Yes and yes.

> What about impalad isolation? Again, if a segmentation fault, memory leak or race condition occurs, can other Cloudera service instances be affected (HDFS, Hive, ...)?

It won't affect services outside of Impala - the crash is isolated to the Impala process.

> Could you please quickly summarize the risks associated with a buggy C++ UDF?

You already mentioned the possibilities of crashes, memory leaks and memory corruption. The other thing to keep in mind is that the UDF runs within the Impala process, so it essentially has the same permissions as the "impala" user. A malicious UDF could exploit this, which is why we recommend reviewing UDF code. It sounds like you're already following that best practice.
12-04-2017
05:02 PM
Yes, Close() will be called if the query fails. Keep in mind that the same UDF can be in use from multiple threads at the same time, so any cleanup logic needs to be thread-safe and not clean up things that other threads might be using.
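To make that concrete, here's a minimal sketch (mine, not from the original thread) of the pattern, assuming per-thread scratch state registered through the Impala UDF API's SetFunctionState(); the function names and buffer size are illustrative only:

```cpp
#include <impala_udf/udf.h>

using namespace impala_udf;

// Hypothetical UDF: each thread gets its own scratch buffer via THREAD_LOCAL
// function state, so Close() only releases state owned by the closing scope.
void MyUdfPrepare(FunctionContext* ctx, FunctionContext::FunctionStateScope scope) {
  if (scope == FunctionContext::THREAD_LOCAL) {
    uint8_t* scratch = ctx->Allocate(1024);  // per-thread scratch space
    ctx->SetFunctionState(scope, scratch);
  }
}

StringVal MyUdf(FunctionContext* ctx, const StringVal& input) {
  uint8_t* scratch = reinterpret_cast<uint8_t*>(
      ctx->GetFunctionState(FunctionContext::THREAD_LOCAL));
  // ... use scratch; never touch state belonging to other threads ...
  return input;
}

void MyUdfClose(FunctionContext* ctx, FunctionContext::FunctionStateScope scope) {
  if (scope == FunctionContext::THREAD_LOCAL) {
    // Free only this thread's state; other threads may still be running.
    ctx->Free(reinterpret_cast<uint8_t*>(ctx->GetFunctionState(scope)));
    ctx->SetFunctionState(scope, NULL);
  }
}
```

The key point is that Close() runs per scope (and per thread for THREAD_LOCAL state), so each invocation should only release what that particular scope owns.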
11-28-2017
12:22 PM
1 Kudo
Hi Julien,

I wanted to clarify the question a bit to understand what you're trying to achieve. Impala really has two different concepts.

"Fragments" are a way of breaking down the query plan into units that can be executed in a distributed manner. You can see these in query plans with explain_level >= 2; they show up as sections of the plan with a heading like "F00: PLAN FRAGMENT". There are only two modes here. The default is to produce a distributed plan, which is broken up into fragments. The alternative, when the option num_nodes is set to 1, is to produce a single-node plan with only a single fragment.

The other concept is "fragment instances", which is the number of instances of each plan fragment that are run by the query. By default you generally get 0 or 1 instances of each fragment per Impala daemon, depending on whether there is any data to scan, but we do the scanning of data in a multi-threaded way. We have a new mode, under development, where you get multiple fragment instances per Impala daemon, controlled by the mt_dop query option. This only works for some queries (no inserts or joins) and can sometimes consume a lot more resources. mt_dop can increase throughput of queries if the bottleneck is outside of the scan, e.g. in an aggregation.
11-07-2017
10:44 AM
If you're seeing a crash, any diagnostics are useful, e.g. log files with crash messages from /var/log/impalad, hs_err_pid*.log files, etc. You can also file a bug report on the Apache JIRA: https://issues.apache.org/jira/browse/IMPALA - it's generally easier to attach diagnostics and track the issue there. The crash could be something like https://issues.apache.org/jira/browse/IMPALA-2648, but I'm just speculating. Generally, large numbers of partitions or files can blow up the size of metadata and cause degraded performance.
10-24-2017
11:38 AM
1 Kudo
Hi Majuyell, I think the problem is your use of unix_timestamp(), which returns a value with only seconds precision. You probably want to convert directly to a timestamp. It looks like your timestamps are in a regular format, so you can probably cast directly:

> select cast("2005-05-04 11:12:54.297" as timestamp);
+----------------------------------------------+
| cast('2005-05-04 11:12:54.297' as timestamp) |
+----------------------------------------------+
| 2005-05-04 11:12:54.297000000 |
+----------------------------------------------+

Otherwise you can use to_timestamp() with a timestamp format string:

> select to_timestamp("2005-05-04 11:12:54.297", "yyyy-MM-dd HH:mm:ss.SSS");
+--------------------------------------------------------------------+
| to_timestamp('2005-05-04 11:12:54.297', 'yyyy-mm-dd hh:mm:ss.sss') |
+--------------------------------------------------------------------+
| 2005-05-04 11:12:54.297000000 |
+--------------------------------------------------------------------+

Then you can use from_timestamp() to format it the way you want:

> select from_timestamp(to_timestamp("2005-05-04 11:12:54.297", "yyyy-MM-dd HH:mm:ss.SSS"), "yyyy-MM-dd HH:mm:ss.SSS");
+---------------------------------------------------------------------------------------------------------------+
| from_timestamp(to_timestamp('2005-05-04 11:12:54.297', 'yyyy-mm-dd hh:mm:ss.sss'), 'yyyy-mm-dd hh:mm:ss.sss') |
+---------------------------------------------------------------------------------------------------------------+
| 2005-05-04 11:12:54.297 |
+---------------------------------------------------------------------------------------------------------------+
10-20-2017
05:34 PM
"Exprs" tracks memory used by any SQL expressions evaluated in that node. E.g. if you have a regexp_replace() function operating on large strings, it will use more memory than a simple expression like "1 + 2". The main use case is to help track down UDFs that uses a lot of memory, e.g. if they have a memory leak.
10-12-2017
01:44 PM
Hi Creaping, It looks like you already discovered the workaround I would suggest. We don't support any way to do this automatically for now. I agree that it would be convenient for some use cases.
09-29-2017
05:29 PM
Yeah I agree - I'd like to spend some time cleaning that up 🙂
09-29-2017
05:21 PM
I just saw this in my email backlog. Yes, this is the expected behaviour. Your UDF may be called again on the same thread or a different thread. After you call SetError(), the query will fail, but the error will take some time to propagate.
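For reference, a sketch (mine, not from the thread) of how SetError() is typically used - the UDF reports the error but still returns a well-defined value, since it may keep being invoked until cancellation propagates; the function name is illustrative:

```cpp
#include <impala_udf/udf.h>

using namespace impala_udf;

// Hypothetical UDF: division that flags an error instead of returning garbage.
DoubleVal SafeDivide(FunctionContext* ctx, const DoubleVal& num, const DoubleVal& denom) {
  if (num.is_null || denom.is_null) return DoubleVal::null();
  if (denom.val == 0) {
    // Marks the query as failed; further calls may still arrive before the
    // error propagates, so keep returning a valid (here, NULL) value.
    ctx->SetError("SafeDivide: division by zero");
    return DoubleVal::null();
  }
  return DoubleVal(num.val / denom.val);
}
```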