Member since 07-29-2015. Posts: 535. Kudos received: 141. Solutions: 103.
01-23-2017
03:07 PM
2 Kudos
Hi Akhil, the only way I can think of to achieve this is to refer to the UDF by its fully-qualified name. For example, if you create a function my_fn in a database my_db, you can call it as my_db.my_fn() from any database.
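The lookup rule behind this can be pictured with a small toy model. It is a hypothetical C++ sketch, not Impala's catalog code. It assumes an unqualified name is looked up only in the current database and a "db.fn" name is used as-is, and it ignores built-in functions. FunctionRegistry, Create and Resolve are illustrative names only.

```cpp
#include <cassert>
#include <optional>
#include <set>
#include <string>

// Hypothetical toy model of function-name resolution (not Impala code).
// Built-in functions are ignored; only user-created functions are modeled.
class FunctionRegistry {
 public:
  // Registers a UDF under its fully-qualified key, e.g. "my_db.my_fn".
  void Create(const std::string& db, const std::string& fn) {
    functions_.insert(db + "." + fn);
  }

  // Returns the fully-qualified name a call resolves to, or nullopt.
  // A name without a '.' is assumed to live in the current database.
  std::optional<std::string> Resolve(const std::string& name,
                                     const std::string& current_db) const {
    std::string key = name.find('.') == std::string::npos
                          ? current_db + "." + name
                          : name;
    if (functions_.count(key) > 0) return key;
    return std::nullopt;
  }

 private:
  std::set<std::string> functions_;
};
```

In this model, calling my_fn() while another database is current fails to resolve. my_db.my_fn() resolves no matter which database is current.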
01-19-2017
01:32 PM
It looks like the query was only able to get 223MB of memory - perhaps there are other queries running at the same time?
01-12-2017
03:40 PM
What kind of performance difference are we talking about: 5%? 100%? It's helpful to look at execution summaries or profiles to drill down on where the difference is. In impala-shell you can get them by running "summary;" and "profile;" right after the query.

If the whole data set you're querying fits in memory, HDFS caching may not be that beneficial, since the OS buffer cache can be pretty effective at keeping the data in memory, especially if you're re-running the same query on the same data back-to-back. Also, if the query is somewhat complex, it can become CPU-bound pretty quickly, in which case faster access to the data won't help much.
01-12-2017
03:07 PM
Hi efumas, what version of Impala are you running? In more recent versions of Impala, the query error log includes a more detailed dump of which query operators are using memory. The same information will also likely show up in the impalad* logs. Generally this error means that there isn't enough memory to execute the query. Two memory limits can apply: the total process memory limit (set for an entire Impala daemon when it is started) and the query memory limit (set via the mem_limit query option). - Tim
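The interaction between the two limits can be shown with a simplified, hypothetical C++ model. It is not Impala's actual memory-tracking code. It assumes an allocation succeeds only if it fits under both the query's mem_limit and what remains of the daemon-wide process limit. MemState and CanAllocate are illustrative names. It also assumes a mem_limit of 0 or less means "no per-query limit".

```cpp
#include <cassert>
#include <cstdint>

// Simplified, hypothetical model of the two memory limits (not Impala code).
struct MemState {
  int64_t process_limit;  // set for the whole impalad when it starts
  int64_t process_used;   // memory held by all queries on this daemon
  int64_t query_limit;    // mem_limit query option; <= 0 means no limit here
  int64_t query_used;     // memory already held by this query
};

// Returns true if the query may allocate `bytes` more without hitting
// either the process-wide limit or its own per-query limit.
bool CanAllocate(const MemState& s, int64_t bytes) {
  bool fits_process = s.process_used + bytes <= s.process_limit;
  bool fits_query =
      s.query_limit <= 0 || s.query_used + bytes <= s.query_limit;
  return fits_process && fits_query;
}
```

In this model, a query can fail well below its own mem_limit when other queries already hold most of the process memory. That is the situation suspected in the 223MB case above.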
01-11-2017
11:29 AM
Unfortunately there are some known issues with rand(). This is essentially the same issue as https://issues.cloudera.org/browse/IMPALA-397 (Order by rand() does not work). Impala's planner doesn't currently fully understand the concept of a non-deterministic or random function. As a result, it will often produce plans that either evaluate rand() repeatedly when logically it shouldn't, or cache the value of rand() when it shouldn't. In this particular case, it essentially substitutes rand() for every reference to random and re-evaluates it multiple times, as the EXPLAIN output shows:
[localhost:21000] > explain select
case when random < 0.005 then 1
when random < 0.0175 and random >= 0.005 then 2
when random < 0.0175 and random >= 0.0175 then 3
when random < 0.2500 and random >= 0.0800 then 4
else 0 end segment,
min(random),max(random),
count(id)
from (
select l_orderkey id,RAND(unix_timestamp()) random from tpch_parquet.lineitem limit 1000000) j
group by segment;
Query: explain select
case when random < 0.005 then 1
when random < 0.0175 and random >= 0.005 then 2
when random < 0.0175 and random >= 0.0175 then 3
when random < 0.2500 and random >= 0.0800 then 4
else 0 end segment,
min(random),max(random),
count(id)
from (
select l_orderkey id,RAND(unix_timestamp()) random from tpch_parquet.lineitem limit 1000000) j
group by segment
+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Explain String |
+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Estimated Per-Host Requirements: Memory=80.00MB VCores=1 |
| |
| PLAN-ROOT SINK |
| | |
| 01:AGGREGATE [FINALIZE] |
| | output: min(rand(1484133976)), max(rand(1484133976)), count(l_orderkey) |
| | group by: CASE WHEN rand(1484133976) < 0.005 THEN 1 WHEN rand(1484133976) < 0.0175 AND rand(1484133976) >= 0.005 THEN 2 WHEN rand(1484133976) < 0.0175 AND rand(1484133976) >= 0.0175 THEN 3 WHEN rand(1484133976) < 0.2500 AND rand(1484133976) >= 0.0800 THEN 4 ELSE 0 END |
| | |
| 02:EXCHANGE [UNPARTITIONED] |
| | limit: 1000000 |
| | |
| 00:SCAN HDFS [tpch_parquet.lineitem] |
| partitions=1/1 files=3 size=193.61MB |
| limit: 1000000 |
+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ Generally rand() will work as expected if it's in the select list of the outer query. E.g. "create table tmp_rand as select rand(unix_timestamp()) from table" would do what you expect. So you could maybe work around it by creating a temporary table instead of using a subquery (I know that's not ideal).
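Why re-evaluation breaks the query can be seen in a small hypothetical C++ toy model. It is not Impala code, and std::mt19937 merely stands in for rand(). SegmentOf draws one value per row and reuses it, which is what the subquery intends and what the temporary-table workaround gives. SegmentReevaluated re-draws a value at every reference, as if rand() had been copied into each place random is used. The bucket bounds follow the query. Note that its third WHEN branch can never match a single consistent value, because it requires random < 0.0175 and random >= 0.0175 at the same time.

```cpp
#include <array>
#include <cassert>
#include <random>

// Classifies one materialized value, using the bounds from the query.
int SegmentOf(double x) {
  if (x < 0.005) return 1;
  if (x < 0.0175 && x >= 0.005) return 2;
  if (x < 0.0175 && x >= 0.0175) return 3;  // empty range: never taken
  if (x < 0.2500 && x >= 0.0800) return 4;
  return 0;
}

// Same CASE, but every reference calls draw(), i.e. re-evaluates "rand()".
template <typename Draw>
int SegmentReevaluated(Draw draw) {
  if (draw() < 0.005) return 1;
  if (draw() < 0.0175 && draw() >= 0.005) return 2;
  if (draw() < 0.0175 && draw() >= 0.0175) return 3;  // now reachable
  if (draw() < 0.2500 && draw() >= 0.0800) return 4;
  return 0;
}

// Counts rows per segment (indices 0..4) for `rows` simulated rows.
std::array<int, 5> CountSegments(int rows, bool reevaluate, unsigned seed) {
  std::mt19937 gen(seed);
  std::uniform_real_distribution<double> dist(0.0, 1.0);
  auto draw = [&] { return dist(gen); };
  std::array<int, 5> counts{};
  for (int i = 0; i < rows; ++i) {
    int seg = reevaluate ? SegmentReevaluated(draw) : SegmentOf(draw());
    ++counts[seg];
  }
  return counts;
}
```

In the toy, segment 3 is always empty when each row's value is drawn once. It shows up when each reference re-draws, because two different random numbers can straddle 0.0175. This is the kind of inconsistent result a plan with substituted rand() calls can produce.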
01-10-2017
10:03 AM
1 Kudo
Hi Petter, this was on our radar. We periodically triage anything with a "correctness" label (which you added), since that's obviously a serious issue. I updated the JIRA. - Tim
12-23-2016
06:42 AM
Hi RPAT, the values of .ptr and .len are invalid if .is_null is true. For a NULL string value, Impala in some cases just sets the is_null field and doesn't overwrite the ptr and len fields. You should rewrite the condition to check is_null first:

    if (sInput.is_null) {
      ...
    } else {
      ...
    }

This isn't explicitly documented, so we should improve that: https://issues.cloudera.org/browse/IMPALA-4711
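A complete version of that pattern might look like the hypothetical sketch below. StringVal and IntVal are minimal stand-ins declared only so the example compiles on its own. The real types come from Impala's UDF header, and real UDFs also take a FunctionContext* argument, which is omitted here. CountA is an illustrative UDF, not part of Impala.

```cpp
#include <cassert>
#include <cstdint>

// Minimal stand-ins for Impala's UDF value types (assumption: only the
// fields used here are modeled).
struct StringVal {
  bool is_null = false;
  uint8_t* ptr = nullptr;
  int len = 0;
};

struct IntVal {
  bool is_null = false;
  int val = 0;
  static IntVal null() {
    IntVal v;
    v.is_null = true;
    return v;
  }
};

// Hypothetical UDF that counts 'a' characters. It checks is_null before
// touching ptr/len, because for a NULL input those fields may be stale.
IntVal CountA(const StringVal& sInput) {
  if (sInput.is_null) {
    return IntVal::null();
  }
  int count = 0;
  for (int i = 0; i < sInput.len; ++i) {
    if (sInput.ptr[i] == 'a') ++count;
  }
  IntVal result;
  result.val = count;
  return result;
}
```

Checking is_null first means a NULL input with leftover ptr and len values (for example a len far larger than the buffer) is never dereferenced.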
12-08-2016
08:34 AM
1 Kudo
Good point - we should handle this more gracefully. I filed https://issues.cloudera.org/browse/IMPALA-4629 to track the issue.
12-05-2016
03:02 PM
Yeah Cloudera Manager's agent will restart it automatically (at least in the default config I believe).
12-05-2016
11:05 AM
It looks like your catalog service may be having problems. It would be worth looking in the catalogd logs for clues.