01-09-2019
12:57 PM
They're used for spill to disk - see https://www.cloudera.com/documentation/enterprise/latest/topics/impala_scalability.html#spill_to_disk
01-02-2019
10:57 AM
UDFs can do a lot of things because they run with the same privileges as the Impala process. However, doing things other than the usual computations in the UDF, like accessing filesystems or external services, can compromise the performance and stability of your system. So you do this at your own risk. In the future we may lock down UDFs more and prevent them from doing things like accessing HDFS.
12-31-2018
01:54 PM
On CDH 5.15, in most cases they won't hold onto resources in admission control, unless the query isn't cancelled and the client (e.g. Hue) doesn't fetch all of the results. Enabling the timeouts suggested by Eric helps ensure that queries get cancelled in a timely manner.
12-28-2018
07:54 AM
It's unlikely that the query is executing that long. Most likely the client you are using is delayed in closing the query.
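If you control the client code, one way to avoid a delayed close is to release the cursor as soon as the results have been read. Below is a minimal sketch of that pattern using the Python DB-API. Note the assumptions: `sqlite3` is only a stand-in so the snippet runs anywhere, and `run_and_close` is a hypothetical helper name. A DB-API client for Impala should follow the same shape, but check that client's documentation for what `close()` actually does on the server side.

```python
import sqlite3
from contextlib import closing


def run_and_close(conn, sql):
    """Run a query, read every row, and close the cursor right away.

    Hypothetical helper: the point is only that the cursor is closed
    as soon as the results are consumed, not left open by the client.
    """
    # closing() calls cursor.close() even if execute/fetch raises.
    with closing(conn.cursor()) as cur:
        cur.execute(sql)
        return cur.fetchall()


# sqlite3 is only a stand-in for a real DB-API connection.
conn = sqlite3.connect(":memory:")
conn.execute("create table foo2 (day int)")
conn.execute("insert into foo2 values (20181228), (20181228), (20181229)")
rows = run_and_close(conn, "select distinct day from foo2 order by day")
print(rows)
```

Reading all rows before closing only makes sense when the result set fits in client memory; for large results, fetch in batches inside the same `with` block.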
12-17-2018
01:04 PM
I don't know too much about that unfortunately.
12-17-2018
11:35 AM
Hi @Big, I checked on our latest build and it works for me - see below. Are you sure that you're not trying to query a table with a DATE type column?
[localhost:21000] default> create table foo2 (`date` int);
Query: create table foo2 (`date` int)
+-------------------------+
| summary |
+-------------------------+
| Table has been created. |
+-------------------------+
Fetched 1 row(s) in 1.19s
[localhost:21000] default> select distinct `date` from foo2;
Query: select distinct `date` from foo2
Fetched 0 row(s) in 0.12s
12-08-2018
10:35 AM
1 Kudo
I took a quick look at the Impyla code: rowcount() always returns -1, and the other two methods you mention are not implemented (see https://github.com/cloudera/impyla/). At the moment Impyla isn't officially part of CDH. It was developed by one of our data scientists and open sourced for the benefit of the community, so all of the documentation is in that GitHub repo.
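If you need a row count on the client side, one workaround is to count the rows as you fetch them instead of relying on the cursor's row count. The sketch below is an illustration only: `count_rows` is a hypothetical helper, and `sqlite3` stands in for the real connection (its cursor also reports -1 for SELECT statements, which the DB-API allows when the count is unknown). I have not verified this against Impyla itself.

```python
import sqlite3


def count_rows(cursor, batch_size=1000):
    """Count result rows by fetching them in batches.

    Hypothetical helper: it consumes the result set, so use it only
    when you do not need the rows afterwards.
    """
    total = 0
    while True:
        batch = cursor.fetchmany(batch_size)
        if not batch:
            break
        total += len(batch)
    return total


# sqlite3 is only a stand-in for a real DB-API connection.
conn = sqlite3.connect(":memory:")
conn.execute("create table t (x int)")
conn.executemany("insert into t values (?)", [(i,) for i in range(2500)])

cur = conn.cursor()
cur.execute("select x from t")
reported = cur.rowcount      # not a usable count for a SELECT
fetched = count_rows(cur)    # count obtained by actually fetching
print(reported, fetched)
cur.close()
```

If you only need the number and not the rows, running a separate `select count(*)` query is usually cheaper than fetching everything.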
12-08-2018
10:25 AM
Actually, scratch what I just said - that advice applies if the query is stuck in the FINISHED state. If it's stuck in the RUNNING state, it means the query is just taking a long time to produce any results. So you're probably getting a bad query plan on one cluster that is extremely slow to execute. For example, the planner may have chosen an inefficient join order. Usually computing stats on all the tables will improve the query plan.
12-08-2018
10:23 AM
There's no relation between resource reservations and query states. There are probably two things going on:
1. You're getting different query plans on the two clusters: either the data or table definitions are different, or the stats are missing or out of date on one cluster.
2. The query is being kept running because the output rows are not all fetched by Hue. This can happen if the query returns more than a page of rows and the user does not scroll through the whole result set. The issue is that Hue only fetches results on demand, and Impala keeps the query running until the last row is fetched by Hue. How many rows are being returned from the query?
We're looking at making this more robust, since the scenario is avoidable. As a mitigation we usually suggest setting an "idle query timeout" in Cloudera Manager to automatically cancel queries that have been hanging around for a while with no client activity.
Edit: second observation was wrong. See my next post.
11-28-2018
10:31 AM
@alexmc6 I think there's (understandably) some misunderstanding of what the different mechanisms there do. Memory estimates only play a role if you set "Max Memory" and leave "Default Query Memory Limit" unset or set to 0. I always recommend against that mode for exactly the reason you mentioned.
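To make the rule concrete, here is a tiny sketch that encodes it as a predicate. This is not Impala code: `estimates_used_for_admission` and its parameters are hypothetical names, and treating `None` or 0 as "unset" for both settings is my assumption for illustration.

```python
def estimates_used_for_admission(max_memory, default_query_mem_limit):
    """Return True when memory estimates play a role, per the rule above.

    Both arguments are byte counts; None or 0 is treated as "unset"
    (an assumption made for this illustration).
    """
    max_memory_set = bool(max_memory)
    limit_set = bool(default_query_mem_limit)
    # Estimates matter only with Max Memory set and no default limit.
    return max_memory_set and not limit_set


GIB = 1024 ** 3
print(estimates_used_for_admission(100 * GIB, None))     # Max Memory only
print(estimates_used_for_admission(100 * GIB, 2 * GIB))  # both set
print(estimates_used_for_admission(None, None))          # neither set
```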