Member since
07-29-2015
535
Posts
140
Kudos Received
103
Solutions
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 4537 | 12-18-2020 01:46 PM |
| | 2828 | 12-16-2020 12:11 PM |
| | 1905 | 12-07-2020 01:47 PM |
| | 1483 | 12-07-2020 09:21 AM |
| | 972 | 10-14-2020 11:15 AM |
06-05-2018
04:25 PM
@mauricio thanks for the profile. I think you might be better off tweaking DISABLE_CODEGEN_ROWS_THRESHOLD instead of using the big hammer of DISABLE_CODEGEN. The way that option works is that codegen is disabled automatically if the planner detects that no point in the query plan processes that number of rows per backend. The default is 50,000. E.g. if your query scans 100,000 rows split across three backends (33,333 per backend), it will disable codegen automatically. Instead of setting DISABLE_CODEGEN, I'd suggest increasing the threshold first. Based on the profile you sent me, a value around 400000 might be sufficient for that query at least.
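To make the threshold rule concrete, here is a rough sketch of the check as described above. It is a simplified model, not Impala's planner code: it assumes rows are split evenly across backends and that the comparison is a strict "less than", and the function names are made up for illustration.

```python
DEFAULT_THRESHOLD = 50_000  # default value cited in the post above


def max_rows_per_backend(rows_per_plan_node, num_backends):
    """Largest per-backend row count at any point in a simplified plan.

    Assumption: each plan node's rows are split evenly across backends.
    """
    return max(rows // num_backends for rows in rows_per_plan_node)


def codegen_auto_disabled(rows_per_plan_node, num_backends,
                          threshold=DEFAULT_THRESHOLD):
    """True if no point in the plan reaches the threshold per backend.

    Assumption: strict "<" comparison; the real planner may differ.
    """
    return max_rows_per_backend(rows_per_plan_node, num_backends) < threshold


def set_statement(threshold):
    """Build the statement that raises the threshold for the session."""
    return f"SET DISABLE_CODEGEN_ROWS_THRESHOLD={threshold};"


if __name__ == "__main__":
    # The example from the post: 100,000 rows over three backends.
    print(max_rows_per_backend([100_000], 3))   # 33333
    print(codegen_auto_disabled([100_000], 3))  # True
    print(set_statement(400_000))
```

With a larger query, say 1,000,000 rows over three backends, the default threshold leaves codegen on, while a threshold of 400000 would turn it off for that query only.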
06-05-2018
02:17 PM
@mauricio I agree it's not great to turn it on globally. I'd be interested in seeing the query profile to understand what happened. We've made some codegen time improvements, but there are still remaining issues, so it would be good to see whether it's something we've fixed or not.
05-23-2018
03:09 PM
1 Kudo
@Hrishi1 did you consider setting a default SCRATCH_LIMIT at the resource pool level so that queries will fail if they spill too much data? I know a lot of cluster admins do things like that to prevent runaway queries, and also so that users will come to them when they want to run big queries, instead of the admins having to chase down users. I understand that it's not exactly what you're looking for, but I've seen people have success with it.
05-21-2018
05:09 PM
I looked into it and we don't currently support per-query alerts. I passed along this feedback to the Cloudera Manager team. I guess we already covered it, but my two suggestions would be:
- Set a default scratch_limit per-pool or globally so that users don't accidentally write queries that spill a lot of data.
- Set up monitoring for some aggregate threshold, then use the queries page to discover the spilling queries.
My philosophy on this is that spilling queries are nothing to be concerned about as long as queries are completing fast enough for your needs.
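The second suggestion (watch an aggregate, then drill into individual queries) could be scripted roughly like this. It's only a sketch: the `QueryRecord` shape, field names and threshold are hypothetical, and you'd have to fill the records from whatever export of the queries page you have available.

```python
from dataclasses import dataclass


@dataclass
class QueryRecord:
    # Hypothetical record shape; not a CM or Impala API type.
    query_id: str
    pool: str
    memory_spilled_bytes: int


def total_spilled(queries):
    """Aggregate bytes spilled across all queries in the window."""
    return sum(q.memory_spilled_bytes for q in queries)


def spilling_queries(queries):
    """Queries that spilled anything, largest spillers first."""
    spilled = [q for q in queries if q.memory_spilled_bytes > 0]
    return sorted(spilled, key=lambda q: q.memory_spilled_bytes, reverse=True)


def triage(queries, aggregate_threshold_bytes):
    """Return the spilling queries only when the aggregate crosses the threshold."""
    if total_spilled(queries) < aggregate_threshold_bytes:
        return []
    return spilling_queries(queries)


if __name__ == "__main__":
    sample = [
        QueryRecord("q1", "etl", 0),
        QueryRecord("q2", "etl", 8 * 1024**3),
        QueryRecord("q3", "adhoc", 2 * 1024**3),
    ]
    for q in triage(sample, aggregate_threshold_bytes=5 * 1024**3):
        print(q.query_id, q.pool, q.memory_spilled_bytes)
```

Nothing is reported while the aggregate stays under the threshold, which matches the idea that spilling on its own isn't a problem.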
05-16-2018
11:47 AM
1 Kudo
I'm planning to get back to you with an answer - just haven't been able to find the time yet 🙂
05-14-2018
08:48 AM
Depending on exactly what you want to trigger on, you can use the generic function in CM to trigger based on any tsquery expression: https://www.cloudera.com/documentation/enterprise/latest/topics/cm_dg_triggers_usecases.html . There are a number of metrics tracking spill-to-disk: https://www.cloudera.com/documentation/enterprise/latest/topics/cm_metrics_impala.html . I don't fully understand the goal though - generally spill-to-disk happens transparently as part of normal query processing when memory is constrained and isn't cause for concern. If your aim is to prevent runaway spilling, the scratch_limit query option is a direct way to do that: https://www.cloudera.com/documentation/enterprise/latest/topics/impala_scratch_limit.html . You can set the default query option globally or set default query options per-resource-pool via the "Dynamic Resource Pools" UI in CM. https://www.cloudera.com/documentation/enterprise/latest/topics/impala_disable_unsafe_spills.html is also occasionally useful.
05-11-2018
09:07 AM
2 Kudos
The CM queries tab keeps track of "Memory Spilled" per query. You can choose to display it via "select attributes" and also search for queries based on memory_spilled in the search box. If you click the down arrow next to the query and look at "query details", the information is in there too. The "Utilization Report" UI also has some aggregate information about memory spilled per resource pool.
05-02-2018
10:48 AM
There are also --idle_query_timeout and --idle_session_timeout startup flags that set an upper bound on the expiration. They might also be set.
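A rough way to picture the "upper bound" behaviour, assuming a value of 0 means "not set" and that the per-session setting being bounded is a query option such as QUERY_TIMEOUT_S (that pairing, and the function name, are my assumptions here, not something stated above):

```python
def effective_idle_timeout(requested_s, startup_flag_s):
    """Combine a per-session timeout with the startup flag's upper bound.

    Assumptions: 0 means "not set"; when both are set, the smaller wins;
    when only one is set, that one applies.
    """
    if requested_s > 0 and startup_flag_s > 0:
        return min(requested_s, startup_flag_s)
    return max(requested_s, startup_flag_s)


if __name__ == "__main__":
    print(effective_idle_timeout(600, 300))  # 300: the flag caps the request
    print(effective_idle_timeout(0, 300))    # 300: only the flag is set
    print(effective_idle_timeout(120, 0))    # 120: only the request is set
```

So a query can expire sooner than the session asked for if the startup flag is lower.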
04-27-2018
09:39 AM
https://issues.apache.org/jira/browse/IMPALA-6882 is easy to rule out since it only occurs on > 5 year old processors.
04-26-2018
11:11 AM
I agree we'd need more info to diagnose. Based on the correlation with querying a nested types table, it could be https://issues.apache.org/jira/browse/IMPALA-6489 which is fixed in the 5.14.2 maintenance release.