Member since: 07-29-2015
Posts: 535
Kudos Received: 141
Solutions: 103

My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 7741 | 12-18-2020 01:46 PM |
| | 5050 | 12-16-2020 12:11 PM |
| | 3852 | 12-07-2020 01:47 PM |
| | 2504 | 12-07-2020 09:21 AM |
| | 1633 | 10-14-2020 11:15 AM |
09-20-2019
10:04 AM
@Zane- I'm late, but I can provide some additional insight. I think the suggestion in the error message is a good one (I'm biased because I wrote it, but some thought went into it): "Memory is likely oversubscribed. Reducing query concurrency or configuring admission control may help avoid this error".

The general solution is to set up admission control with memory limits so that memory doesn't get oversubscribed and a single query can't gobble up more memory than you'd like. I gave a talk at Strata that has pointers on a lot of these things: https://conferences.oreilly.com/strata/strata-ca-2019/public/schedule/detail/73000

In this case you can actually see that query 2f4b5cff11212907:886aa1400000000 is using Total=78.60 GB of memory, so that's likely your problem. Impala's resource management is totally permissive out of the box and will happily let queries use up all the resources in the system like this. I didn't see which version you're running, but there were a lot of improvements in this area (config options, OOM avoidance, diagnostics) in CDH 6.1+.

There are various other angles you can take to improve this: if the queries using lots of memory are suboptimal, tuning them (maybe just computing stats) makes a big difference. You can also ...
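To make that concrete, here's a minimal sketch of the per-query side of this in impala-shell; the limit value and table name are illustrative placeholders, not recommendations, and full admission control pools are configured through CM rather than per session:

-- Cap how much memory any single query in this session may use (value is illustrative)
SET MEM_LIMIT=10g;
-- Compute stats so the planner's memory estimates (and admission decisions) are more accurate
COMPUTE STATS my_db.my_table;   -- my_db.my_table is a placeholder name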
08-16-2019
05:40 PM
> It's hard to believe a count of 1000 records is taking 2.2 hours. So, I closed the session and did not see the mentioned "Released admission control resources" value.

Yeah, I agree, something is weird here. We've seen symptoms like this when dropped connections in the network layer caused hangs or similar.

> BTW, it's not holding up queries from getting admitted as far as I can tell. We ran into a problem where it did not have enough memory to allocate to a query and returned an error. That's what got me started down this road in the first place.

Thanks for clarifying, that makes sense. Your cluster does sound unhappy; it sounds an awful lot like some fragments of the query have gotten stuck. We've seen this happen because of issues communicating with HDFS (e.g. a heavily loaded namenode), and we've also seen hangs in the JVM: https://issues.apache.org/jira/browse/IMPALA-7482. If it's a JVM issue, increasing heap sizes has helped in some cases. If it's a namenode issue, setting ipc.client.rpc-timeout.ms to 60000 (i.e. 60 seconds) under CM > Impala > Configuration > Impala Daemon HDFS Advanced Configuration Snippet (Safety Valve) might help.

We've also seen the file handle cache, enabled by default in CDH 5.15, help a lot in reducing namenode load; some customers upgraded and saw pretty dramatic improvements from that (and from various other improvements in that release). We've done a lot of work in this space over the last year or two, so I wouldn't be surprised if an upgrade fixed things, even without knowing exactly what you're running into.

> as the very word fetch means it got a result and pulled it back for viewing. Why would the word "fetch" be used in place of the word "requested?"

I agree 100%. I think whoever named it was either overly optimistic and assumed there wouldn't be a significant gap in time, or it was named from the point of view of the code rather than the external system.
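For reference, a hedged sketch of what that safety valve entry could look like; the property name and value come from the suggestion above, and the XML wrapper is the usual format for Hadoop-style configuration snippets:

<property>
  <name>ipc.client.rpc-timeout.ms</name>
  <value>60000</value>
</property>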
08-16-2019
01:35 PM
After the IMPALA-1575 fixes (https://issues.apache.org/jira/browse/IMPALA-1575), which are in CDH 5.14, resources are released once the last row is fetched or the query is cancelled. It looks like that isn't happening for some reason here: either the query is just taking a while to compute the count, or the client is slow to fetch the results (I can't tell from the profile fragment). "Released admission control resources" will show up in the query timeline when the resources are released; after that point it shouldn't hold up other queries getting admitted.

Side note: there's a monitoring issue here where the query shows as executing until the client closes it, even though it isn't holding onto significant resources. Hue keeps queries open in case the user reloads the page and it needs to re-fetch the results. We fixed this in CDH 6.2 with https://issues.apache.org/jira/browse/IMPALA-5397.

That profile is confusing me a bit. count(*) only returns one row, so I would think it would return quickly after the first row was fetched (one quirk of the "first row fetched" event is that it tracks when the row was requested, not when it was returned). The best theory I have based on the profile fragment is that the count(*) hasn't actually been returned yet and Hue is blocked waiting to fetch that row, either because it's still being computed or because something is hung. The full profile might help here, but it seems something slightly odd is happening.
08-16-2019
10:30 AM
This can also happen if the query is returning a lot of rows, or if the client is very slow at fetching rows.
08-16-2019
10:30 AM
@pollard the documentation is accurate; many people use those flags successfully. I wouldn't want to speculate about what's happening in your case - if you include a query profile, that can help with diagnosis. We've seen things like this happen when a client polls the query for status and keeps it alive (the timeout is measured from the last time the client performed an operation on the query or session).
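One hedged aside, assuming the flags in question are the idle query/session timeouts: there is also a per-session equivalent you can experiment with from impala-shell while you gather a profile (the value below is illustrative):

-- Cancel queries in this session after 10 minutes with no client activity (value is illustrative)
SET QUERY_TIMEOUT_S=600;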
07-29-2019
05:12 PM
I filed https://issues.apache.org/jira/browse/IMPALA-8807 to fix the docs.
07-29-2019
12:18 AM
That example does show that it works in at least one case with a WHERE clause referencing a partition column. I don't know off the top of my head the exact set of cases where it works, but it does seem like the docs are not totally accurate.
07-26-2019
10:26 AM
Like @EricL said, this would be caused by some process updating files in the table in the background without a REFRESH in Impala. For example, a job that writes files directly into the table directory can either write incomplete files or let Impala see the files before they are completely written (preferably, write the files to a temporary directory and then move them into the table directory). Some Hive usage patterns can also cause issues, e.g. INSERT OVERWRITE. There was a related issue in Impala that could occur if you did an INSERT OVERWRITE from Hive without a REFRESH in Impala: https://issues.apache.org/jira/browse/IMPALA-8561. Generally that workflow (INSERT OVERWRITE without REFRESH) is problematic, but the symptoms were made more confusing by IMPALA-8561. A minimal sketch of the REFRESH step is below.
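Assuming an external job has just finished moving completed files into the table directory (the table name and partition spec here are placeholders):

-- Pick up the new file list for the whole table
REFRESH my_db.my_table;
-- Or, for a partitioned table, refresh just the partition that was written
REFRESH my_db.my_table PARTITION (ds='2019-07-26');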
07-24-2019
04:28 PM
Yes! Glad you asked. There is an optimisation that can be enabled with the OPTIMIZE_PARTITION_KEY_SCANS query option: https://www.cloudera.com/documentation/enterprise/latest/topics/impala_optimize_partition_key_scans.html. This converts queries like your example into a metadata-only query. The only reason it isn't enabled by default is that you can get different results if a partition contains only files with 0 rows; the metadata doesn't have enough information to detect that case. Here it is in action:

[tarmstrong-box2.ca.cloudera.com:21000] default> set OPTIMIZE_PARTITION_KEY_SCANS = 1;
OPTIMIZE_PARTITION_KEY_SCANS set to 1
[tarmstrong-box2.ca.cloudera.com:21000] default> explain select max(ss_sold_date_sk) from tpcds_parquet.store_sales where ss_sold_date_sk % 10 = 0;
Query: explain select max(ss_sold_date_sk) from tpcds_parquet.store_sales where ss_sold_date_sk % 10 = 0
+--------------------------------------------------------+
| Explain String |
+--------------------------------------------------------+
| Max Per-Host Resource Reservation: Memory=0B Threads=1 |
| Per-Host Resource Estimates: Memory=10MB |
| Codegen disabled by planner |
| |
| PLAN-ROOT SINK |
| | |
| 01:AGGREGATE [FINALIZE] |
| | output: max(ss_sold_date_sk) |
| | row-size=4B cardinality=1 |
| | |
| 00:UNION |
| constant-operands=182 |
| row-size=4B cardinality=182 |
+--------------------------------------------------------+
07-15-2019
08:43 AM
If I had to guess, the CDH installation is somehow broken and missing jar files. Impala depends on antlr, so it won't be able to run if that isn't present. The JARs should be part of the CDH parcel, e.g. in /opt/cloudera/parcels/CDH-<version>/lib/impala/lib