Member since: 07-29-2015
Posts: 535
Kudos Received: 141
Solutions: 103

My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 7741 | 12-18-2020 01:46 PM |
| | 5050 | 12-16-2020 12:11 PM |
| | 3852 | 12-07-2020 01:47 PM |
| | 2504 | 12-07-2020 09:21 AM |
| | 1633 | 10-14-2020 11:15 AM |
09-20-2019
10:04 AM
@Zane- I'm late, but I can provide some additional insight. I think the suggestion in the error message is a good one (I'm biased because I wrote it, but some thought went into it): "Memory is likely oversubscribed. Reducing query concurrency or configuring admission control may help avoid this error".

The general solution is to set up admission control with memory limits so that memory doesn't get oversubscribed and a single query can't gobble up more memory than you'd like. I gave a talk at Strata that has pointers on a lot of these things: https://conferences.oreilly.com/strata/strata-ca-2019/public/schedule/detail/73000

In this case you can actually see that query 2f4b5cff11212907:886aa1400000000 is using Total=78.60 GB of memory, so that's likely your problem. Impala's resource management is totally permissive out of the box and will happily let queries use up all the resources in the system like this. I didn't see which version you're running, but there were a lot of improvements in this area (config options, OOM avoidance, diagnostics) in CDH 6.1+.

There are various other angles you can take to improve this: if the queries using lots of memory are suboptimal, tuning them (maybe just computing stats) makes a big difference. You can also ...
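To make that concrete, here's a minimal sketch of the per-query side of this in impala-shell; the limit value and table name are illustrative placeholders, not recommendations, and full admission control pools are configured through CM rather than per session:

-- Cap how much memory any single query in this session may use (value is illustrative)
SET MEM_LIMIT=10g;
-- Compute stats so the planner's memory estimates (and admission decisions) are more accurate
COMPUTE STATS my_db.my_table;   -- my_db.my_table is a placeholder name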
08-16-2019
05:40 PM
> It's hard to believe a count of 1000 records is taking 2.2 hours. So, I closed the session and did not see the mentioned "Released admission control resources" value.

Yeah, I agree, something is weird here. We've seen symptoms like this when dropped connections in the network layer caused hangs or similar.

> BTW, it's not holding up queries from getting admitted as far as I can tell. We ran into a problem where it did not have enough memory to allocate to a query and returned an error. That's what got me started down this road in the first place.

Thanks for clarifying, that makes sense. Your cluster does sound unhappy; it sounds an awful lot like some fragments of the query have gotten stuck. We've seen this happen because of issues communicating with HDFS (e.g. a heavily loaded namenode), and we've also seen hangs in the JVM: https://issues.apache.org/jira/browse/IMPALA-7482. If it's a JVM issue, increasing heap sizes has helped in some cases. If it's a namenode issue, setting ipc.client.rpc-timeout.ms to 60000 (i.e. 60 seconds) under CM > Impala > Configuration > Impala Daemon HDFS Advanced Configuration Snippet (Safety Valve) might help.

We've also seen the file handle cache, enabled by default in CDH 5.15, help a lot in reducing namenode load; some customers upgraded and saw pretty dramatic improvements from that (and from various other improvements in that release). We've done a lot of work in this space over the last year or two, so I wouldn't be surprised if an upgrade fixed things, even without knowing exactly what you're running into.

> as the very word fetch means it got a result and pulled it back for viewing. Why would the word "fetch" be used in place of the word "requested?"

I agree 100%. I think whoever named it was either overly optimistic and assumed there wouldn't be a significant gap in time, or it was named from the point of view of the code rather than the external system.
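For reference, a hedged sketch of what that safety valve entry could look like; the property name and value come from the suggestion above, and the XML wrapper is the usual format for Hadoop-style configuration snippets:

<property>
  <name>ipc.client.rpc-timeout.ms</name>
  <value>60000</value>
</property>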
08-16-2019
01:35 PM
After the IMPALA-1575 fixes (https://issues.apache.org/jira/browse/IMPALA-1575), which are in CDH 5.14, resources are released once the last row is fetched or the query is cancelled. It looks like that isn't happening for some reason here: either the query is just taking a while to compute the count, or the client is slow to fetch the results (I can't tell from the profile fragment). "Released admission control resources" will show up in the query timeline when the resources are released; after that point it shouldn't hold up other queries getting admitted.

Side note: there's a monitoring issue here where the query shows as executing until the client closes it, even though it isn't holding onto significant resources. Hue keeps queries open in case the user reloads the page and it needs to re-fetch the results. We fixed this in CDH 6.2 with https://issues.apache.org/jira/browse/IMPALA-5397.

That profile is confusing me a bit. count(*) only returns one row, so I would think it would return quickly after the first row was fetched (one quirk of the "first row fetched" event is that it tracks when the row was requested, not when it was returned). The best theory I have based on the profile fragment is that the count(*) hasn't actually been returned yet and Hue is blocked waiting to fetch that row, either because it's still being computed or because something is hung. The full profile might help here, but it seems something slightly odd is happening.
08-16-2019
10:30 AM
This can also happen if the query is returning a lot of rows, or if the client is very slow at fetching rows.
08-16-2019
10:30 AM
@pollard the documentation is accurate; many people use those flags successfully. I wouldn't want to speculate about what's happening in your case - if you include a query profile, that can help with diagnosis. We've seen things like this happen when a client polls the query for status and keeps it alive (the timeout is measured from the last time the client performed an operation on the query or session).
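One hedged aside, assuming the flags in question are the idle query/session timeouts: there is also a per-session equivalent you can experiment with from impala-shell while you gather a profile (the value below is illustrative):

-- Cancel queries in this session after 10 minutes with no client activity (value is illustrative)
SET QUERY_TIMEOUT_S=600;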
07-29-2019
05:12 PM
I filed https://issues.apache.org/jira/browse/IMPALA-8807 to fix the docs.
07-29-2019
12:18 AM
That example does show that it works in at least one case with a WHERE clause referencing a partition column. I don't know off the top of my head the exact set of cases where it works, but it does seem like the docs are not totally accurate.
07-26-2019
10:26 AM
Like @EricL said, this would be caused by some process updating files in the table in the background without a REFRESH in Impala. For example, a job that writes files directly into the table directory can either write incomplete files or let Impala see the files before they are completely written (preferably, write the files to a temporary directory and then move them into the table directory). Some Hive usage patterns can also cause issues, e.g. INSERT OVERWRITE. There was a related issue in Impala that could occur if you did an INSERT OVERWRITE from Hive without a REFRESH in Impala: https://issues.apache.org/jira/browse/IMPALA-8561. Generally that workflow (INSERT OVERWRITE without REFRESH) is problematic, but the symptoms were made more confusing by IMPALA-8561. A minimal sketch of the REFRESH step is below.
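Assuming an external job has just finished moving completed files into the table directory (the table name and partition spec here are placeholders):

-- Pick up the new file list for the whole table
REFRESH my_db.my_table;
-- Or, for a partitioned table, refresh just the partition that was written
REFRESH my_db.my_table PARTITION (ds='2019-07-26');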
07-24-2019
04:28 PM
Yes! Glad you asked. There is an optimisation that can be enabled with the OPTIMIZE_PARTITION_KEY_SCANS query option: https://www.cloudera.com/documentation/enterprise/latest/topics/impala_optimize_partition_key_scans.html. This converts queries like your example into a metadata-only query. The only reason it isn't enabled by default is that you can get different results if a partition contains only files with 0 rows; the metadata doesn't have enough information to detect that case. Here it is in action:

[tarmstrong-box2.ca.cloudera.com:21000] default> set OPTIMIZE_PARTITION_KEY_SCANS = 1;
OPTIMIZE_PARTITION_KEY_SCANS set to 1
[tarmstrong-box2.ca.cloudera.com:21000] default> explain select max(ss_sold_date_sk) from tpcds_parquet.store_sales where ss_sold_date_sk % 10 = 0;
Query: explain select max(ss_sold_date_sk) from tpcds_parquet.store_sales where ss_sold_date_sk % 10 = 0
+--------------------------------------------------------+
| Explain String |
+--------------------------------------------------------+
| Max Per-Host Resource Reservation: Memory=0B Threads=1 |
| Per-Host Resource Estimates: Memory=10MB |
| Codegen disabled by planner |
| |
| PLAN-ROOT SINK |
| | |
| 01:AGGREGATE [FINALIZE] |
| | output: max(ss_sold_date_sk) |
| | row-size=4B cardinality=1 |
| | |
| 00:UNION |
| constant-operands=182 |
| row-size=4B cardinality=182 |
+--------------------------------------------------------+
07-15-2019
08:43 AM
If I had to guess, the CDH installation is somehow broken and missing jar files. Impala depends on antlr, so it won't be able to run if that isn't present. The JARs should be part of the CDH parcel, e.g. in /opt/cloudera/parcels/CDH-<version>/lib/impala/lib