About Tim Armstrong

Tim Armstrong · ‎10-03-2020

The docs have a better and more complete explanation of Impala admission control than I could give in a reply here: https://docs.cloudera.com/documentation/enterprise/latest/topics/impala_admission.html. There's also an example in the same section: https://docs.cloudera.com/documentation/enterprise/latest/topics/impala_rm_example.html. Min/max memory limits are only available in CDH6.1 and up.

If you don't want to or aren't able to fully implement Impala admission control, a partial way to stop one query from using all the memory is to leave max memory unset (so that memory-based admission control is not enabled) and set the default query memory limit on the pool. That just limits the amount of memory any one query can use.

Tim Armstrong · ‎10-02-2020

This query is using up most of the memory on the Impala daemon, and there is not enough headroom to start your other query:

Query(78befceb1eef47:d33db5f200030000): Reservation=47.49 GB ReservationLimit=48.00 GB OtherMemory=293.93 MB Total=47.78 GB Peak=47.81 GB

You can restrict the memory usage of a query by setting the mem_limit option for that query. If you want to do that globally for all queries in the cluster, Impala admission control can do that: https://docs.cloudera.com/documentation/enterprise/latest/topics/impala_admission.html

E.g. you could set up memory-based admission control with a min memory limit of 2GB and a max memory limit of 20GB to prevent any one query from taking up all the memory on a node.
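As a minimal sketch of the per-query option, MEM_LIMIT can be set in the session before running the statement. The 20g value simply mirrors the example max limit above, and the table name is hypothetical.

```sql
-- Cap the memory for queries run in this session.
-- 20g mirrors the example limit above; pick a value that suits your nodes.
SET MEM_LIMIT=20g;

-- Hypothetical query: it is now held to the limit instead of growing
-- until it takes most of the daemon's memory.
SELECT COUNT(*) FROM my_db.big_table;
```

The option only affects the session it is set in, which is why admission control is the better fit for a cluster-wide default.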

Tim Armstrong · ‎09-22-2020

Sentry testing mode would be your only option that I can think of. The problem with using Sentry without Kerberos or LDAP authentication is that it doesn't provide any real security, since the client isn't authenticated. So we don't recommend it in production, because it provides the illusion of security but no actual security.

Tim Armstrong · ‎09-21-2020

This is definitely a bug. Thanks for the clear report and reproduction. It's not IMPALA-7957 but is somewhat related. This is new to us so I filed https://issues.apache.org/jira/browse/IMPALA-10182 to track it.

It looks like it can only happen when you have a UNION ALL, plus subqueries where the same column appears twice in the select list, plus NULL values in those columns. You can work around the issue by removing the duplicated entries in the subquery select list. E.g. the following query is equivalent and returns the expected results.

SELECT MIN(t_53.c_41) c_41,
       CAST(NULL AS DOUBLE) c_43,
       CAST(NULL AS BIGINT) c_44,
       t_53.c2 c2,
       t_53.c2 c3s0,
       t_53.c4 c4,
       t_53.c4 c5s0
FROM (
  SELECT t.productsubcategorykey c_41,
         t.productline c2,
         t.productsubcategorykey c4
  FROM as_adventure.t1 t
  WHERE true
  GROUP BY 2, 3
) t_53
GROUP BY 4, 5, 6, 7
UNION ALL
SELECT MIN(t_53.c_41) c_41,
       CAST(NULL AS DOUBLE) c_43,
       CAST(NULL AS BIGINT) c_44,
       t_53.c2 c2,
       t_53.c2 c3s0,
       t_53.c5s0 c4,
       t_53.c5s0 c5s0
FROM (
  SELECT t.productsubcategorykey c_41,
         t.productline c2,
         t.productsubcategorykey c5s0
  FROM as_adventure.t1 t
  WHERE true
  GROUP BY 2, 3
) t_53
GROUP BY 4, 5, 6, 7;

Tim Armstrong · ‎08-23-2020

You need to cast one of the branches of the else to be a compatible type with the other one. The problem is that both decimal types have the max precision (38) but different scales, and neither can be converted automatically to the other without potentially losing precision. A lot of the decimal behaviour, such as the result types of expressions, was changed in CDH6 (and upstream Apache Impala 3.0). https://docs.cloudera.com/documentation/enterprise/6/6.3/topics/impala_decimal.html has a lot of related information.
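A rough sketch of such a cast, assuming the expression is a CASE. The table, the columns, and the DECIMAL(38,10) target type are all hypothetical; choose the precision and scale that suit your data.

```sql
-- Hypothetical table: a is DECIMAL(38,10), b is DECIMAL(38,2).
-- Both are at max precision with different scales, so neither branch
-- can be converted to the other implicitly.
SELECT CASE
         WHEN t.flag THEN t.a
         -- The explicit cast makes both branches DECIMAL(38,10).
         ELSE CAST(t.b AS DECIMAL(38,10))
       END AS amount
FROM my_db.decimals t;
```

Raising the scale leaves fewer digits for the integer part, so very large values of b may not fit the target type.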

Tim Armstrong · ‎08-19-2020

I'm not aware of any significant regressions in planning time between those versions. There were actually some major improvements for some common types of complex queries with many columns: https://issues.apache.org/jira/browse/IMPALA-4242

So there's no known issue that this obviously maps to (the problem described is quite abstract, so take that with a grain of salt). There were a couple of issues related to authorization and Sentry that I initially thought about, but I believe they had been addressed by 6.3.1 (keep in mind that there are quite a lot of improvements in CDH6.3.1 relative to Impala 3.2.0).

Anyway, I don't want to speculate too much without even knowing which part of planning may be slow. Can you provide query profiles for those queries? Or if that isn't possible, at least the "Query Timeline" and "Planner Timeline" for the fast and slow queries.

Edit: just to be clear, the info you provided about the views was useful, but this seems like it's probably something pretty specific to your queries, so any investigation is likely to be most fruitful starting from data about the specific queries in your environment.
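One way to collect a profile, assuming you can reproduce the query from impala-shell, is the shell's PROFILE command. The view name below is hypothetical.

```sql
-- In impala-shell: run the slow statement (hypothetical view name) ...
SELECT COUNT(*) FROM my_db.my_view;

-- ... then print the profile of the most recently run query.
PROFILE;
```

Running this once against the fast version and once against the slow one gives two profiles whose timelines can be compared.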

Tim Armstrong · ‎08-15-2020

I think the reality now is that both are great technologies and the overlap in use cases is pretty big; there are a lot of SQL workloads where either can work. I just wanted to clarify a few points. Impala does support querying complex types from Parquet: https://docs.cloudera.com/documentation/enterprise/latest/topics/impala_complex_types.html. We are also working on a transparent query retry feature in Impala that should be released soon.

Tim Armstrong · ‎07-29-2020

Yes, we should be able to prune based on range partitions. https://docs.cloudera.com/documentation/enterprise/latest/topics/impala_kudu.html#kudu_partitioning has some examples of how to set up a table with both range and hash partitions. You can specify arbitrary timestamp ranges for the partitions. You can see in the Impala explain plan whether your WHERE predicates were converted into Kudu pushdown predicates (they're labelled kudu predicates).
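A minimal sketch of a table with both hash and range partitions, along the lines of the linked docs. The table name, columns, partition count, and date ranges are all hypothetical.

```sql
-- Hypothetical table: hash on id, range on event_time.
CREATE TABLE my_db.events (
  id BIGINT,
  event_time TIMESTAMP,
  payload STRING,
  PRIMARY KEY (id, event_time)
)
PARTITION BY HASH (id) PARTITIONS 4,
  RANGE (event_time) (
    PARTITION CAST('2020-07-01' AS TIMESTAMP) <= VALUES < CAST('2020-08-01' AS TIMESTAMP),
    PARTITION CAST('2020-08-01' AS TIMESTAMP) <= VALUES < CAST('2020-09-01' AS TIMESTAMP)
  )
STORED AS KUDU;

-- Check the plan: look for a "kudu predicates" entry on the Kudu scan node.
EXPLAIN
SELECT COUNT(*)
FROM my_db.events
WHERE event_time >= '2020-07-15' AND event_time < '2020-07-20';
```

If the WHERE clause shows up under kudu predicates, the filter was pushed down to Kudu; anything that could not be pushed down is evaluated in Impala instead.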

Tim Armstrong · ‎07-28-2020

Ahh 5.11, there have been so many Impala improvements since then! This happens when the Impala daemon can't load the initial catalog (i.e. database and table metadata). The catalog and statestore roles are both involved in the catalog loading, so if the Impala daemon isn't able to communicate with those roles, or they are not started or healthy, then that could lead to these symptoms. You should be able to see in Cloudera Manager if they're started and if there are any warnings or errors being flagged.

It might also be just that the catalog is slow to load (maybe there's a lot of metadata or something else is unhealthy). You would need to look at the logs of the Impala daemon you're connecting to, and maybe the catalog, to see what it's doing and why it's slow.

I know this doesn't address your immediate problem, but we've seen a lot of these metadata/catalog problems go away with later versions (CDH5.16 or CDH6+), and particularly by moving to a dedicated coordinator/executor topology: https://docs.cloudera.com/documentation/enterprise/5-16-x/topics/impala_dedicated_coordinator.html.

Tim Armstrong · ‎07-27-2020

@Mara the previous solution is a bit out of date. We fixed this in CDH 5.14 and up so that clients can't connect until the service is ready, which avoids the issue. In older versions the issue happened during Impala daemon startup. It can last longer when some of the services for the Impala cluster (catalog or statestore) are not operational, because the Impala daemon can't finish startup in those cases.

Member Since	‎07-29-2015 04:07 PM
Last Visited	‎02-11-2021 06:07 PM
Posts	535
Kudos received	140

Cloudera Community

Re: Impala Queries which were previously working a...

Re: Impala queries are not distributing to all the...

Re: impala - `recover partitions` points to old da...

Re: impala catalog server JVM

Re: Impala - On-demand metadata

Re: Impala query failed

Re: Impala query failed

Re: Create Select Only user in HUE / Impala withou...

Re: "union all" dropping records with all null/emp...

Re: an error is reported when impala executes a ca...

Re: Impala 3.2.0 performance degradation while que...

Re: Wich sql engine best solution to use with CDP ...

Re: Kudu Partition on Timestamp column

Re: This Impala daemon is not ready to accept user...

Re: This Impala daemon is not ready to accept user...