Member since 07-29-2015
535 Posts
140 Kudos Received
103 Solutions
My Accepted Solutions
Views | Posted |
---|---|
6132 | 12-18-2020 01:46 PM |
4011 | 12-16-2020 12:11 PM |
2850 | 12-07-2020 01:47 PM |
2006 | 12-07-2020 09:21 AM |
1291 | 10-14-2020 11:15 AM |
10-03-2020 09:40 PM
The docs have a better and more complete explanation of Impala admission control than I could give in a reply here - https://docs.cloudera.com/documentation/enterprise/latest/topics/impala_admission.html There's also an example in the same section - https://docs.cloudera.com/documentation/enterprise/latest/topics/impala_rm_example.html

Min/max memory limits are only available in CDH6.1 and up.

If you don't want to or aren't able to fully implement Impala admission control, a partial solution that keeps one query from using all the memory is to leave max memory unset (so that memory-based admission control is not enabled) and set the default query memory limit on the pool. That just limits the amount of memory any one query can use.
10-02-2020 05:45 PM
This query is using up most of the memory on the Impala daemon and there is not enough headroom to start your other query.

Query(78befceb1eef47:d33db5f200030000): Reservation=47.49 GB ReservationLimit=48.00 GB OtherMemory=293.93 MB Total=47.78 GB Peak=47.81 GB

You can restrict the memory usage of a query by setting the mem_limit option for that query. If you want to do that globally for all queries in the cluster, Impala admission control can do that - https://docs.cloudera.com/documentation/enterprise/latest/topics/impala_admission.html

E.g. you could set up memory-based admission control with a min memory limit of 2GB and a max memory limit of 20GB to prevent any one query from taking up all the memory on a node.
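
For the per-query option, a minimal sketch of what that could look like in an impala-shell session. The 20g value only mirrors the 20GB figure above and the table name is hypothetical, so adjust both to your workload:

```sql
-- Cap the memory a query may use on each node (illustrative value).
SET MEM_LIMIT=20g;

-- Queries issued later in this session run under that cap.
-- my_db.my_table is a hypothetical name; substitute your own query.
SELECT count(*) FROM my_db.my_table;

-- Setting the option back to 0 removes the per-query cap for the session.
SET MEM_LIMIT=0;
```

A query that needs more than the cap may spill to disk or fail with a memory limit error rather than crowd out other queries, so it's worth picking the value with your largest legitimate queries in mind.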
09-22-2020 09:08 AM
1 Kudo
Sentry testing mode would be the only option that I can think of. The problem with using Sentry without Kerberos or LDAP authentication is that it doesn't provide any real security, since the client isn't authenticated. So we don't recommend it in production, because it provides the illusion of security but no actual security.
09-21-2020 09:57 AM
1 Kudo
This is definitely a bug. Thanks for the clear report and reproduction. It's not IMPALA-7957 but is somewhat related. This is new to us so I filed https://issues.apache.org/jira/browse/IMPALA-10182 to track it.

It looks like it can only happen when you have a UNION ALL, plus subqueries where the same column appears twice in the select list, plus NULL values in those columns. You can work around the issue by removing the duplicated entries in the subquery select list. E.g. the following query is equivalent and returns the expected results.

SELECT
MIN(t_53.c_41) c_41,
CAST(NULL AS DOUBLE) c_43,
CAST(NULL AS BIGINT) c_44,
t_53.c2 c2,
t_53.c2 c3s0,
t_53.c4 c4,
t_53.c4 c5s0
FROM
( SELECT
t.productsubcategorykey c_41,
t.productline c2,
t.productsubcategorykey c4
FROM
as_adventure.t1 t
WHERE
true
GROUP BY
2,
3 ) t_53
GROUP BY
4,
5,
6,
7
UNION ALL
SELECT
MIN(t_53.c_41) c_41,
CAST(NULL AS DOUBLE) c_43,
CAST(NULL AS BIGINT) c_44,
t_53.c2 c2,
t_53.c2 c3s0,
t_53.c5s0 c4,
t_53.c5s0 c5s0
FROM
( SELECT
t.productsubcategorykey c_41,
t.productline c2,
t.productsubcategorykey c5s0
FROM
as_adventure.t1 t
WHERE
true
GROUP BY
2,
3) t_53
GROUP BY
4,
5,
6,
7;
08-23-2020 02:19 PM
You need to cast one of the branches of the else so that its type is compatible with the other one. The problem is that both decimal types have the max precision (38) but different scales, and neither can be converted automatically to the other without potentially losing precision.

A lot of the decimal behaviour, such as the result types of expressions, was changed in CDH6 (and upstream Apache Impala 3.0). https://docs.cloudera.com/documentation/enterprise/6/6.3/topics/impala_decimal.html has a lot of related information.
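
A minimal sketch of that kind of cast, assuming a hypothetical table t with a BOOLEAN column flag, a DECIMAL(38,10) column a and a DECIMAL(38,2) column b (none of these names come from your query):

```sql
-- a is DECIMAL(38,10) and b is DECIMAL(38,2): same precision, different scale,
-- so there is no common type that can be picked automatically.
-- Casting one branch explicitly gives both branches the same type.
SELECT
  CASE WHEN flag THEN a
       ELSE CAST(b AS DECIMAL(38,10))
  END AS v
FROM t;
```

Which type you cast to is your call. DECIMAL(38,10) keeps the scale of a but leaves only 28 digits before the decimal point, so it only works if the values in b fit in that range.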
08-19-2020 10:22 AM
I'm not aware of any significant regressions in planning time between those versions. There were actually some major improvements for some common types of complex queries with many columns - https://issues.apache.org/jira/browse/IMPALA-4242

So there's no known issue that this obviously maps to (the problem described is quite abstract, so take that with a grain of salt). There were a couple of issues related to authorization and Sentry that I initially thought about, but I believe they had been addressed by 6.3.1 (keep in mind that there are quite a lot of improvements in CDH6.3.1 relative to Impala 3.2.0).

Anyway, I don't want to speculate too much without even knowing which part of planning may be slow. Can you provide query profiles for those queries? Or, if that isn't possible, at least the "Query Timeline" and "Planner Timeline" for the fast and slow queries.

Edit: just to be clear, the info you provided about the views was useful, but this seems like it's probably something pretty specific to your queries, so any investigation is likely to be most fruitful starting from data about the specific queries in your environment.
08-15-2020 01:03 PM
I think the reality now is that both are great technologies and the overlap in use cases is pretty big - there are a lot of SQL workloads where either can work. I just wanted to clarify a few points.

Impala does support querying complex types from Parquet - https://docs.cloudera.com/documentation/enterprise/latest/topics/impala_complex_types.html

We are also working on a transparent query retry feature in Impala that should be released soon.
07-29-2020 10:19 AM
Yes, we should be able to prune based on range partitions. https://docs.cloudera.com/documentation/enterprise/latest/topics/impala_kudu.html#kudu_partitioning has some examples of how to set up a table with both range and hash partitions. You can specify arbitrary timestamp ranges for the partitions. You can see in the Impala explain plan whether your WHERE predicates were converted into Kudu pushdown predicates (they're labelled kudu predicates).
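
As a rough sketch of that setup (the table name, columns, partition count and date ranges are all made up for illustration, so treat the linked docs as the reference for exact syntax on your version):

```sql
-- Hypothetical Kudu table with hash partitioning on id and
-- range partitioning on the timestamp column.
CREATE TABLE events (
  id BIGINT,
  ts TIMESTAMP,
  payload STRING,
  PRIMARY KEY (id, ts)
)
PARTITION BY HASH (id) PARTITIONS 4,
  RANGE (ts) (
    PARTITION CAST('2020-01-01' AS TIMESTAMP) <= VALUES < CAST('2020-07-01' AS TIMESTAMP),
    PARTITION CAST('2020-07-01' AS TIMESTAMP) <= VALUES < CAST('2021-01-01' AS TIMESTAMP)
  )
STORED AS KUDU;

-- Check the plan: predicates on ts that were pushed down are listed
-- under "kudu predicates" on the Kudu scan node.
EXPLAIN
SELECT count(*)
FROM events
WHERE ts >= CAST('2020-07-01' AS TIMESTAMP)
  AND ts <  CAST('2020-08-01' AS TIMESTAMP);
```

If the ts conditions show up only as regular predicates and not as kudu predicates, they are being evaluated in Impala after the scan rather than being used by Kudu to skip partitions.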
07-28-2020 10:48 AM
Ahh 5.11, there have been so many Impala improvements since then!

This happens when the Impala daemon can't load the initial catalog (i.e. database and table metadata). The catalog and statestore roles are both involved in the catalog loading, so if the Impala daemon isn't able to communicate with those roles, or they are not started or healthy, that could lead to these symptoms. You should be able to see in Cloudera Manager if they're started and if there are any warnings or errors being flagged.

It might also just be that the catalog is slow to load (maybe there's a lot of metadata or something else is unhealthy). You would need to look at the logs of the Impala daemon you're connecting to, and maybe the catalog, to see what it's doing and why it's slow.

I know this doesn't address your immediate problem, but we've seen a lot of these metadata/catalog problems go away with later versions - CDH5.16 or CDH6+, and particularly by moving to a dedicated coordinator/executor topology - https://docs.cloudera.com/documentation/enterprise/5-16-x/topics/impala_dedicated_coordinator.html
07-27-2020 08:48 AM
@Mara, the previous solution is a bit out of date. We fixed this in CDH 5.14 and up so that clients can't connect until the service is ready, which avoids the issue. In older versions the issue happened during Impala daemon startup. It can last longer when some of the services for the Impala cluster (catalog or statestore) are not operational, because the Impala daemon can't finish startup in those cases.