Member since 07-29-2015
535 Posts
141 Kudos Received
103 Solutions
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 7586 | 12-18-2020 01:46 PM |
| | 4971 | 12-16-2020 12:11 PM |
| | 3785 | 12-07-2020 01:47 PM |
| | 2471 | 12-07-2020 09:21 AM |
| | 1613 | 10-14-2020 11:15 AM |
11-23-2016
03:32 PM
1 Kudo
We had an issue filed for this a while back: https://issues.cloudera.org/browse/IMPALA-3293 . The request seems fairly reasonable, but I think whether it gets implemented will depend on how much demand there is for it (or on someone contributing a patch).
11-23-2016
09:18 AM
1 Kudo
You're absolutely right - we use 10% as the default estimate for selectivity for scan predicates when we don't have a better estimate. One case where we have a better estimate is when the predicate is something like id = 100. In that case we can estimate that the selectivity is 1 / (num distinct values). There's also some logic to handle combining the estimates when there are multiple conditions. If you're curious, the code is here: https://github.com/apache/incubator-impala/blob/4db330e69a2dbb4a23f46e34b484da0d6b9ef29b/fe/src/main/java/org/apache/impala/planner/PlanNode.java#L518
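As a rough illustration of the two cases described above, here is a minimal Python sketch. It is not Impala's planner code: the function names are hypothetical, and the logic for combining multiple predicates is deliberately left out because the post doesn't spell it out (see the linked PlanNode.java for the real thing).

```python
# Hypothetical sketch of the single-predicate estimates described in the post.
# Names are made up for illustration; this is not Impala's implementation.

DEFAULT_SELECTIVITY = 0.1  # the 10% default mentioned in the post


def equality_selectivity(num_distinct_values=None):
    """Estimate selectivity of a predicate like `id = 100`.

    With a known distinct-value count, use 1 / NDV; otherwise fall back
    to the default estimate.
    """
    if num_distinct_values is not None and num_distinct_values > 0:
        return 1.0 / num_distinct_values
    return DEFAULT_SELECTIVITY


def estimate_rows(table_rows, selectivity):
    """Estimated number of rows that survive the predicate."""
    return table_rows * selectivity


# Column with 50 distinct values: about 1/50 of the rows are expected to match.
with_stats = estimate_rows(1_000_000, equality_selectivity(50))
# No distinct-value count available: fall back to 10% of the rows.
without_stats = estimate_rows(1_000_000, equality_selectivity())
```

The point is just that the estimate can change a lot depending on whether a distinct-value count is available for the column.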
11-18-2016
06:03 PM
We added support for --ldap_password_cmd in Impala 2.5, which I think addresses this problem. See https://issues.cloudera.org/browse/IMPALA-1934 and https://www.cloudera.com/documentation/enterprise/5-8-x/topics/impala_shell_options.html
11-18-2016
06:01 PM
This would typically happen if the catalog daemon was restarted.
11-16-2016
03:13 PM
If you're using impala-shell, you can use the "summary;" command. Otherwise the summary is accessible through the Impala debug web pages (typically http://the-impala-server:25000).
11-08-2016
05:24 PM
Please do open a JIRA - it's always good to have some context on the problem from users. It looks like the scanners in that profile are just idle (based on the user and system time), so my guess is that the slowdown is somewhere further up in the plan.
11-07-2016
02:34 PM
We could definitely improve some of the diagnostics there. My guess is that one node is either overloaded or has some kind of hardware issue - might be worth looking at the health and CPU/memory usage of different nodes to see if one stands out.
11-04-2016
05:25 PM
1 Kudo
One thing to keep in mind when interpreting the profiles is that a series of joins will typically be pipelined to avoid materialising results. This means that the whole pipeline runs at the speed of its slowest part. So the limiting factor could be the client (if you're returning a lot of results), the scan at the bottom of the plan, or any of the joins in the pipeline. TotalNetworkSendTime may be somewhat misleading: if the sender is running faster than the receiver, a backpressure mechanism kicks in that blocks the sender until the receiver has caught up, and that blocked time is counted too. What I'd recommend initially is comparing the query summaries of the fast and slow queries to see where the difference in time is. If you're running in impala-shell, you can get the summary of the last query by typing "summary;".
10-13-2016
03:20 PM
Good to hear! Please feel free to mark it as solved to make it easier for others to find.
10-03-2016
02:25 PM
Some examples of the calculations and numbers would be helpful. We use a C++ double as the underlying type, so we have the same precision. There are a lot of subtleties with floating point numbers: calculations that are mathematically equivalent with real numbers can give different results with floating point numbers. E.g., floating point arithmetic is not associative, so it's not guaranteed that a + b + c == a + c + b. On x86 there's also some additional weirdness where intermediate results of calculations are represented with 80 bits if they're kept in floating-point registers but reduced in precision to 64 bits if they're written to memory: https://en.wikipedia.org/wiki/Extended_precision. At the C++ or SQL levels you have very little control over which precision is used. Fixed-precision decimal will give you more predictable results if your application isn't tolerant of rounding errors.
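To make the ordering point concrete, here is a small Python sketch. Python's float is a 64-bit double on common platforms, so it shows the same effect. The values are picked purely for illustration, and Python's decimal module stands in for a fixed-precision decimal type here; it is not Impala's DECIMAL implementation.

```python
from decimal import Decimal

# Illustrative values only: a and b cancel exactly, and c is too small to
# survive being added to a at double precision.
a, b, c = 1e16, -1e16, 1.0

left = a + b + c   # evaluated as (a + b) + c
right = a + c + b  # evaluated as (a + c) + b; a + c rounds back to 1e16

# The same sums with decimal arithmetic (default context, 28 digits),
# where every intermediate result here is exact.
da, db, dc = Decimal("1e16"), Decimal("-1e16"), Decimal("1")
dec_left = da + db + dc
dec_right = da + dc + db

print(left, right)            # the two double results differ
print(dec_left == dec_right)  # the decimal results agree
```

With doubles the first ordering gives 1.0 and the second gives 0.0, while the decimal version gives 1 both ways.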