Member since 07-29-2015
535 Posts
140 Kudos Received
103 Solutions
My Accepted Solutions
Views | Posted |
---|---|
5804 | 12-18-2020 01:46 PM |
3736 | 12-16-2020 12:11 PM |
2648 | 12-07-2020 01:47 PM |
1894 | 12-07-2020 09:21 AM |
1225 | 10-14-2020 11:15 AM |
11-16-2016
03:13 PM
If you're using impala-shell, you can use the "summary;" command. Otherwise, it's accessible through the Impala debug web pages (typically http://the-impala-server:25000).
11-15-2016
07:22 PM
I've opened IMPALA-4492. Thanks Tim. -m
10-13-2016
03:20 PM
Good to hear! Please feel free to mark it as solved to make it easier for others to find.
09-27-2016
09:50 AM
1 Kudo
Thanks for the data point :). We're tracking the parallelisation work in IMPALA-3902 (https://issues.cloudera.org/browse/IMPALA-3902). It will probably be enabled in phases; for example, we may have parallelisation for aggregations before joins.
08-15-2016
09:54 AM
This is a known issue that we're actively working on: https://issues.cloudera.org/browse/IMPALA-2567

Your analysis is accurate. Part of the problem is the number of connections and the other part is the number of threads per connection. You may be able to change some operating system config settings to increase the limits here (depending on which limit you're hitting).

To reduce the number of TCP connections required, you would need to either reduce the number of fragments or reduce the number of nodes executing the query. You could reduce the number of fragments by breaking up the query into smaller queries, e.g. by creating temporary tables with the results of some of the subqueries. You could also try executing the query on a single node by setting num_nodes=1 if the data size is small enough that this makes sense. I suspect your query is too large for that to work, but it's hard to tell (that's a huge query plan!)
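As a rough illustration of the "break the query into smaller queries" suggestion, here is a minimal sketch that stages a subquery into a temporary table and then runs a smaller query against it. It uses Python's built-in sqlite3 purely as a local stand-in so it can run anywhere; it is not Impala, and the table and column names (orders, customers, tmp_order_totals) are hypothetical. In Impala the staging step would be a CREATE TABLE ... AS SELECT statement, and the num_nodes setting mentioned above has no sqlite3 equivalent.

```python
import sqlite3

# sqlite3 is only a local stand-in for illustration; this is not Impala.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Hypothetical sample tables.
cur.executescript("""
    CREATE TABLE orders (id INTEGER, customer_id INTEGER, amount REAL);
    CREATE TABLE customers (id INTEGER, region TEXT);
    INSERT INTO orders VALUES (1, 10, 25.0), (2, 10, 75.0), (3, 20, 40.0);
    INSERT INTO customers VALUES (10, 'east'), (20, 'west');
""")

# Monolithic form: the aggregation subquery is inlined in one statement.
monolithic = cur.execute("""
    SELECT c.region, t.total
    FROM customers c
    JOIN (SELECT customer_id, SUM(amount) AS total
          FROM orders GROUP BY customer_id) t
      ON t.customer_id = c.id
    ORDER BY c.region
""").fetchall()

# Staged form: materialize the subquery first, then run a smaller query.
cur.execute("""
    CREATE TABLE tmp_order_totals AS
    SELECT customer_id, SUM(amount) AS total
    FROM orders GROUP BY customer_id
""")
staged = cur.execute("""
    SELECT c.region, t.total
    FROM customers c
    JOIN tmp_order_totals t ON t.customer_id = c.id
    ORDER BY c.region
""").fetchall()

# Remove the staging table once the final result has been read.
cur.execute("DROP TABLE tmp_order_totals")
print(staged)
```

Both forms return the same rows; the staged form simply trades one large plan for two smaller ones, at the cost of writing and later dropping the intermediate table.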
08-05-2016
05:35 PM
I'm not the most knowledgeable person about this part of the code, but what you're saying is correct. One of the likely causes of long wait times is the receiver consuming data more slowly than the sender is sending it.
06-30-2016
07:05 PM
1 Kudo
We're working on it and it should be in the next release, but unfortunately we're not aware of a good way to simulate the same results.
06-28-2016
07:53 AM
The only way to do this with zero work would be to use a view (http://www.cloudera.com/documentation/enterprise/latest/topics/impala_create_view.html). Otherwise, you do have to run the queries as part of your data pipeline, as you mentioned.
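To make the trade-off concrete, here is a small sketch contrasting a view with a table that has to be rebuilt by a pipeline. It assumes the original question was about keeping a derived result up to date, which is a guess from the reply alone. It uses Python's built-in sqlite3 only as a runnable stand-in, not Impala, and the names (events, event_totals_v, event_totals_t) are hypothetical.

```python
import sqlite3

# sqlite3 is only a local stand-in for illustration; this is not Impala.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

cur.executescript("""
    CREATE TABLE events (kind TEXT, n INTEGER);
    INSERT INTO events VALUES ('a', 1), ('b', 2);

    -- A view stores only the query, so it is re-evaluated on every read.
    CREATE VIEW event_totals_v AS
        SELECT kind, SUM(n) AS total FROM events GROUP BY kind;

    -- A table stores a snapshot that a pipeline step must refresh.
    CREATE TABLE event_totals_t AS
        SELECT kind, SUM(n) AS total FROM events GROUP BY kind;
""")

# New data arrives after both objects were created.
cur.execute("INSERT INTO events VALUES ('a', 5)")

view_rows = cur.execute(
    "SELECT kind, total FROM event_totals_v ORDER BY kind").fetchall()
table_rows = cur.execute(
    "SELECT kind, total FROM event_totals_t ORDER BY kind").fetchall()

print(view_rows)   # reflects the new row without any extra step
print(table_rows)  # still the old snapshot until it is rebuilt
```

The view picks up the new row automatically, but the underlying query is rerun on each read; the snapshot table is cheaper to read but stays stale until the pipeline rebuilds it.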
06-24-2016
12:25 PM
That probably makes sense if the bottleneck is evaluating the where clause. If those extra rows are filtered out in the join, then the gain is limited, since you should filter out the extra rows during the scan or when evaluating the simple join condition. Our scans are multithreaded too, so sometimes if the join is the bottleneck, making the scans do more work doesn't slow down the query overall.
06-17-2016
08:30 PM
Thank you for explaining it. It's a function call. I'm changing it and will see the impact. Will come back with results...