Member since: 01-26-2016
Posts: 11
Kudos Received: 0
Solutions: 0
08-15-2016
09:54 AM
This is a known issue that we're actively working on (https://issues.cloudera.org/browse/IMPALA-2567). Your analysis is accurate. Part of the problem is the number of connections, and the other part is the number of threads per connection. You may be able to change some operating system config settings to raise the limits here (depending on which limit you're hitting). To reduce the number of TCP connections required, you would need to either reduce the number of fragments or reduce the number of nodes executing the query. You could reduce the number of fragments by breaking the query up into smaller queries, for example by creating temporary tables that hold the results of some of the subqueries. You could also try executing the query on a single node by setting num_nodes=1, if the data size is small enough that this makes sense. I suspect your query is too large for that to work, but it's hard to tell (that's a huge query plan!).
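To see why both suggestions help, here is a rough back-of-the-envelope sketch. It is not Impala's actual accounting: it assumes, purely for illustration, that each exchange between fragments opens one connection from every sending node to every receiving node. The function name and the example numbers are hypothetical.

```python
def estimated_connections(num_exchanges: int, num_nodes: int) -> int:
    """Illustrative estimate only (assumption, not Impala's real formula):
    each exchange connects every sending node to every receiving node."""
    if num_exchanges < 0 or num_nodes < 1:
        raise ValueError("need num_exchanges >= 0 and num_nodes >= 1")
    return num_exchanges * num_nodes * num_nodes


# Hypothetical plan: 50 exchanges across 20 nodes.
baseline = estimated_connections(50, 20)
# Splitting the query so each piece has half as many exchanges.
fewer_fragments = estimated_connections(25, 20)
# Running the same plan on half as many nodes.
fewer_nodes = estimated_connections(50, 10)

print(baseline, fewer_fragments, fewer_nodes)
```

Under this assumption, the estimate scales linearly with the number of exchanges and quadratically with the number of nodes, which is why cutting either one can bring a query back under a connection limit.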
08-05-2016
05:35 PM
I'm not the most knowledgeable person about this part of the code, but what you're saying is correct. One of the likely causes of long wait times is if the receiver is consuming data slower than the sender is sending it.
07-26-2016
08:08 AM
Here's the link to the profile: https://dl.dropboxusercontent.com/u/13650224/profile.txt
02-04-2016
12:50 PM
No, it does not affect the timeline.
01-27-2016
04:09 PM
You are most likely running into this bug with the aggregation: https://issues.cloudera.org/browse/IMPALA-2352. We fixed it in CDH 5.5/Impala 2.3, but the change wasn't backported because it was deemed too risky for a maintenance release.