Support Questions

Find answers, ask questions, and share your expertise
Announcements
Celebrating as our community reaches 100,000 members! Thank you!

Single Query (with 253 Plan Fragments) Causes TPC Connection Issue

avatar
Explorer

Hi 

We have a single query consists of 253 plan fragments on a 43 clusters. We encountered an issue saying that "couldn't get a client for cdh-datanode-010.xxxxx.storage:22000" in the middle of the execution. I'm wondering is this because of the dedicated tcp connections required by each channel? The query consists of 212 HDFS SCAN NODE on each impalad node. Each of them is broadcasting/shuffling data to other 42 nodes, which requires, I think, 42 channels/data stream sender/scan node/per server. If each of them requires a tcp connection, then it would be 377496 connections all together, is this correct???

 

If this is the case, would you have any optimization suggestion to this query?

 

We only have a partial profile for this query as it stops in the middle of execution

https://dl.dropboxusercontent.com/u/13650224/impala_sql_profile_2d42c9a80da6e983_faf86d52fb685b80.sq...

 

Any comments and suggestion will be appreciated. 

Thanks

 

We are using Impala 2.3

1 ACCEPTED SOLUTION

avatar

This is a known issue that we're actively working on: https://issues.cloudera.org/browse/IMPALA-2567

 

Your analysis is accurate. Part of the problem is the number of connections and the other part is the # of threads per connection. You may be able to change some operating system config settings to increase limits here (depending on which limit you're hitting).

 

In order to reduce the # of tcp conncetions required you would either need to reduce the number of fragments or reduce the number of node executing the query.

 

You could reduce the # of fragments by breaking up the query into smaller queries. E.g. creating temporary tables with the results of some of the subqueries. 


You could also try executing the query on a single node by setting num_nodes=1 if the data size is small enough that this makes sense. I suspect your query is too large for that to work, but it's hard to tell (that's a huge query plan!)

View solution in original post

1 REPLY 1

avatar

This is a known issue that we're actively working on: https://issues.cloudera.org/browse/IMPALA-2567

 

Your analysis is accurate. Part of the problem is the number of connections and the other part is the # of threads per connection. You may be able to change some operating system config settings to increase limits here (depending on which limit you're hitting).

 

In order to reduce the # of tcp conncetions required you would either need to reduce the number of fragments or reduce the number of node executing the query.

 

You could reduce the # of fragments by breaking up the query into smaller queries. E.g. creating temporary tables with the results of some of the subqueries. 


You could also try executing the query on a single node by setting num_nodes=1 if the data size is small enough that this makes sense. I suspect your query is too large for that to work, but it's hard to tell (that's a huge query plan!)