CLOSE_WAIT status choking Impala connection over 21050 port

Hi Community,

We are using an AWS Network Load Balancer (NLB) to distribute traffic across 6 Impala daemons.

Recently we started seeing 2 of the Impala daemons stop accepting queries; client connections to port 21050 just hang.

On further investigation, we found roughly 5,000 connections in the CLOSE_WAIT state between the load balancer and the Impala daemon:

 

tcp        1      0 10.XXX.XXX.68:21050     10.XXX.XXX.80:64135     CLOSE_WAIT  11084/impalad       
tcp        1      0 10.XXX.XXX.68:21050     10.XXX.XXX.80:58169     CLOSE_WAIT  11084/impalad       
tcp        1      0 10.XXX.XXX.68:21050     10.XXX.XXX.80:64652     CLOSE_WAIT  11084/impalad       
tcp        1      0 10.XXX.XXX.68:21050     10.XXX.XXX.80:62075     CLOSE_WAIT  11084/impalad       
tcp        1      0 10.XXX.XXX.68:21050     10.XXX.XXX.80:52393     CLOSE_WAIT  11084/impalad       
tcp        1      0 10.XXX.XXX.68:21050     10.XXX.XXX.80:41447     CLOSE_WAIT  11084/impalad       
tcp        1      0 10.XXX.XXX.68:21050     10.XXX.XXX.80:47034     CLOSE_WAIT  11084/impalad       
tcp        1      0 10.XXX.XXX.68:21050     10.XXX.XXX.80:49452     CLOSE_WAIT  11084/impalad       
tcp        1      0 10.XXX.XXX.68:21050     10.XXX.XXX.80:28327     CLOSE_WAIT  11084/impalad       
tcp        1      0 10.XXX.XXX.68:21050     10.XXX.XXX.80:52498     CLOSE_WAIT  11084/impalad       
tcp        1      0 10.XXX.XXX.68:21050     10.XXX.XXX.80:21168     CLOSE_WAIT  11084/impalad       
tcp        1      0 10.XXX.XXX.68:21050     10.XXX.XXX.80:40079     CLOSE_WAIT  11084/impalad       
tcp        1      0 10.XXX.XXX.68:21050     10.XXX.XXX.80:35664     CLOSE_WAIT  11084/impalad       
tcp        1      0 10.XXX.XXX.68:21050     10.XXX.XXX.80:4191      CLOSE_WAIT  11084/impalad       
tcp        1      0 10.XXX.XXX.68:21050     10.XXX.XXX.80:14935     CLOSE_WAIT  11084/impalad       
tcp        1      0 10.XXX.XXX.68:21050     10.XXX.XXX.80:63036     CLOSE_WAIT  11084/impalad       
tcp        1      0 10.XXX.XXX.68:21050     10.XXX.XXX.80:5158      CLOSE_WAIT  11084/impalad       
tcp        1      0 10.XXX.XXX.68:21050     10.XXX.XXX.80:60134     CLOSE_WAIT  11084/impalad  
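
For reference, CLOSE_WAIT means the remote side (here the load balancer address) has already closed its end and the local process has not yet closed the socket. To track how fast these pile up, output like the above can be tallied per remote address. This is only a rough sketch: it assumes the netstat column layout shown above, the function name count_close_wait is made up for illustration, and the ESTABLISHED line in the sample is a hypothetical row added to show that other states are skipped.

```python
from collections import Counter


def count_close_wait(netstat_output, local_port=21050):
    """Count CLOSE_WAIT sockets on local_port, grouped by remote IP."""
    counts = Counter()
    for line in netstat_output.splitlines():
        fields = line.split()
        # Assumed layout: proto recv-q send-q local remote state [pid/program]
        if len(fields) < 6 or fields[5] != "CLOSE_WAIT":
            continue
        local_addr, remote_addr = fields[3], fields[4]
        # Keep only sockets whose local port matches the Impala port.
        if local_addr.rsplit(":", 1)[-1] != str(local_port):
            continue
        # Group by remote IP, ignoring the ephemeral remote port.
        counts[remote_addr.rsplit(":", 1)[0]] += 1
    return counts


# Two rows copied from the output above, plus one hypothetical
# ESTABLISHED row that should be ignored.
sample = """\
tcp        1      0 10.XXX.XXX.68:21050     10.XXX.XXX.80:64135     CLOSE_WAIT  11084/impalad
tcp        1      0 10.XXX.XXX.68:21050     10.XXX.XXX.80:58169     CLOSE_WAIT  11084/impalad
tcp        0      0 10.XXX.XXX.68:21050     10.XXX.XXX.80:40000     ESTABLISHED 11084/impalad
"""

print(count_close_wait(sample))
```

Running this periodically against fresh netstat output after a daemon restart would show whether the count grows steadily or jumps at a particular moment.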

 

Every time I restart the daemons, the CLOSE_WAIT connections disappear and new connections are established successfully, but after a few minutes the CLOSE_WAIT connections pile up and choke the port again.

Our cluster is Kerberized and has TLS/SSL enabled. The NLB is internal, and only one user connects through it, using the JDBC driver.

 

I have been stuck on this issue for over a week now; any suggestions would be very helpful.

 

Thank you.
