We have a 65-node Impala cluster and we notice a very high number of established TCP connections (around 4000) on all the Impala nodes (see screenshot). That number increases over time and never goes down until we restart Impala; after the restart it starts climbing again. For quite some time we have had instability with Impala, e.g. crashes, SSL connection errors, etc., and we wonder whether such a high number of TCP connections is normal or expected. We use haproxy as a load balancer in front of Impala.
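For reference, one way to measure the same number on a node from the command line is to count sockets in the ESTABLISHED state. This is a generic Linux sketch, not Impala-specific; it parses /proc/net/tcp directly so it works even where ss/netstat are not installed:

```shell
#!/bin/sh
# Count all ESTABLISHED TCP sockets on this node by reading
# /proc/net/tcp and /proc/net/tcp6; column 4 ("st") holds the
# connection state, and 01 means ESTABLISHED.
# Equivalent with modern iproute2: ss -Htan state established | wc -l
cat /proc/net/tcp /proc/net/tcp6 2>/dev/null \
  | awk '$4 == "01"' | wc -l
```

Running this periodically on a few impalads would confirm whether the count really is monotonically increasing or whether some connections are being reaped.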
This can happen when there is high concurrent load on the cluster. What is the exact error message you are receiving on the failed queries?
Can you post the profile of the query?
We do have a pretty high number of concurrent queries running, with different kinds of profiles. But is it normal that the number of TCP connections stays high and keeps increasing? Shouldn't it fluctuate, going down as well as up?
Regarding the errors, I have already opened a case with support. We got, for example, "can't connect to Impala: [unixODBC][Cloudera][ThriftExtension] (3) Error occurred while contacting server: SSL_read: Resource temporarily unavailable. The connection has been configured to use a SASL mechanism for authentication. This error might be due to the server is not using SASL for authentication. (SQL-HY000)", and sometimes Impala just crashed.
There is an existing JIRA related to the TCP connection increase, and it will be addressed in future versions of Impala.
However, this would not cause the Impala daemon to crash. A core dump from the daemons would help in understanding the reason for the crash.
You might receive connection timeouts on backend port 22000 when there is high load, which is addressed in https://issues.cloudera.org/browse/IMPALA-4135
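Since haproxy sits in front of the cluster, one thing worth checking is whether the proxy ever closes idle client connections at all; without client/server timeouts, abandoned sessions can pile up as ESTABLISHED sockets on the impalads. A minimal haproxy sketch, where the timeout values, listener name, and backend hostnames are illustrative assumptions and not recommendations:

```
defaults
    mode tcp
    timeout connect 5s
    # Close connections that stay idle longer than this. Too short a
    # value will cut legitimately idle sessions; pick something longer
    # than your longest expected idle period.
    timeout client  1h
    timeout server  1h

listen impala-jdbc
    bind :21050
    balance leastconn
    # Hypothetical backends; replace with your actual impalad hosts.
    server impalad1 impalad1.example.com:21050 check
    server impalad2 impalad2.example.com:21050 check
```

Note that reaping idle connections at the proxy can itself surface as SSL/SASL read errors on clients that try to reuse a connection the proxy has already closed, so this is a trade-off to tune rather than a guaranteed fix.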