I am new to Impala, and we keep running into an issue where the Impala active client connection limit is exceeded. We first raised the limit from 64 to 128, which resolved the problem, but after a few weeks the error came back. My questions are: How can we see the number of connections in real time, other than by checking the Active Frontend API Connections chart? How can we get the exact number of clients, and the queries they run, that cause the limit to be reached? How are connections closed (i.e. are they closed once an Impala query completes, or is there some other process)? And how can we monitor open queries other than in the Queries tab?
Hi, I know I am answering this post after a long gap, but it may still help others.
As per your post, you are still facing connection issues even after increasing the number of connections from 64 to 128.
First, we need to check whether all the opened connections are actually being closed. If they are not, the issue may occur even if you increase the limit to 252.
Go to CM -> Impala -> Charts Library and check "Active Frontend API connection".
Check whether many users are actually using the cluster. If the connection count is high while users are not using the cluster, then the opened sessions/connections are not being closed after use.
Set the "idle session timeout" in the Impala configuration. With this set, if a user opens a session, performs some operation, and then leaves the session, it becomes an idle session and is closed once the timeout expires.
Similarly, if you are running Impala queries from Hue, you need to set the session timeout in the Hue Service Advanced Configuration Snippet (Safety Valve) for hue_safety_valve.ini.
Likewise, if you are using ODBC/JDBC clients, you need to set the timeouts on the client side.
If you would like to check the number of active and inactive connections/sessions, go to CM -> Impala -> click on any Impala daemon -> go to the Daemon Web UI -> click on the "/sessions" tab.
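If you want these numbers without clicking through the UI, here is a rough sketch of how the session list could be summarized in Python. It assumes the daemon web UI can return the sessions page as JSON (for example by appending ?json to the URL) and that each session entry has fields named "user", "inflight_queries", "closed" and "expired". Those field names and the function name summarize_sessions are my assumptions, so please check them against the output of your own Impala version.

```python
from collections import Counter


def _flag(value):
    """Interpret a JSON flag that may arrive as a bool or as a string."""
    if isinstance(value, str):
        return value.strip().lower() == "true"
    return bool(value)


def summarize_sessions(payload):
    """Summarize a parsed sessions document from an Impala daemon.

    Assumed (not verified) layout: payload["sessions"] is a list of dicts
    with "user", "inflight_queries", "closed" and "expired" keys.
    """
    sessions = payload.get("sessions", [])
    # Keep only sessions that are still open on the daemon.
    open_sessions = [
        s for s in sessions
        if not _flag(s.get("closed")) and not _flag(s.get("expired"))
    ]
    # Open sessions with no query in flight are candidates for leaked connections.
    idle = [s for s in open_sessions if int(s.get("inflight_queries", 0)) == 0]
    per_user = Counter(s.get("user", "unknown") for s in open_sessions)
    return {
        "open": len(open_sessions),
        "idle": len(idle),
        "per_user": dict(per_user),
    }


if __name__ == "__main__":
    # Hand-written sample data, not real daemon output.
    sample = {
        "sessions": [
            {"user": "alice", "inflight_queries": 0, "closed": False, "expired": False},
            {"user": "alice", "inflight_queries": 2, "closed": False, "expired": False},
            {"user": "bob", "inflight_queries": 0, "closed": False, "expired": True},
        ]
    }
    print(summarize_sessions(sample))
```

In practice you would load the payload with json.loads() on whatever the daemon returns and run this per daemon, since each Impala daemon only lists its own sessions. A high "idle" count for one user usually points at a client that is not closing its sessions.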
Thank you for your response. We already set the 'idle session timeout' and 'query_timeout' settings in Impala, but that didn't work. I checked the server's network side by running netstat -an | grep ESTABLISHED | grep <impala port> and saw a lot of active client connections that are not being closed properly after query execution. I believe the issue is with the client: the dev team uses DBeaver, and we found that every time they open a new query tab it opens a new connection to Impala, so the more query tabs they open, the more active connections are established on the server side. In addition, a connection is not terminated once its queries complete; it seems to close only when they close the query tab or close the DBeaver SQL client. As a workaround we restart Impala to reset those active connections, but we find that very risky and unsafe, especially on production clusters.
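To see which client machines hold the connections, the netstat output can be grouped by remote address. Below is a minimal sketch in Python; it assumes the usual Linux netstat -an column layout (Proto, Recv-Q, Send-Q, Local Address, Foreign Address, State), and the function name count_clients and the port 21050 in the sample are only illustrative, so substitute the port your clients actually connect to.

```python
from collections import Counter


def count_clients(netstat_output, port):
    """Count ESTABLISHED connections to a local port, grouped by remote IP.

    Assumes Linux-style `netstat -an` lines:
    Proto Recv-Q Send-Q Local-Address Foreign-Address State
    """
    counts = Counter()
    suffix = f":{port}"
    for line in netstat_output.splitlines():
        fields = line.split()
        # Skip headers, UDP/unix socket rows and anything not ESTABLISHED.
        if len(fields) < 6 or fields[5] != "ESTABLISHED":
            continue
        local, remote = fields[3], fields[4]
        if local.endswith(suffix):
            # Drop the ephemeral client port, keep the client address.
            counts[remote.rsplit(":", 1)[0]] += 1
    return counts


if __name__ == "__main__":
    # Hand-written sample; on a real host the text could come from
    # subprocess.run(["netstat", "-an"], capture_output=True, text=True).stdout
    sample = """\
Proto Recv-Q Send-Q Local Address           Foreign Address         State
tcp        0      0 10.0.0.5:21050          10.0.1.20:53412         ESTABLISHED
tcp        0      0 10.0.0.5:21050          10.0.1.20:53413         ESTABLISHED
tcp        0      0 10.0.0.5:21050          10.0.1.21:40000         ESTABLISHED
tcp        0      0 10.0.0.5:22             10.0.1.20:50000         ESTABLISHED
tcp        0      0 0.0.0.0:21050           0.0.0.0:*               LISTEN
"""
    for client, n in count_clients(sample, 21050).most_common():
        print(client, n)
```

Sorting by count makes it easy to spot a single workstation holding dozens of connections, which would match the DBeaver behaviour described above.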
Thanks and regards,
You mentioned that the connection stays open until the user closes the query tab in the DBeaver tool. I think this is expected. Similarly, if you run queries from Hue and have not set the session timeout on the Hue side, the connection/session will never close until you close the Hue query tab or log out of the Hue session.
So it is the responsibility of the client to close the session after the query finishes.
Can you check with the DBeaver team about configuring the session timeout on the client side? I think configuring this will help close the sessions on the Impala side.