Created on 06-10-2023 03:10 AM - edited on 06-13-2023 12:13 AM by VidyaSargur
Summary
Is a single Impala Coordinator handling far more queries than the others?
Does this eventually lead to OOM scenarios?
Suppose you have 3 Impala Coordinators in your cluster and notice that queries skew onto one of them and overwhelm it.
Note how one of the Impala Coordinators in the above example has 73 running queries, and the other 2 have relatively few.
Investigation
Source IP Persistence
To understand why the number of running queries can skew toward one Impala Coordinator, look at how the proxy is set up to handle incoming queries.
‘Source IP Persistence’ means that sessions from the same IP address always go to the same coordinator. This setting is required when setting up high availability with Hue. It also avoids the Hue message ‘results have expired’, which appears when a query is submitted to the cluster through one coordinator but the result is not returned to the user via the same coordinator/Hue Server. The side effect is that every session from a busy source address, such as a Hue server, lands on the same coordinator.
Example HAProxy Configuration for Source IP Persistence
An example Hue-Impala connectivity setup in /etc/haproxy/haproxy.cfg:
listen impala-hue
    # 'bind' sets the listening port; newer HAProxy releases no longer accept it on the 'listen' line
    bind :21052
    mode tcp
    stats enable
    # Hash the client source IP so the same client always reaches the same server
    balance source
    timeout connect 5000ms
    timeout queue 5000ms
    timeout client 3600000ms
    timeout server 3600000ms
    # Impala Nodes
    server impala-coordinator-001.fqdn impala-coordinator-001.fqdn:21050 check
    server impala-coordinator-002.fqdn impala-coordinator-002.fqdn:21050 check
    server impala-coordinator-003.fqdn impala-coordinator-003.fqdn:21050 check
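To see why source-based balancing can concentrate load, here is a minimal Python sketch. It is an illustration only: the hash used here (SHA-256 modulo the server count) is an assumption and is not HAProxy's actual algorithm, and the client IPs and query counts are hypothetical. The point it demonstrates is that every query from one source IP is pinned to one coordinator, so a single busy source translates directly into a single busy coordinator.

```python
import hashlib
from collections import Counter

# Server names taken from the example haproxy.cfg above.
COORDINATORS = [
    "impala-coordinator-001.fqdn",
    "impala-coordinator-002.fqdn",
    "impala-coordinator-003.fqdn",
]


def pick_coordinator(source_ip, servers=COORDINATORS):
    """Map a source IP to one server; the same IP always maps to the same server.

    Assumption: SHA-256 modulo the server count stands in for the proxy's hash.
    """
    digest = hashlib.sha256(source_ip.encode("utf-8")).digest()
    return servers[int.from_bytes(digest[:8], "big") % len(servers)]


def distribute(queries_per_ip, servers=COORDINATORS):
    """Return the number of running queries each server receives."""
    load = Counter({server: 0 for server in servers})
    for ip, query_count in queries_per_ip.items():
        load[pick_coordinator(ip, servers)] += query_count
    return load


# Hypothetical workload: one source (for example a busy Hue server)
# submits most of the queries.
workload = {"10.0.0.11": 70, "10.0.0.12": 5, "10.0.0.13": 3}

for server, running in distribute(workload).items():
    print(f"{server}: {running} running queries")
```

However the three hypothetical IPs happen to hash, the 70 queries from the busiest source all arrive at one coordinator, which is the kind of skew described in the Summary.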
Now let’s review what can impact the overall connection count into an Impala Coordinator: Hue, Hive & Impala timeout settings.
Example Timeout Settings
The following settings may resemble what you currently have set in your Hue, Hive & Impala services.
Hue
Hive
Impala
Proposed Timeout Settings
Whilst the actual settings will vary cluster by cluster, we recommend moving away from the default settings and setting all of the idle parameters to 2 hours across the board in all 3 services: Hue, Hive & Impala.
This is an initial goal: introduce timeouts whilst monitoring the user experience. The ultimate best practice in this area is to work toward:
Idle Query Timeouts of 300 seconds (or 5 minutes)
Idle Session Timeouts of 600 seconds (or 10 minutes)
NOTE - all of the parameters being discussed relate to ‘idle’ sessions and queries; in other words, the user must have left the session or query in an idle state before the idle parameters take effect. No active session or query is affected by this change in the services’ behavior.
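Timeout settings are not always expressed in the same unit (the HAProxy example in this article uses milliseconds, while the targets above are given in seconds), so it can help to convert the proposed values before entering them. The following Python sketch does only that arithmetic; the variable and function names are illustrative, and it makes no assumption about which unit a given Hue, Hive or Impala parameter expects, so check each parameter's documentation.

```python
from datetime import timedelta

# Values proposed in this article.
INITIAL_IDLE_TIMEOUT = timedelta(hours=2)          # first step, all idle parameters
TARGET_IDLE_QUERY_TIMEOUT = timedelta(seconds=300)    # eventual goal, 5 minutes
TARGET_IDLE_SESSION_TIMEOUT = timedelta(seconds=600)  # eventual goal, 10 minutes


def as_seconds(duration):
    """Return the duration as a whole number of seconds."""
    return int(duration.total_seconds())


def as_millis(duration):
    """Return the duration as a whole number of milliseconds."""
    return as_seconds(duration) * 1000


for label, value in (
    ("initial idle timeout", INITIAL_IDLE_TIMEOUT),
    ("target idle query timeout", TARGET_IDLE_QUERY_TIMEOUT),
    ("target idle session timeout", TARGET_IDLE_SESSION_TIMEOUT),
):
    print(f"{label}: {as_seconds(value)} s = {as_millis(value)} ms")
```

For comparison, the 3600000ms client and server timeouts in the HAProxy example correspond to one hour.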