Impala resource Isolation

Explorer

Hi guys,

    Is there a way to isolate Impala daemon resources within one cluster?

    E.g. with 6 daemons in 1 Impala cluster, is there a way to reserve 2 daemons' resources for testing and 4 daemons' resources for production, so that testing queries only use memory on their own daemons?

    HAProxy can forward requests to different daemons, but memory resources are not isolated. If a testing query takes too much memory, a production query may fail while executing.
    The only way I can think of is to set up another Impala cluster. I know Impala provides admission control, but what if the memory estimate in the query execution plan is wrong and the actual memory cost is far higher than the estimate (e.g. 3G vs. 150G)? Which parameters limit the actual memory usage?

1 ACCEPTED SOLUTION

Super Collaborator

Hi,

 

Yes, Impala daemons will use the memory during execution. Your understanding is correct.

In the attached screenshot I can see corrupted stats for the tables involved in the query. We recommend running "compute stats" on the tables that have partial stats and then rerunning the queries; otherwise Impala generates a bad execution plan and uses more memory than expected.

 

Regards,

Chethan YM
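
As a rough illustration of the advice above, the sketch below only builds the "compute stats" statements for a list of tables; it does not connect to Impala. The helper name and the table names are hypothetical placeholders, not taken from the thread.

```python
def compute_stats_statements(tables):
    """Return one COMPUTE STATS statement per fully qualified table name."""
    return [f"COMPUTE STATS {table};" for table in tables]


# Hypothetical table names, used only for illustration.
for statement in compute_stats_statements(["sales_db.orders", "sales_db.customers"]):
    print(statement)
```

The printed statements can be pasted into impala-shell or sent through whichever client is already in use.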


3 REPLIES

Super Collaborator

Hi,

You can restrict the amount of memory Impala uses during query execution by specifying the MEM_LIMIT query option. If you set mem_limit=2gb, the query will not use more than 2 GB on each host, even if it needs more.

 

If you cannot set the memory limit at execution time every time, I think you can create a new resource pool under Impala admission control. While creating the resource pool you can specify the Minimum and Maximum Query Memory Limit. Do not use this resource pool for production queries.

 

Run set request_pool="pool-name" and then run the test queries.

 

Regards,

Chethan YM
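
To make the reply above concrete, here is a minimal sketch that builds the two SET statements a test session would issue before running its queries. The helper name, the pool name "testing" and the 2gb value are assumptions for illustration; the pool itself would still have to be created under admission control first.

```python
def isolation_prelude(pool, mem_limit):
    """Build the SET statements to run at the start of a test session.

    pool      -- name of an existing admission-control resource pool
    mem_limit -- per-host memory limit, e.g. "2gb"
    """
    return [
        f'SET REQUEST_POOL="{pool}";',
        f"SET MEM_LIMIT={mem_limit};",
    ]


# Hypothetical pool name and limit.
for statement in isolation_prelude("testing", "2gb"):
    print(statement)
```

Running these statements once per session keeps test queries in their own pool without changing anything for production sessions.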

 

Explorer

Is memory distributed evenly across all nodes during query execution?

https://impala.apache.org/docs/build/html/topics/impala_admission.html

In the "Clamp MEM_LIMIT Query Option" section, if "Maximum Query Memory Limit" for the pool is set to 10GB, then the query can use up to 50GB in total, because the calculation is (Maximum Query Memory Limit * number of Impala nodes).
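
The calculation quoted above can be sketched as a one-line function. The node count of 5 is an assumption inferred from the 10GB and 50GB figures; the function name is hypothetical.

```python
def cluster_wide_limit_gb(max_query_mem_limit_gb, num_nodes):
    """Aggregate memory a query may use: the per-host limit times the node count."""
    return max_query_mem_limit_gb * num_nodes


# 10 GB per host on an assumed 5-node cluster.
print(cluster_wide_limit_gb(10, 5))  # prints 50
```

The 50GB figure is a cluster-wide total; each individual host is still capped at the per-host limit.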

 

E.g. the query is estimated at 2.5GB per host (see IMG_3130.JPG). Even if there is enough memory left on the other nodes, the query will fail if just one node cannot provide 2.5GB.
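
The per-host behaviour described above can be modelled with a small check. This is a simplified sketch rather than Impala's actual admission logic, and the host names and free-memory figures are made up for illustration.

```python
def hosts_short_of_memory(per_host_need_gb, free_gb_by_host):
    """Return the hosts whose free memory is below the per-host requirement."""
    return [
        host
        for host, free_gb in free_gb_by_host.items()
        if free_gb < per_host_need_gb
    ]


# Hypothetical cluster state: plenty of memory overall, but node2 is short.
free = {"node1": 8.0, "node2": 1.5, "node3": 6.0}
short = hosts_short_of_memory(2.5, free)
print(short)            # ['node2']
print(len(short) == 0)  # False: in this model the query cannot run
```

In this model the total free memory (15.5GB) is irrelevant; a single host below 2.5GB is enough to block the query.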

 

 
