impala query memory limit

Hello,

I ran an Impala query that failed with a "memory limit exceeded" error. I noticed that I can raise the per-query memory limit by setting MEM_LIMIT at the query level (ref: https://www.cloudera.com/documentation/enterprise/5-8-x/topics/impala_mem_limit.html). There is also an impalad daemon-level memory limit, which caps the memory of the process and of the queries that the daemon runs as coordinator.

I have a couple of questions regarding these two limits (the query mem limit and the daemon mem limit):

1) If I "set mem_limit=yyyy;" before I run the query, I can see these lines in the query profile:

....
Query Options (non default): MEM_LIMIT=3145728
Plan:

...

But if I don't explicitly set that option, it doesn't appear in the query profile. How or where can I get that value in that case? Is there a CM REST API for retrieving it?

2) Is there a way to get the Impala process's total memory from the CM REST API, and what is the maximum memory available to allocate to a query? I assume I can't allocate more than the Impala process's total memory, and that setting a higher value won't make more memory available to queries running on that impalad host.
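
For question 1, the "Query Options (non default)" line quoted above can be read out of the profile text programmatically. The following is a minimal sketch, not an official tool: the function names are hypothetical, and it assumes that multiple options on that line are comma-separated and that MEM_LIMIT is recorded in bytes, as in the excerpt. When the line does not mention MEM_LIMIT, it returns None, meaning no explicit per-query limit was recorded.

```python
import re
from typing import Dict, Optional


def parse_non_default_options(profile_text: str) -> Dict[str, str]:
    """Return the options found on the 'Query Options (non default):' line.

    Assumption: options on that line are comma-separated KEY=VALUE pairs.
    """
    match = re.search(r"Query Options \(non default\):[ \t]*(.*)", profile_text)
    if match is None:
        return {}
    options: Dict[str, str] = {}
    for pair in match.group(1).split(","):
        key, sep, value = pair.partition("=")
        if sep:  # skip fragments that are not KEY=VALUE
            options[key.strip().upper()] = value.strip()
    return options


def explicit_mem_limit(profile_text: str) -> Optional[int]:
    """Return MEM_LIMIT in bytes if it was set explicitly, else None.

    Assumption: the profile records the value as a plain byte count.
    """
    value = parse_non_default_options(profile_text).get("MEM_LIMIT")
    return int(value) if value is not None else None


if __name__ == "__main__":
    profile = "....\nQuery Options (non default): MEM_LIMIT=3145728\nPlan:\n"
    print(explicit_mem_limit(profile))
    print(explicit_mem_limit("Plan:\n"))
```

A None result only says that the option was not listed in the profile; it does not say what limit the daemon actually applied.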

 

Thanks in advance,

S.

1 ACCEPTED SOLUTION

1) If you didn't set a memory limit for the query, then the query may expand up to the process memory limit, i.e. the query memory limit is effectively the process memory limit. This is a pretty bad configuration for concurrent queries, since the queries end up fighting it out for memory.

2) To answer your CM question directly, you can get the relevant metrics from the timeseries API. tcmalloc_physical_bytes_reserved_across_impalads is the process consumption and mem_tracker_process_limit_across_impalads is the limit. If you paste this into the Chart Builder you can see the averages of the two:

SELECT tcmalloc_physical_bytes_reserved_across_impalads, mem_tracker_process_limit_across_impalads WHERE entityName = "IMPALA-1" AND category = SERVICE
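
The same tsquery can be sent to the CM timeseries REST endpoint instead of the Chart Builder. The sketch below only builds the request URL and reads the most recent value of each returned series; it is an illustration under assumptions, not a tested client. The host name, port, and API version in the usage lines are placeholders, the function names are hypothetical, and the response layout (items, timeSeries, metadata.metricName, data[].value) is assumed from the CM API documentation and should be checked against your CM version. Authentication and the HTTP call itself are left out.

```python
from urllib.parse import urlencode

# The tsquery from the answer above, unchanged.
TSQUERY = (
    "SELECT tcmalloc_physical_bytes_reserved_across_impalads, "
    "mem_tracker_process_limit_across_impalads "
    'WHERE entityName = "IMPALA-1" AND category = SERVICE'
)


def timeseries_url(base_url: str, api_version: str, tsquery: str) -> str:
    """Build the GET URL for the CM timeseries endpoint with an encoded query."""
    query_string = urlencode({"query": tsquery})
    return f"{base_url.rstrip('/')}/api/{api_version}/timeseries?{query_string}"


def latest_values(response: dict) -> dict:
    """Map each returned metric name to its most recent data point value.

    Assumption: the JSON has the shape
    {"items": [{"timeSeries": [{"metadata": {"metricName": ...},
                                "data": [{"value": ...}, ...]}]}]}.
    """
    latest = {}
    for item in response.get("items", []):
        for series in item.get("timeSeries", []):
            points = series.get("data", [])
            if points:  # series with no data points are skipped
                latest[series["metadata"]["metricName"]] = points[-1]["value"]
    return latest


if __name__ == "__main__":
    # Placeholder host, port, and API version; replace with your own.
    print(timeseries_url("http://cm-host.example.com:7180", "v15", TSQUERY))
```

Comparing the two values returned by latest_values gives the process consumption against the process limit for the service.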

 

I'm wondering, though, whether setting up admission control with resource pools and default query memory limits would solve your problem better than a custom solution: https://www.cloudera.com/documentation/enterprise/latest/topics/impala_admission.html

Hi Tim,

Thanks for the information.

A minor observation:

I found that "tcmalloc_physical_bytes_reserved_across_impalads" and "mem_tracker_process_limit_across_impalads" are documented for v5.3.x (ref: https://www.cloudera.com/documentation/enterprise/5-3-x/topics/cm_metrics_impala.html), while "tcmalloc_physical_bytes_reserved" and "mem_tracker_process_limit" are documented for v5.7.x and above (ref: https://www.cloudera.com/documentation/enterprise/5-10-x/topics/cm_metrics_impala_daemon.html). I'm using v5.10; however, when I use the Chart Builder in CM, I can only find the former two metrics (i.e., "tcmalloc_physical_bytes_reserved_across_impalads" and "mem_tracker_process_limit_across_impalads").

Also, from the CM REST API, it seems that those two sets of metrics are the same and differ only in the data aggregation format of the returned JSON?

Regards,

S.