Created on 04-04-2019 03:12 AM - edited 09-16-2022 07:17 AM
Hi Team,
I am testing Impala to analyze how query performance varies with the provided configuration.
I am using TPCDS queries.
I am making JDBC calls to run the queries. To change configuration values for the current session at run time, I am using Impala query options, and I analyse the query attribute values after execution.
In one of the JDBC URLs I set the "mem_limit" query option to 3gb (mem_limit=3gb), but the value does not appear to be applied to the current session.
I get the following error: "Memory limit exceeded"
This is how I am using the query option:
jdbc:impala://host:21050/tpcds_bin_partitioned_textfile_40;AuthMech=1;KrbRealm=test.com;KrbHostFQDN=host;KrbServiceName=impala;mem_limit=3gb
But when I change the value (mem_limit=3gb) from Cloudera Manager -> Impala -> Configuration, it works fine.
What am I doing wrong here?
Created 04-04-2019 01:49 PM
I just tested with ClouderaImpalaJDBC-2.6.4.1005 and it works for me with the following JDBC URL. I can see in the query profile that it takes effect.
static final String DB_URL = "jdbc:impala://localhost:21050/functional_parquet;mem_limit=3gb";
From the profile:
Query Options (set by configuration): MEM_LIMIT=3221225472
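For anyone checking the same thing, here is a minimal Java sketch of how the URL option and the profile value relate. The class and method names (MemLimitOption, toBytes, withMemLimit) are hypothetical helpers for illustration and are not part of the JDBC driver; the parser only handles the "gb" and "mb" suffixes and plain byte counts, which is an assumption made to keep the example short.

```java
import java.util.Locale;

class MemLimitOption {
    // Hypothetical helper: converts a size such as "3gb", "500mb" or "1024"
    // (plain bytes) into a byte count. Only these forms are handled here.
    static long toBytes(String size) {
        String s = size.trim().toLowerCase(Locale.ROOT);
        long multiplier = 1L;
        if (s.endsWith("gb")) {
            multiplier = 1024L * 1024 * 1024;
            s = s.substring(0, s.length() - 2);
        } else if (s.endsWith("mb")) {
            multiplier = 1024L * 1024;
            s = s.substring(0, s.length() - 2);
        }
        return Long.parseLong(s.trim()) * multiplier;
    }

    // Hypothetical helper: appends the mem_limit query option to a JDBC URL
    // as a ";key=value" pair, the same form used in the URL above.
    static String withMemLimit(String baseUrl, String size) {
        return baseUrl + ";mem_limit=" + size;
    }

    public static void main(String[] args) {
        String url = withMemLimit("jdbc:impala://localhost:21050/functional_parquet", "3gb");
        System.out.println(url);
        // 3 * 1024^3 bytes, the value reported in the profile
        System.out.println("MEM_LIMIT=" + toBytes("3gb"));
    }
}
```

The byte count printed for "3gb" is 3221225472, which matches the MEM_LIMIT value in the profile.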
Created 04-04-2019 10:22 PM
Thanks Tim,
This limit (3gb) only works when the IMPALAD's mem_limit is greater than 3gb.
I increased the IMPALAD's mem_limit, both by invoking the REST API and by manually changing the Impala configuration, but either way the Impala server has to be restarted before the new mem_limit takes effect.
I cannot understand why, if the IMPALAD's mem_limit is 1gb and I pass a higher mem_limit (query option) in the JDBC URL, it does not work. What is the point of providing this query option?
(1) If my query needs 3gb of memory, the IMPALAD's mem_limit is 1gb, and I pass mem_limit=3gb in the JDBC URL, it won't work. I have to change the mem_limit of the IMPALAD and restart the server.
And
(2) If my query needs 500mb of memory and the IMPALAD's mem_limit is 1gb, I don't need to pass mem_limit, because the query will execute in any case.
I hope you understand my point.
I conclude that this query option can prevent a query from taking the entire memory of the IMPALAD, but it does not allocate the required memory.
Created on 04-05-2019 02:07 AM - edited 04-05-2019 02:07 AM
Yes, it is for limiting the query in order to reduce accidental influence on other users (e.g. by occupying all available resources).
One more point: Impala may have default query memory limits set, so you may wish to override them.
Created 04-05-2019 08:53 AM
You can apply memory limits at two levels. The first is the Impala daemon level, which limits the total memory consumption of the process (in part so that it doesn't exceed the physical memory available, but also so that it leaves memory available for other services running on the host). You can (and should) also apply memory limits at the query level via the MEM_LIMIT query option (the one we were talking about). That controls how much of the process memory limit a single query can get. E.g. if you're using admission control you can configure query memory limits that get applied to all queries in a resource pool.
It would be weird if running a query resulted in the Impala daemon memory limit changing, and I'm not sure what you would even expect to happen if you ran two queries at the same time.
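To make the two levels concrete, here is a deliberately simplified Java model, not Impala's actual accounting. It assumes a single query on a single daemon and treats the usable memory as the smaller of the daemon limit and the query's MEM_LIMIT; the class and method names (MemoryLimitModel, effectiveCap, fits) are made up for illustration.

```java
class MemoryLimitModel {
    static final long MB = 1024L * 1024;
    static final long GB = 1024L * MB;

    // Simplified assumption: a query can never use more than the daemon's
    // process limit, and never more than its own MEM_LIMIT.
    static long effectiveCap(long daemonLimit, long queryLimit) {
        return Math.min(daemonLimit, queryLimit);
    }

    // True if a query needing `needed` bytes fits under both limits.
    static boolean fits(long needed, long daemonLimit, long queryLimit) {
        return needed <= effectiveCap(daemonLimit, queryLimit);
    }

    public static void main(String[] args) {
        // Case (1) from the thread: query needs 3gb, daemon limit 1gb, MEM_LIMIT=3gb
        System.out.println(fits(3 * GB, 1 * GB, 3 * GB));
        // Case (2) from the thread: query needs 500mb, daemon limit 1gb
        System.out.println(fits(500 * MB, 1 * GB, 3 * GB));
        // Daemon limit raised above the query limit: the 3gb query now fits
        System.out.println(fits(3 * GB, 8 * GB, 3 * GB));
    }
}
```

In this model the query option only ever lowers the cap for one query; raising it above the daemon limit changes nothing, which is the behaviour described in case (1).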
I don't know if this helps, but I gave a talk recently that summarised some of the concepts here. There are slides linked from here - https://conferences.oreilly.com/strata/strata-ca/public/schedule/detail/73000
By the way, only allocating 1GB to each impala daemon is a bad idea for a production deployment - that's simply not enough to run a lot of more complex queries on larger data sets, particularly if you are running multiple concurrent queries. We have some sizing guidelines - https://www.cloudera.com/documentation/enterprise/release-notes/topics/hardware_requirements_guide.h...
Created 04-10-2019 05:02 AM
Thank you very much, Tim, for providing this insight.
I had assumed that the MEM_LIMIT option requests that amount of memory for the query.