Support Questions
Find answers, ask questions, and share your expertise
Announcements
Alert: Welcome to the Unified Cloudera Community. Former HCC members be sure to read and learn how to activate your account here. Want to know more about what has changed? Check out the Community News blog.

Impala Memory limit exceeded for a very simple query

Impala Memory limit exceeded for a very simple query

Rising Star

I have a 5 node cluster on which I have configured impala. I have also downloaded the movie lens movies data (1 million ratings). 

 

when I issue this simple join query, impala is not able to execute it because it runs short of memory

 

That's very disapponting because my cluster in total has 4096 * 5 GB RAM

 

Can you look in this error and tell me how can I make this query succeed?

 

[hd1home:21000] > select movie_title, avg(ratings.rating) from movies join ratings on movies.movie_id=ratings.movie_id where crime=1 group by movie_title order by avg(ratings.rating) desc;
Query: select movie_title, avg(ratings.rating) from movies join ratings on movies.movie_id=ratings.movie_id where crime=1 group by movie_title order by avg(ratings.rating) desc
WARNINGS: Memory limit exceeded
Query did not have enough memory to get the minimum required buffers.


Backend 1:Memory Limit Exceeded
Query(7b4529ef6bf40489:b56f296217145b8) Limit: Consumption=400.57 MB
Fragment 7b4529ef6bf40489:b56f296217145b9: Consumption=8.00 KB
UDFs: Consumption=0
EXCHANGE_NODE (id=8): Consumption=0
DataStreamRecvr: Consumption=0
Block Manager: Limit=372.00 MB Consumption=368.50 MB
Fragment 7b4529ef6bf40489:b56f296217145ba: Consumption=14.26 MB
UDFs: Consumption=0
SORT_NODE (id=4): Consumption=0
AGGREGATION_NODE (id=7): Consumption=14.25 MB
EXCHANGE_NODE (id=6): Consumption=0
DataStreamRecvr: Consumption=0
DataStreamSender: Consumption=4.00 KB
Fragment 7b4529ef6bf40489:b56f296217145bb: Consumption=386.30 MB
UDFs: Consumption=0
AGGREGATION_NODE (id=3): Consumption=14.25 MB
HASH_JOIN_NODE (id=2): Consumption=372.01 MB
HDFS_SCAN_NODE (id=0): Consumption=0
EXCHANGE_NODE (id=5): Consumption=0
DataStreamRecvr: Consumption=32.00 KB
DataStreamSender: Consumption=4.00 KB
WARNING: The following tables are missing relevant table and/or column statistics.
default.movies,default.ratings
Backend 2:Field movie_id is missing from file and does not have a default value

 

 

13 REPLIES 13

Re: Impala Memory limit exceeded for a very simple query

Master Collaborator

You could try setting the following Impalad startup options:

 

--enable_partitioned_aggregation=false
--enable_partitioned_hash_join=false
 
I cannot say for sure whether it will solve your problem, but you can give it a try.
 
That said, in terms of memory your cluster is rather far from the recommended Impala hardware configuration.

Re: Impala Memory limit exceeded for a very simple query

Rising Star

Thanks for your reply.

 

Can you tell me which file I should edit to add these optoins?

 

sorry I am a little new to Impala.

Re: Impala Memory limit exceeded for a very simple query

Master Collaborator

Are you using Cloudera Manager?

Re: Impala Memory limit exceeded for a very simple query

Explorer
I tried setting up this property, using cloudera manager under -->
Impala Daemon Command Line Argument Advanced Configuration Snippet (Safety Valve) with values -->
--enable_partitioned_aggregation=false
--enable_partitioned_hash_join=false

But it didn't seem to work.

Re: Impala Memory limit exceeded for a very simple query

Master Collaborator

I wouldn't recommend using that setting on recent Impala versions unless you really know what you are doing.

 

Memory usage has improved a lot, particularly in Impala 2.3 onwards.

Re: Impala Memory limit exceeded for a very simple query

Explorer

Hey Tim, 

 

I'm using Impala Shell v2.3.0-cdh5.5.2.

 

When running a very simple query I get the following error. 

 

WARNINGS:
Memory limit exceeded
Query did not have enough memory to get the minimum required buffers in the block manager.

 

Memory Limit Exceeded
Query(874e069213f1cca0:7195be58cb72fa3) Limit: Limit=20.00 GB Consumption=56.70 MB
Fragment 874e069213f1cca0:7195be58cb72fa4: Consumption=24.00 KB
EXCHANGE_NODE (id=8): Consumption=0
DataStreamRecvr: Consumption=0
Block Manager: Limit=16.00 GB Consumption=0
Fragment 874e069213f1cca0:7195be58cb72faa: Consumption=84.88 KB
ANALYTIC_EVAL_NODE (id=6): Consumption=0
ANALYTIC_EVAL_NODE (id=5): Consumption=0
SORT_NODE (id=4): Consumption=0
ANALYTIC_EVAL_NODE (id=3): Consumption=0
ANALYTIC_EVAL_NODE (id=2): Consumption=0
SORT_NODE (id=1): Consumption=0
EXCHANGE_NODE (id=7): Consumption=0
DataStreamRecvr: Consumption=59.24 KB
DataStreamSender: Consumption=1.64 KB
Fragment 874e069213f1cca0:7195be58cb72fb3: Consumption=56.60 MB
HDFS_SCAN_NODE (id=0): Consumption=56.57 MB
DataStreamSender: Consumption=15.96 KB

 

On running the explain plan and seeing query profile, the memory requirements look modest. 

 

Let me know if you need some other inputs.

 

-Thanks

Re: Impala Memory limit exceeded for a very simple query

Master Collaborator

Are there other queries running concurrently? It's possible that the memory is in use by other queries.


Also, do you have a process memory set on impala?

Re: Impala Memory limit exceeded for a very simple query

Explorer

No, there aren't any queries being run in parallel. 
before running the query tried setting mem_limit 20GB;

 

set mem_limit=20g; 

Re: Impala Memory limit exceeded for a very simple query

Master Collaborator

I just saw your other post: http://community.cloudera.com/t5/Interactive-Short-cycle-SQL/Impala-memory-issue-Query-did-not-have-... I missed the details about your resource management configuration - the problem is likely related to that. I've forwarded this post to a colleague who is more familiar with the resource management.

 

If you're trying to use the YARN integration via LLAMA, you should be aware that it's no longer supported as of CDH5.5. See http://www.cloudera.com/documentation/enterprise/release-notes/topics/impala_incompatible_changes.ht... for more details.