After running queries on Impala that involve joins / GROUP BY, I'm getting the following error:
Memory limit exceeded
Query did not have enough memory to get the minimum required buffers in the block manager.
Memory Limit Exceeded
Query(874e069213f1cca0:7195be58cb72fa3) Limit: Limit=20.00 GB Consumption=56.70 MB
  Fragment 874e069213f1cca0:7195be58cb72fa4: Consumption=24.00 KB
    EXCHANGE_NODE (id=8): Consumption=0
  Block Manager: Limit=16.00 GB Consumption=0
  Fragment 874e069213f1cca0:7195be58cb72faa: Consumption=84.88 KB
    ANALYTIC_EVAL_NODE (id=6): Consumption=0
    ANALYTIC_EVAL_NODE (id=5): Consumption=0
    SORT_NODE (id=4): Consumption=0
    ANALYTIC_EVAL_NODE (id=3): Consumption=0
    ANALYTIC_EVAL_NODE (id=2): Consumption=0
    SORT_NODE (id=1): Consumption=0
    EXCHANGE_NODE (id=7): Consumption=0
    DataStreamRecvr: Consumption=59.24 KB
    DataStreamSender: Consumption=1.64 KB
  Fragment 874e069213f1cca0:7195be58cb72fb3: Consumption=56.60 MB
    HDFS_SCAN_NODE (id=0): Consumption=56.57 MB
    DataStreamSender: Consumption=15.96 KB
I'm running this on a 10-node CDH 5.5.2 cluster with Impala v2.3.0 configured. Nine of the machines have 32 GiB RAM each, and the one NN has 64 GiB. Impala daemons run on the 9 slave nodes. Through Cloudera Manager I have allocated 22 GB to the RM / AM and 8 GB to the AM. No other YARN / Spark job is running while the Impala queries execute, so the memory should be available to Impala.
Let me know how to take this forward.
This is almost certainly one of the known issues with Impala's previous YARN integration (via Llama). There were some fixes to that code in Impala 2.4, but you could still hit this sort of unexpected failure. The recommendation is not to run Impala on YARN, and instead to statically allocate resources to Impala for the time being. Cloudera Manager has a feature called "Static Service Pools" that can help with that, but it isn't strictly necessary. The CM documentation has more information.
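As a rough sketch of what static allocation can look like (the values below are illustrative assumptions for 32 GiB nodes, not tuned recommendations for this cluster):

```shell
# Cap each Impala daemon's memory with the impalad startup flag
# (in Cloudera Manager this corresponds to the "Impala Daemon Memory Limit"
# configuration). Leaving headroom for the OS and other daemons on a
# 32 GiB node might look like:
#   --mem_limit=24g
#
# Independently of that, a per-query cap can be set for a session
# from impala-shell:
#   [impalad-host:21000] > SET MEM_LIMIT=4g;
```

With Llama / Impala-on-YARN disabled, these limits are enforced by Impala itself rather than negotiated through YARN, which avoids the reservation bugs mentioned above.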
We are working on improving Impala resource management, but it's going to take some time and we don't have a target release yet.