We have a situation where YARN is killing LLAP application containers and then requesting new ones. This causes brief unavailability of the LLAP daemons, and running applications fail as a result.
When we reviewed some of the container logs, we saw the following message:
2022-01-26 03:15:48,339 [Component dispatcher] ERROR instance.ComponentInstance - [COMPINSTANCE llap-0 : container_e127_1642817883045_7610_01_000002]: container_e127_1642817883045_7610_01_000002 completed. Reinsert back to pending list and requested a new container.
exitStatus=-104, diagnostics=[2022-01-26 03:15:47.314]Container [pid=8434,containerID=container_e127_1642817883045_7610_01_000002] is running 665411584B beyond the 'PHYSICAL' memory limit. Current usage: 75.6 GB of 75 GB physical memory used; 77.6 GB of 157.5 GB virtual memory used. Killing container.
I don't understand where this "75.6 GB of 75 GB" limit is coming from. I have tried increasing the memory per LLAP daemon, but it doesn't help either. Parameters:
1. Memory allocated for all YARN containers on a node = 95 GB
2. LLAP memory per daemon = 75 GB
3. Memory cache per daemon = 20 GB
4. llap_daemon_overhead = 6 GB
The HiveServer2 and hive-interactive server logs don't provide much detail either. What other properties can I tune to fix this? Any help is appreciated.
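As a sanity check on the figures in the log above, here is a minimal Python sketch of the arithmetic. It assumes that "GB" in the YARN diagnostic means binary gigabytes (1024^3 bytes), which is consistent with the reported 75.6 GB; the function and variable names are made up for illustration. The 75 GB limit appears to match the "LLAP memory per daemon" value listed above, and the reported usage is simply that limit plus the overage in bytes.

```python
# Assumption: YARN's "GB" in this diagnostic is binary (1024**3 bytes).
GIB = 1024 ** 3
MIB = 1024 ** 2


def usage_from_overage(limit_gb: float, overage_bytes: int) -> float:
    """Return physical memory usage in GB, given the limit and the bytes over it."""
    return (limit_gb * GIB + overage_bytes) / GIB


# Values copied from the log message.
limit_gb = 75               # "75 GB physical memory"
overage_bytes = 665411584   # "running 665411584B beyond the 'PHYSICAL' memory limit"

usage_gb = usage_from_overage(limit_gb, overage_bytes)
overage_mb = overage_bytes / MIB

print(f"Current usage: {usage_gb:.1f} GB of {limit_gb} GB")
print(f"Overage: {overage_mb:.0f} MB")
```

So the container process was only a few hundred MB above the size YARN granted it when it was killed; the limit itself is the container size, not a separate setting.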
@rpathak Thank you for your response! I have tried increasing the memory per LLAP daemon up to 87 GB so far, but every time the containers are killed for the same reason: the physical memory limit is reached.
Do you think I need to increase the memory even more?