container exit code 137 and memory usage is limited

Expert Contributor

My cluster memory is 147 GB, and I get this error even when the server has not used its entire memory.

I can see there is free memory, and yet my jobs get killed with this error. There is no error in the logs, and I don't see any errors from the dmesg command or in /var/log/messages.

Also, it happens randomly and on any of the nodes. Please suggest. I have been trying to get in touch with Cloudera sales support but have had no luck, and it's urgent.
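(For anyone hitting the same symptom: exit code 137 means the process was killed with SIGKILL (128 + 9), which can come either from YARN enforcing its container memory limit or from the kernel OOM killer. A quick way to check for the OOM killer on a node, assuming shell access, is something like:

dmesg -T | grep -i -E "killed process|out of memory"
grep -i "killed process" /var/log/messages

If neither turns anything up, the kill is more likely YARN's own pmem/vmem enforcement, which is logged by the NodeManager rather than the kernel.)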

3 REPLIES

Mentor
The container memory usage limits are driven not by the available host memory but by the resource limits applied by the container configuration.

For example, if you've configured a map task to use 1 GiB of physical memory (pmem) but its code at runtime uses more than 1 GiB, it will get killed. The common resolution is to grant it more than 1 GiB so it can do its higher-memory work without exceeding what it is given. Another resolution, in certain cases, is to investigate whether the excess memory use is justified, which can be discussed with the developer of the application.

The randomness may depend on the amount of data the container code processes and what it ends up doing with it.

Have you tried increasing the memory properties of containers via fields such as "Map Task Memory" and "Reduce Task Memory" if it's MR jobs you are having issues with, or passing a higher value to the --executor-memory argument of spark-submit if it's Spark jobs instead?

This is all assuming you are seeing an error of the below form, since the relevant log isn't shared in your post:

… Container killed by YARN for exceeding memory limits. 1.1 GB of 1 GB physical memory used …
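As a rough sketch of the kind of change being suggested (the property names below are the standard MR2 and Spark ones referenced elsewhere in this thread; the 2048 MB / 2g values and the your_app.py name are only illustrative placeholders):

-- per Hive session, or via the "Map Task Memory" / "Reduce Task Memory" fields in Cloudera Manager
SET mapreduce.map.memory.mb=2048;
SET mapreduce.reduce.memory.mb=2048;

# per Spark job
spark-submit --executor-memory 2g --driver-memory 2g your_app.py

The matching JVM heap setting (e.g. the -Xmx in mapreduce.map.java.opts) generally has to stay somewhat below the container size, otherwise the JVM itself will push the container over its limit.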

Expert Contributor

@Harsh J

These are not Spark jobs; the jobs I am running are Hive and Sqoop jobs. They randomly get killed throughout the day; with the same configuration they sometimes run and sometimes don't.

mapreduce.map.memory.mb: 0 GB
(description says: if it is specified as 0, the amount of physical memory to request is inferred from Map Task Maximum Heap Size and Heap to Container Size Ratio)
mapreduce.reduce.memory.mb: 0 GB
mapreduce.job.heap.memory-mb.ratio: 1 GB
Client Java Heap Size in Bytes: 1 GB
yarn.nodemanager.resource.memory-mb: 45 GB
yarn.scheduler.increment-allocation-mb: 1 GB
yarn.scheduler.maximum-allocation-mb: 4 GB
yarn.nodemanager.resource.cpu-vcores: 12
Number of worker nodes: 3
yarn.scheduler.maximum-allocation-vcores: 2
 
I am using AWS m4.4x instances for the worker nodes. I have tried tweaking these values, but am I doing something horribly incorrect? Please suggest.
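For reference, here is roughly how these numbers interact (the inference formula is an assumption read off the Cloudera Manager description quoted above, and the Map Task Maximum Heap Size itself is not listed here):

total memory YARN can allocate across the cluster: 3 nodes x 45 GB = 135 GB
hard cap on any single container: yarn.scheduler.maximum-allocation-mb = 4 GB
with mapreduce.map.memory.mb = 0, each map container is sized as
  Map Task Maximum Heap Size / Heap to Container Size Ratio,
rounded up to the 1 GB allocation increment and capped at 4 GB

So an individual task can still be killed for exceeding its own small container limit even while most of the 135 GB across the cluster sits free, which matches the symptom described in the original post.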
 
 

Expert Contributor

@Harsh J : Could you please respond? It's a production cluster, and this error is disrupting our workflows.