Support Questions

sim6 · 05-21-2018

My cluster memory is 147GB, and I get this error even though the server has not used its entire memory.

I can see there is free memory, and yet my jobs get killed with this error. There is no error in the logs, and I don't see any error from the dmesg command or in /var/log/messages.

Also, it happens randomly and on any of the nodes. Please advise. I have been trying to get in touch with Cloudera sales support with no luck, and it's urgent.

Harsh J · 05-21-2018

The container memory usage limits are driven not by the available host memory but by the resource limits applied by the container configuration.

For example, if you've configured a map task to use 1 GiB of physical memory (pmem) but its code actually uses more than 1 GiB at runtime, it will get killed. The common resolution is to grant it more than 1 GiB, so it can do its higher-memory work without exceeding what it is given. In certain cases, another resolution is to investigate whether the excess memory use is justified, which can be discussed with the application's developer.
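A simplified sketch of the rule above, assuming the NodeManager compares the summed resident memory (RSS) of the container's whole process tree against the container's configured physical limit. The helper name and the figures are hypothetical; the point is that free host memory is not an input to the decision at all.

```python
def exceeds_pmem_limit(process_tree_rss_mb, container_limit_mb):
    """Simplified model of YARN's physical-memory check (hypothetical helper).

    process_tree_rss_mb: resident memory of every process in the container,
    e.g. the task JVM plus any child processes it spawned.
    Host-wide free memory is deliberately not a parameter: it does not
    influence whether the container is killed.
    """
    return sum(process_tree_rss_mb) > container_limit_mb


# A map task granted 1 GiB whose JVM (1000 MB) plus a child process (126 MB)
# use 1126 MB together is over its limit, however much RAM the host has free.
print(exceeds_pmem_limit([1000, 126], 1024))  # True
# Granting the task 2 GiB leaves headroom for the same workload.
print(exceeds_pmem_limit([1000, 126], 2048))  # False
```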

The randomness may depend on the amount of data each container processes and what its code ends up doing with it.

Have you tried increasing the container memory properties, via fields such as "Map Task Memory" and "Reduce Task Memory" if it's MR jobs you are having issues with, or by passing higher values to the --executor-memory argument of spark-submit if it's Spark jobs instead?

This all assumes you are seeing an error of the form below, since the relevant log isn't included in your post:

… Container killed by YARN for exceeding memory limits. 1.1 GB of 1 GB physical memory used …
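When several tasks fail this way, it can help to pull the used and allowed figures out of the diagnostic so you can see how far over the limit each task went. A minimal sketch, assuming the message contains the "X of Y physical memory used" fragment quoted above and that the units are binary multiples (1 GB = 1024 MB); the pattern and helper names are hypothetical.

```python
import re

# Matches the "X of Y physical memory used" part of the diagnostic.
PMEM_PATTERN = re.compile(
    r"([\d.]+) ([KMGT]?B) of ([\d.]+) ([KMGT]?B) physical memory used"
)
# Assumption: units are binary multiples.
UNIT_TO_MB = {"B": 1 / 1024**2, "KB": 1 / 1024, "MB": 1, "GB": 1024, "TB": 1024**2}


def parse_pmem_usage(diagnostic):
    """Return (used_mb, limit_mb) from a YARN kill diagnostic, or None if absent."""
    match = PMEM_PATTERN.search(diagnostic)
    if match is None:
        return None
    used, used_unit, limit, limit_unit = match.groups()
    return float(used) * UNIT_TO_MB[used_unit], float(limit) * UNIT_TO_MB[limit_unit]


message = ("Container killed by YARN for exceeding memory limits. "
           "1.1 GB of 1 GB physical memory used")
used_mb, limit_mb = parse_pmem_usage(message)
print(f"used {used_mb:.0f} MB of {limit_mb:.0f} MB")  # used 1126 MB of 1024 MB
```

In this example the task needed roughly 100 MB more than it was granted, so a container of 2 GiB rather than 1 GiB would be the natural next step to try.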

sim6 · 05-21-2018

@Harsh J:

These are not Spark jobs; I am running Hive and Sqoop jobs. They get killed randomly throughout the day: with the same configuration, they sometimes run and sometimes don't.

mapreduce.map.memory.mb: 0GB

The description says:

If it is specified as 0, the amount of physical memory to request is inferred from Map Task Maximum Heap Size and Heap to Container Size Ratio.

mapreduce.reduce.memory.mb: 0GB

mapreduce.job.heap.memory-mb.ratio: 1GB

Client Java Heap Size in Bytes: 1GB

yarn.nodemanager.resource.memory-mb: 45GB

yarn.scheduler.increment-allocation-mb: 1GB

yarn.scheduler.maximum-allocation-mb: 4GB

yarn.nodemanager.resource.cpu-vcores: 12

Number of worker nodes: 3

yarn.scheduler.maximum-allocation-vcores: 2

I am using AWS m4.4xlarge instances for the worker nodes. I have tried tweaking these values, but am I doing something horribly wrong? Please advise.
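The settings listed above interact roughly as follows. This is a simplified sketch, not Cloudera's or YARN's actual code, under several assumptions: the inference divides Map Task Maximum Heap Size by the Heap to Container Size Ratio (treated as a fraction such as upstream Hadoop's default of 0.8, not a size); the minimum allocation is 1024 MB because none is listed; and requests above the maximum are rejected rather than capped. All function names are hypothetical.

```python
import math

# Values from the post, converted to MB.
NODE_MEMORY_MB = 45 * 1024     # yarn.nodemanager.resource.memory-mb: 45GB
NODE_VCORES = 12               # yarn.nodemanager.resource.cpu-vcores: 12
INCREMENT_MB = 1024            # yarn.scheduler.increment-allocation-mb: 1GB
MAX_ALLOCATION_MB = 4 * 1024   # yarn.scheduler.maximum-allocation-mb: 4GB
# Assumption: the minimum allocation is not shown, so 1024 MB is used.
MIN_ALLOCATION_MB = 1024


def infer_container_mb(max_heap_mb, heap_ratio=0.8):
    """Container size when mapreduce.map.memory.mb is 0: heap divided by ratio.

    heap_ratio is a fraction (heap / container); 0.8 is an assumed default.
    """
    if not 0 < heap_ratio <= 1:
        raise ValueError("heap_ratio must be a fraction in (0, 1]")
    return math.ceil(max_heap_mb / heap_ratio)


def normalize_request(request_mb):
    """Simplified scheduler normalization: reject oversize, then round up."""
    if request_mb > MAX_ALLOCATION_MB:
        raise ValueError(f"{request_mb} MB exceeds the {MAX_ALLOCATION_MB} MB maximum")
    steps = math.ceil(max(request_mb, MIN_ALLOCATION_MB) / INCREMENT_MB)
    return steps * INCREMENT_MB


def containers_per_node(container_mb, container_vcores=1):
    """Containers one NodeManager can host, bounded by memory and by vcores."""
    return min(NODE_MEMORY_MB // container_mb, NODE_VCORES // container_vcores)


request = infer_container_mb(max_heap_mb=819)   # 819 / 0.8, rounded up: 1024 MB
granted = normalize_request(request)            # already a 1 GB multiple: 1024 MB
print(request, granted, containers_per_node(granted))  # 1024 1024 12
```

Under these assumptions, a task that needs about 1.1 GB is still granted only 1 GB unless the heap or the explicit task memory is raised. A 1229 MB request would be rounded up to 2048 MB. Also note that with 12 vcores per node, at most 12 single-vcore containers run on each node even though 45 GB of memory would fit 45 containers of 1 GB.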

sim6 · 05-24-2018

@Harsh J: Could you please respond? This is a production cluster, and this error is disrupting our workflows.

Cloudera Community

container exit code 137 and memory usage is limited