07-27-2015 09:55 PM
I am using CDH 5.3.2. I am unable to set a value for yarn.nodemanager.pmem-check-enabled through the UI. I was able to add the following to the ResourceManager Advanced Configuration Snippet (Safety Valve) for yarn-site.xml and restart the service:
<property>
  <name>yarn.nodemanager.pmem-check-enabled</name>
  <value>false</value>
</property>
However, I still see my apps getting killed due to physical memory limits being breached:
2015-07-27 18:53:46,528 [AMRM Callback Handler Thread] INFO HoyaAppMaster.yarn (HoyaAppMaster.java:onContainersCompleted(847)) - Container Completion for containerID=container_1437726395811_0116_01_000002, state=COMPLETE, exitStatus=-104, diagnostics=Container [pid=36891,containerID=container_1437726395811_0116_01_000002] is running beyond physical memory limits. Current usage: 1.0 GB of 1 GB physical memory used; 2.8 GB of 2.1 GB virtual memory used. Killing container.
07-29-2015 08:32 PM
I would strongly recommend not setting that to false. It will prevent the NodeManager from keeping control over the containers.
If you are running out of physical memory in a container, make sure that the JVM heap size is small enough to fit within the container.
The container size should be large enough to contain:
- JVM heap
- permanent generation for the JVM
- any off-heap allocations
In most cases an overhead of 15%-30% on top of the JVM heap will suffice; some jobs will require more overhead and some less. Your job configuration should include the proper JVM and container settings.
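The sizing rule above can be sketched as a small calculation (a minimal sketch; the 25% overhead figure is an illustrative value from the 15%-30% range, not a recommendation for any specific job):

```python
def heap_for_container(container_mb, overhead_fraction=0.25):
    """Estimate a safe JVM max heap (-Xmx) for a YARN container.

    The container must hold the heap plus the permanent generation
    and any off-heap allocations, so reserve overhead_fraction of
    the heap on top of it: container = heap * (1 + overhead_fraction).
    """
    return int(container_mb / (1 + overhead_fraction))

# A 1024 MB container with 25% overhead leaves roughly an 819 MB heap,
# e.g. mapreduce.map.memory.mb=1024 with -Xmx819m in the map java.opts.
print(heap_for_container(1024))
```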
If you really want to make the change: pmem-check-enabled is a NodeManager setting, so you need to set it in the configuration snippet for the NodeManager, not the ResourceManager.
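For example, the same property block posted earlier would go into the NodeManager Advanced Configuration Snippet (Safety Valve) for yarn-site.xml (shown here only to illustrate placement; disabling the check is still not recommended):

```xml
<!-- NodeManager safety valve for yarn-site.xml: the pmem check is
     evaluated by the NodeManager, so the override must live here. -->
<property>
  <name>yarn.nodemanager.pmem-check-enabled</name>
  <value>false</value>
</property>
```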
07-29-2015 08:43 PM
I'd agree about not setting it to false; that's my thinking too.
The main reason to use that setting is to be able to do some functional testing without getting into tuning just yet.
So, is there a way I can set this property through UI?
07-29-2015 08:46 PM
Set it through the NodeManager yarn-site.xml configuration snippet.
You used the ResourceManager snippet, and the check is not performed by that service; that is why it did not work for you.
04-06-2016 10:43 PM
First, the attribute name looks like a typo - you guys mean to say yarn.nodemanager.vmem-check-enabled , no?
Second, your recommendation contradicts the specific advice given in your own 2014 engineering blog, Apache Hadoop YARN: Avoiding 6 Time-Consuming "Gotchas". If that is no longer valid, please mark the article accordingly.
04-07-2016 08:52 AM
As Sumit said, there are two settings: vmem (virtual memory), set to false, and pmem (physical memory), set to true.
The blog is still correct: the change it describes is for vmem, and is due to the way the virtual memory allocator works on Linux. The pmem setting covers the "real" memory and enforces the container restrictions. If you turn that off, a task running in a container could simply take all the memory on the node. That leaves the NodeManager unable to enforce the container sizing you have set, and you are left trusting the applications (and your end users) to behave properly.
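Putting the two settings side by side (a sketch of the configuration described above, placed in the NodeManager yarn-site.xml safety valve):

```xml
<!-- Virtual memory check: safe to disable, per the blog post, because
     of how the Linux virtual memory allocator over-reserves. -->
<property>
  <name>yarn.nodemanager.vmem-check-enabled</name>
  <value>false</value>
</property>
<!-- Physical memory check: keep enabled so the NodeManager can
     enforce the container sizes you have configured. -->
<property>
  <name>yarn.nodemanager.pmem-check-enabled</name>
  <value>true</value>
</property>
```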
04-07-2016 11:04 AM
Thanks for the clarification. As others pointed out, CM doesn't list yarn.nodemanager.vmem-check-enabled as a configurable parameter, but seems to default it to false (I can see it in my Oozie action job metadata).
But then, this means the error "Diagnostics report from attempt_1459358870111_0185_m_000054_3: Container [pid=18971,containerID=container_1459358870111_0185_01_000210] is running beyond physical memory limits. Current usage: 1.0 GB of 1 GB physical memory used; 2.7 GB of 2.1 GB virtual memory used. Killing container." wasn't triggered by a virtual memory overage, but actually by physical memory? Which parameter should I tune?
As an experiment, I am setting mapreduce.map.memory.mb = 3000 manually in the failing Hive2 action. It runs slowly but seems to work. Job counters show max physical usage per task at ~2.4GB, committed heap at 2.3GB, and virtual memory at 4.3GB. Reducer consumption trails the mappers by varying amounts. Do you have a better suggestion?
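The experiment could be sketched as configuration properties like the following (the -Xmx2400m heap is an assumed value derived from the ~25% overhead guideline earlier in the thread, not something stated in this post):

```xml
<!-- Sketch of the experiment: a 3000 MB map container paired with an
     assumed heap that leaves ~25% overhead for permgen and off-heap. -->
<property>
  <name>mapreduce.map.memory.mb</name>
  <value>3000</value>
</property>
<property>
  <name>mapreduce.map.java.opts</name>
  <value>-Xmx2400m</value>
</property>
```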
04-07-2016 11:18 AM
Related topic: Jobs fail in Yarn with out of Java heap memory error
... where your colleague bcwalrus said, "That [yarn.nodemanager.vmem-check-enabled] shouldn't matter though. You said that the job died due to OOME. It didn't die because it got killed by NM." Is that what happened here, too?
And what's the reason to set mapreduce.*.java.opts.max.heap in addition to mapreduce.*.memory.mb? Wouldn't it just introduce more potential conflict w/o much benefit?