03-21-2018 05:03 PM
We install our software component as an add-on service using a parcel and a CSD. We have enabled cgroups with CPU shares and a memory hard limit. When we set the static memory allocation, the service cgroup was created with the right values. This worked fine with Cloudera Manager 5.9.1.
We recently upgraded to Cloudera Manager 5.14.1, and the memory cgroups no longer work.
1. It seems the configure_group function in cgroups.py expects a JSON object with child items, but the JSON object passed in is flat and has no child items, hence the error. As a workaround, I edited the _do_mem_resources function in process.py to pass the entire memory_resources object:
self.agent.cg_manager.configure_group(path, "memory", mem_resources)
With this change, the startup error went away and the cgroup setting was modified at startup, which then exposed issue #2.
2. We allocated 64 GB per node for the service on the static allocation page, but the cgroup’s memory.limit_in_bytes held a very large value (72057594037927936).
The cloudera-scm-agent.log shows the right value, 68719476736 bytes, but the final value on the cgroup is 72057594037927936:
[21/Mar/2018 04:28:15 +0000] 24060 MainThread cgroups INFO Reconfiguring cgroup pseudofile /var/run/cloudera-scm-agent/cgroups/memory/<my service role>/memory.limit_in_bytes with value 68719476736
The /usr/lib64/cmf/agent/build/env/lib/python2.6/site-packages/cmf-5.14.1-py2.6.egg/cmf/cgroup.py script that writes out the cgroup entry appears to assume the value is in MB, so it multiplies the value by 1024 * 1024 and writes the resulting 72057594037927936 to the cgroup file.
I then compared the behavior with Cloudera Manager 5.9.x where this was all working.
It looks like in 5.9.x everything was stored internally in MB. When we set 64 GB on the static allocation page, proc.json had the memory hard limit set to 65536, and cgroup.py then multiplied it by 1024*1024 to compute the actual bytes.
In CM 5.14.1, it looks like everything is stored in bytes, as we see in proc.json, but cgroups.py still treats the values as MB and converts them to bytes a second time.
Are these two issues known? Is there a patch that addresses them?
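To make the arithmetic concrete, here is a minimal sketch of the unit mismatch as I understand it. The function name to_bytes_assuming_mb is made up for illustration and is not the actual agent code; it only mimics the "multiply by 1024 * 1024" step described above.

```python
MB = 1024 * 1024

def to_bytes_assuming_mb(value):
    """Hypothetical stand-in for the conversion step: treats the input as MB."""
    return value * MB

# 5.9.x: proc.json carried the limit in MB (64 GB = 65536 MB).
limit_5_9_mb = 64 * 1024
# 5.14.1: proc.json carries the limit in bytes (64 GB = 68719476736 bytes).
limit_5_14_bytes = 64 * 1024 ** 3

print(to_bytes_assuming_mb(limit_5_9_mb))      # 68719476736 (correct)
print(to_bytes_assuming_mb(limit_5_14_bytes))  # 72057594037927936 (2**56)
```

The second result matches the value we found in memory.limit_in_bytes, which is consistent with a byte value being converted from "MB" to bytes once more.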
Appreciate any help.
04-02-2018 02:33 PM
This is a known issue, fixed in the upcoming releases 5.14.2 and 5.15+. It has been reported in versions as old as CM 5.12.1; I'm not sure whether that is the oldest affected version, but it would explain the issue you're seeing.