02-21-2017
03:38 AM
General description: Premounted cgroups on Ubuntu 14.04 break startup of the NodeManager.

Software version:
OS: Ubuntu 14.04
Linux kernel: 3.13.0-108-generic #155-Ubuntu SMP Wed Jan 11 16:58:52 UTC 2017 x86_64 GNU/Linux
Cloudera Manager: 5.8.4-1
Cloudera agent: 5.8.4-1
CDH parcel: 5.8.2

Detailed description:
Ubuntu 14.04 mounts cgroups automatically at startup once cgroup-lite is installed (for example, docker.io and libvirt-bin depend on it). They are mounted under /sys/fs/cgroup/ like this:

cgroup on /sys/fs/cgroup/cpuset type cgroup (rw,relatime,cpuset)
cgroup on /sys/fs/cgroup/cpu type cgroup (rw,relatime,cpu)
cgroup on /sys/fs/cgroup/cpuacct type cgroup (rw,relatime,cpuacct)
cgroup on /sys/fs/cgroup/memory type cgroup (rw,relatime,memory)
cgroup on /sys/fs/cgroup/devices type cgroup (rw,relatime,devices)
cgroup on /sys/fs/cgroup/freezer type cgroup (rw,relatime,freezer)
cgroup on /sys/fs/cgroup/blkio type cgroup (rw,relatime,blkio)
cgroup on /sys/fs/cgroup/perf_event type cgroup (rw,relatime,perf_event)
cgroup on /sys/fs/cgroup/hugetlb type cgroup (rw,relatime,hugetlb)

cloudera-scm-agent successfully detects these and reports them in its log:

[03/Feb/2017 16:45:15 +0000] 1903 MainThread agent INFO Agent starting as pid 1903 user root(0) group root(0).
[21/Feb/2017 10:09:05 +0000] 14054 MainThread agent INFO At least one outstanding cgroup; retaining cgroup mounts
[21/Feb/2017 10:09:08 +0000] 20837 MainThread agent INFO Re-using pre-existing directory: /run/cloudera-scm-agent/cgroups
[21/Feb/2017 10:09:08 +0000] 20837 MainThread cgroups INFO Found existing subsystem cpu at /sys/fs/cgroup/cpu
[21/Feb/2017 10:09:08 +0000] 20837 MainThread cgroups INFO Found existing subsystem cpuacct at /sys/fs/cgroup/cpuacct
[21/Feb/2017 10:09:08 +0000] 20837 MainThread cgroups INFO Found existing subsystem memory at /sys/fs/cgroup/memory
[21/Feb/2017 10:09:08 +0000] 20837 MainThread cgroups INFO Found existing subsystem blkio at /sys/fs/cgroup/blkio
[21/Feb/2017 10:09:08 +0000] 20837 MainThread cgroups INFO Found cgroups subsystem: cpu
[21/Feb/2017 10:09:08 +0000] 20837 MainThread cgroups INFO cgroup pseudofile /sys/fs/cgroup/cpu/cpu.rt_runtime_us does not exist, skipping
[21/Feb/2017 10:09:08 +0000] 20837 MainThread cgroups INFO Found cgroups subsystem: cpuacct
[21/Feb/2017 10:09:08 +0000] 20837 MainThread cgroups INFO Found cgroups subsystem: memory
[21/Feb/2017 10:09:08 +0000] 20837 MainThread cgroups INFO Found cgroups subsystem: blkio
[21/Feb/2017 10:09:08 +0000] 20837 MainThread agent INFO Found cgroups capabilities: {'has_memory': True, 'default_memory_limit_in_bytes': -1, 'default_memory_soft_limit_in_bytes': -1, 'writable_cgroup_dot_procs': True, 'default_cpu_rt_runtime_us': -1, 'has_cpu': True, 'default_blkio_weight': 1000, 'default_cpu_shares': 1024, 'has_cpuacct': True, 'has_blkio': True}

Ubuntu's default policy automatically places processes in a cgroup under the dedicated per-user folder /user/0.user/:

1862 ? Ss 0:27 /usr/lib/cmf/agent/build/env/bin/python /usr/lib/cmf/agent/build/env/bin/supervisord
1872 ? S 0:00 \_ python2.7 /usr/lib/cmf/agent/build/env/bin/cmf-listener -l /var/log/cloudera-scm-agent/cmf_listener.log /run/cloudera-scm-agent/events
2676 ? Sl 2:46 \_ /usr/lib/jvm/java-8-oracle//bin/java -Dproc_datanode -Xmx1000m -Dhdfs.audit.logger=INFO,RFAAUDIT -Dsecurity.audit.logger=INFO,RFAS -Djava.net.preferIPv4Stack=true -Dhadoop.log.dir=/var/log/hadoop-hdfs -Dhadoop
22245 ? Sl 0:00 \_ python2.7 /usr/lib/cmf/agent/build/env/bin/flood
21957 ? Ssl 0:01 python2.7 /usr/lib/cmf/agent/build/env/bin/cmf-agent --package_dir /usr/lib/cmf/service --agent_dir /var/run/cloudera-scm-agent --lib_dir /var/lib/cloudera-scm-agent --logfile /var/log/cloudera-scm-agent/cloudera-r
#cat /proc/21957/cgroup
11:name=systemd:/user/0.user/5.session
10:hugetlb:/user/0.user/5.session
9:perf_event:/user/0.user/5.session
8:blkio:/user/0.user/5.session
7:freezer:/user/0.user/5.session
6:devices:/user/0.user/5.session
5:memory:/user/0.user/5.session
4:cpuacct:/user/0.user/5.session
3:cpu:/user/0.user/5.session
2:cpuset:/
#cat /proc/1862/cgroup
11:name=systemd:/user/0.user/c1.session
10:hugetlb:/user/0.user/c1.session
9:perf_event:/user/0.user/c1.session
8:blkio:/user/0.user/c1.session
7:freezer:/user/0.user/c1.session
6:devices:/user/0.user/c1.session
5:memory:/cloudera
4:cpuacct:/user/0.user/c1.session
3:cpu:/cloudera
2:cpuset:/

The corresponding cpu folder structure looks like this after the DataNode has started:

# ll /sys/fs/cgroup/cpu/user/0.user/c1.session/
total 0
drwxr-xr-x 3 root root 0 Feb 21 09:21 ./
drwxr-xr-x 5 root root 0 Feb 20 16:59 ../
drwxr-xr-x 2 root root 0 Feb 20 15:10 757-hdfs-DATANODE/
-rw-r--r-- 1 root root 0 Feb 20 15:10 cgroup.clone_children
--w--w--w- 1 root root 0 Feb 20 15:10 cgroup.event_control
-rw-r--r-- 1 root root 0 Feb 20 15:10 cgroup.procs
-rw-r--r-- 1 root root 0 Feb 20 15:10 cpu.cfs_period_us
-rw-r--r-- 1 root root 0 Feb 20 15:10 cpu.cfs_quota_us
-rw-r--r-- 1 root root 0 Feb 20 15:10 cpu.shares
-r--r--r-- 1 root root 0 Feb 20 15:10 cpu.stat
-rw-r--r-- 1 root root 0 Feb 20 15:10 notify_on_release
-rw-r--r-- 1 root root 0 Feb 20 15:10 tasks

Next, when I try to start the YARN NodeManager from Cloudera Manager, it fails:

Feb 21, 9:21:49.447 AM INFO org.apache.hadoop.service.AbstractService
Service NodeManager failed in state INITED; cause: org.apache.hadoop.yarn.exceptions.YarnRuntimeException: Failed to initialize container executor
org.apache.hadoop.yarn.exceptions.YarnRuntimeException: Failed to initialize container executor
at org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceInit(NodeManager.java:221)
at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
at org.apache.hadoop.yarn.server.nodemanager.NodeManager.initAndStartNodeManager(NodeManager.java:514)
at org.apache.hadoop.yarn.server.nodemanager.NodeManager.main(NodeManager.java:561)
Caused by: java.io.IOException: Not able to enforce cpu weights; cannot write to cgroup at: /sys/fs/cgroup/cpu
at org.apache.hadoop.yarn.server.nodemanager.util.CgroupsLCEResourcesHandler.initializeControllerPaths(CgroupsLCEResourcesHandler.java:502)
at org.apache.hadoop.yarn.server.nodemanager.util.CgroupsLCEResourcesHandler.init(CgroupsLCEResourcesHandler.java:154)
at org.apache.hadoop.yarn.server.nodemanager.util.CgroupsLCEResourcesHandler.init(CgroupsLCEResourcesHandler.java:137)
at org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.init(LinuxContainerExecutor.java:215)
at org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceInit(NodeManager.java:219)
... 3 more
Feb 21, 9:21:49.486 AM DEBUG org.apache.hadoop.service.AbstractService
Service: NodeManager entered state STOPPED

In Cloudera Manager, yarn.nodemanager.linux-container-executor.cgroups.hierarchy is set to '/hadoop-yarn'. I created the /sys/fs/cgroup/cpu/hadoop-yarn cgroup manually and opened its permissions to 777 for the yarn user, but got the same error again.

I then straced the NodeManager java process. The last system call was:

[pid 11431] access("/sys/fs/cgroup/cpu/u/s/e/r/0/./u/s/e/r/c/4/./s/e/s/s/i/o/n/hadoop-yarn", W_OK) = -1 ENOENT (No such file or directory)

This looks really strange. It seems something is wrong with the placeholder replacement. Here is the stderr of yarn/yarn.sh ["nodemanager"]:

+ echo CONF_DIR=/run/cloudera-scm-agent/process/797-yarn-NODEMANAGER
+ echo CMF_CONF_DIR=/etc/cloudera-scm-agent
+ EXCLUDE_CMF_FILES=('cloudera-config.sh' 'httpfs.sh' 'hue.sh' 'impala.sh' 'sqoop.sh' 'supervisor.conf' '*.log' '*.keytab' '*jceks')
++ printf '! -name %s ' cloudera-config.sh httpfs.sh hue.sh impala.sh sqoop.sh supervisor.conf '*.log' yarn.keytab '*jceks'
+ find /run/cloudera-scm-agent/process/797-yarn-NODEMANAGER -type f '!' -path '/run/cloudera-scm-agent/process/797-yarn-NODEMANAGER/logs/*' '!' -name cloudera-config.sh '!' -name httpfs.sh '!' -name hue.sh '!' -name impala.sh '!' -name sqoop.sh '!' -name supervisor.conf '!' -name '*.log' '!' -name yarn.keytab '!' -name '*jceks' -exec perl -pi -e 's#{{CMF_CONF_DIR}}#/run/cloudera-scm-agent/process/797-yarn-NODEMANAGER#g' '{}' ';'
Can't open /run/cloudera-scm-agent/process/797-yarn-NODEMANAGER/container-executor.cfg: Permission denied.
+ perl -pi -e 's#{{CGROUP_GROUP_CPU}}#u/s/e/r///0/./u/s/e/r///4/./s/e/s/s/i/o/n#g' /run/cloudera-scm-agent/process/797-yarn-NODEMANAGER/yarn-site.xml

I checked further and found that the bug is in agent.py (/usr/lib/cmf/agent/build/env/lib/python2.7/site-packages/cmf-5.8.4-py2.7.egg/cmf/agent.py), in the method update_process_environment_for_cgroups, at line 3318:

group = '/'.join(group)

Here group is already a string rather than a list of path components, so the join inserts a '/' between every character. Logging the value before and after that line shows what it does to the group string:

[21/Feb/2017 11:23:41 +0000] 33551 MainThread agent INFO Set ENV from agent cgroups before '/'.join(group) CPU user/0.user/4.session
[21/Feb/2017 11:23:41 +0000] 33551 MainThread agent INFO Set ENV from agent cgroups after '/'.join(group) CPU u/s/e/r///0/./u/s/e/r///4/./s/e/s/s/i/o/n

Is it intended to work that way? Can you fix it? Potentially this affects not only the NodeManager but Impala too.

Thanks,
Alexander Yasnogor
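
P.S. The character-by-character result in the last log line is exactly what Python's str.join produces when it is handed a string instead of a list, because a string is iterated one character at a time. Below is a minimal standalone sketch that reproduces it. The helper name join_group is hypothetical and is only one possible guard; I have not seen how the rest of the agent code builds group, so this is not a proposed patch for agent.py. It is written for Python 3; on the agent's Python 2.7 the type check would be against basestring.

```python
def join_group(group):
    """Hypothetical guard: return a cgroup path from either a ready-made
    path string or a list of path components."""
    if isinstance(group, str):
        # Already a path: joining would split it into single characters.
        return group
    return '/'.join(group)


path = 'user/0.user/4.session'

# What line 3318 effectively does when group is already a string.
broken = '/'.join(path)

# What the guarded version returns for the same input.
fixed = join_group(path)

print(broken)
print(fixed)
```

The first printed line matches the mangled value substituted into yarn-site.xml, and the second is the original path left untouched.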
Labels: Apache YARN, Cloudera Manager