Created on 05-18-2015 10:25 AM - edited 09-16-2022 02:29 AM
Hi All,
I've hit the wall again and need to reach out for community wisdom on this issue. I had a fully functioning CDH 5.3.2 three-node (YARN) cluster... and then... I configured it for Kerberos.
I've done this successfully before with CDH 4.2.1 and CDH 5.1.2 (MRv1)... so it's not like I've never done this before.
What I'm getting now is the following error messaging:
9:31:21.867 AM | INFO | org.apache.hadoop.yarn.server.nodemanager.NodeManager | registered UNIX signal handlers for [TERM, HUP, INT] |
9:31:24.736 AM | INFO | org.apache.hadoop.yarn.server.nodemanager.recovery.NMLeveldbStateStoreService | Using state database at /var/lib/hadoop-yarn/yarn-nm-recovery/yarn-nm-state for recovery |
9:31:24.871 AM | INFO | org.apache.hadoop.yarn.server.nodemanager.recovery.NMLeveldbStateStoreService$LeveldbLogger | Recovering log #18 |
9:31:24.894 AM | INFO | org.apache.hadoop.yarn.server.nodemanager.recovery.NMLeveldbStateStoreService$LeveldbLogger | Delete type=0 #18 |
9:31:24.894 AM | INFO | org.apache.hadoop.yarn.server.nodemanager.recovery.NMLeveldbStateStoreService$LeveldbLogger | Delete type=3 #17 |
9:31:24.910 AM | INFO | org.apache.hadoop.yarn.server.nodemanager.recovery.NMLeveldbStateStoreService | Loaded NM state version info 1.0 |
9:31:25.357 AM | WARN | org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor | Exit code from container executor initialization is : 24
ExitCodeException exitCode=24: Configuration file ../etc/hadoop/container-executor.cfg not found.
    at org.apache.hadoop.util.Shell.runCommand(Shell.java:538)
    at org.apache.hadoop.util.Shell.run(Shell.java:455)
    at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:702)
    at org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.init(LinuxContainerExecutor.java:180)
    at org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceInit(NodeManager.java:209)
    at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
    at org.apache.hadoop.yarn.server.nodemanager.NodeManager.initAndStartNodeManager(NodeManager.java:462)
    at org.apache.hadoop.yarn.server.nodemanager.NodeManager.main(NodeManager.java:509) |
9:31:25.372 AM | INFO | org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor | |
9:31:25.373 AM | INFO | org.apache.hadoop.service.AbstractService | Service NodeManager failed in state INITED; cause: org.apache.hadoop.yarn.exceptions.YarnRuntimeException: Failed to initialize container executor
org.apache.hadoop.yarn.exceptions.YarnRuntimeException: Failed to initialize container executor
    at org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceInit(NodeManager.java:211)
    at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
    at org.apache.hadoop.yarn.server.nodemanager.NodeManager.initAndStartNodeManager(NodeManager.java:462)
    at org.apache.hadoop.yarn.server.nodemanager.NodeManager.main(NodeManager.java:509)
Caused by: java.io.IOException: Linux container executor not configured properly (error=24)
    at org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.init(LinuxContainerExecutor.java:186)
    at org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceInit(NodeManager.java:209)
    ... 3 more
Caused by: ExitCodeException exitCode=24: Configuration file ../etc/hadoop/container-executor.cfg not found.
    at org.apache.hadoop.util.Shell.runCommand(Shell.java:538)
    at org.apache.hadoop.util.Shell.run(Shell.java:455)
    at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:702)
    at org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.init(LinuxContainerExecutor.java:180)
    ... 4 more |
9:31:25.391 AM | WARN | org.apache.hadoop.service.AbstractService | When stopping the service NodeManager : java.lang.NullPointerException
java.lang.NullPointerException
    at org.apache.hadoop.yarn.server.nodemanager.NodeManager.stopRecoveryStore(NodeManager.java:161)
    at org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceStop(NodeManager.java:273)
    at org.apache.hadoop.service.AbstractService.stop(AbstractService.java:221)
    at org.apache.hadoop.service.ServiceOperations.stop(ServiceOperations.java:52)
    at org.apache.hadoop.service.ServiceOperations.stopQuietly(ServiceOperations.java:80)
    at org.apache.hadoop.service.AbstractService.init(AbstractService.java:171)
    at org.apache.hadoop.yarn.server.nodemanager.NodeManager.initAndStartNodeManager(NodeManager.java:462)
    at org.apache.hadoop.yarn.server.nodemanager.NodeManager.main(NodeManager.java:509) |
9:31:25.392 AM | FATAL | org.apache.hadoop.yarn.server.nodemanager.NodeManager | Error starting NodeManager
org.apache.hadoop.yarn.exceptions.YarnRuntimeException: Failed to initialize container executor
    at org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceInit(NodeManager.java:211)
    at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
    at org.apache.hadoop.yarn.server.nodemanager.NodeManager.initAndStartNodeManager(NodeManager.java:462)
    at org.apache.hadoop.yarn.server.nodemanager.NodeManager.main(NodeManager.java:509)
Caused by: java.io.IOException: Linux container executor not configured properly (error=24)
    at org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.init(LinuxContainerExecutor.java:186)
    at org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceInit(NodeManager.java:209)
    ... 3 more
Caused by: ExitCodeException exitCode=24: Configuration file ../etc/hadoop/container-executor.cfg not found.
    at org.apache.hadoop.util.Shell.runCommand(Shell.java:538)
    at org.apache.hadoop.util.Shell.run(Shell.java:455)
    at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:702)
    at org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.init(LinuxContainerExecutor.java:180)
    ... 4 more |
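For context on the message: as I understand it, exit code 24 is the container executor's "invalid/missing configuration" error, and the `../etc/hadoop/container-executor.cfg` in the log is a path resolved relative to the executor binary's own install directory, so whether it exists depends on where the binary and config actually landed. A small sandboxed sketch of that relative resolution (the directory layout below is illustrative, not the real CDH layout):

```shell
# Sketch: why the error names "../etc/hadoop/container-executor.cfg".
# The path is resolved relative to the executor binary's directory; a
# temp sandbox stands in for the real install tree here.
sandbox="$(mktemp -d)"
mkdir -p "$sandbox/usr/lib/hadoop-yarn/bin" "$sandbox/usr/lib/hadoop-yarn/etc/hadoop"
touch "$sandbox/usr/lib/hadoop-yarn/etc/hadoop/container-executor.cfg"

bin_dir="$sandbox/usr/lib/hadoop-yarn/bin"
cfg="$bin_dir/../etc/hadoop/container-executor.cfg"   # same shape as in the log
found=$([ -f "$cfg" ] && echo yes || echo no)
echo "config present: $found"

rm -rf "$sandbox"
```

If the config is missing at that relative location, the executor bails out with exit code 24 exactly as in the log above.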
It appears there have been other postings on this issue; please see:
1. "enable kerberos,nodemanager can not start Exit code from container executor initialization is : 24" &
2. "Issue with starting Yarn after deploying kerberos on Cloudera Quickstart VM (CDH 5.3.0)"
and I've tried two of the suggested solutions:
1. Ensuring /etc and /etc/hadoop have 755 permissions, and
2. Stopping the cluster, deleting the /var/lib/hadoop-yarn/yarn-nm-recovery directory, and restarting the cluster.
Neither resolves the issue.
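For anyone retracing step 1, the permission check can be sketched like this in a sandbox (a scratch directory stands in for the real /etc and /etc/hadoop; `stat -c` is the GNU coreutils form):

```shell
# Sandboxed sketch of the 755 permission check from step 1.
# Real paths would be /etc and /etc/hadoop; a temp dir stands in here.
d="$(mktemp -d)"
mkdir -p "$d/etc/hadoop"
chmod 755 "$d/etc" "$d/etc/hadoop"

etc_mode=$(stat -c '%a' "$d/etc")            # prints the octal mode, e.g. 755
hadoop_mode=$(stat -c '%a' "$d/etc/hadoop")
echo "etc=$etc_mode hadoop=$hadoop_mode"

rm -rf "$d"
```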
This appears to be some sort of permissions issue arising from the Kerberization, so I need help ASAP. Thanks in advance for your wisdom and advice.
mit Freundlichen Grüßen (with Friendly Greetings),
Jan
Created 05-18-2015 11:27 AM
Hi dlo,
Thank you for your help; you actually led me in another direction, and I determined there were two "linked" (no pun intended) issues. It seems the /etc/hadoop/conf symlink was pointing at another broken symlink, /etc/alternatives/hadoop-conf, which in turn pointed to a non-existent directory on the two nodes where the NodeManager was failing. I corrected the /etc/alternatives/hadoop-conf symlink from /etc/hadoop/conf.cloudera.mapred (which doesn't exist) to /etc/hadoop/conf.cloudera.yarn.
Then I deployed the client configuration yet again and restarted the cluster... and voilà, problem solved. When I checked back through everything, I saw that the timestamps on the /etc/hadoop/conf.cloudera.yarn/topology.map and /etc/hadoop/conf.cloudera.yarn/topology.py files had been updated, which was (at some level) confirmation that the configs had been successfully re-deployed.
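For anyone hitting the same thing, the broken chain and the repair can be sketched in a sandbox (the real chain is /etc/hadoop/conf → /etc/alternatives/hadoop-conf → the config directory; a temp directory stands in for the root here):

```shell
# Sandboxed sketch of the broken symlink chain and its repair.
# Real chain: /etc/hadoop/conf -> /etc/alternatives/hadoop-conf
#             -> /etc/hadoop/conf.cloudera.mapred (which didn't exist).
root="$(mktemp -d)"
mkdir -p "$root/etc/alternatives" "$root/etc/hadoop/conf.cloudera.yarn"

# Reproduce the broken state: the alternatives link targets a missing dir.
ln -s "$root/etc/hadoop/conf.cloudera.mapred" "$root/etc/alternatives/hadoop-conf"
ln -s "$root/etc/alternatives/hadoop-conf" "$root/etc/hadoop/conf"
before=$([ -e "$root/etc/hadoop/conf" ] && echo resolves || echo broken)
echo "before fix: conf $before"

# The repair: repoint the alternatives link at the yarn config dir
# (-n keeps ln from dereferencing the existing link, -f replaces it).
ln -sfn "$root/etc/hadoop/conf.cloudera.yarn" "$root/etc/alternatives/hadoop-conf"
after=$([ -e "$root/etc/hadoop/conf" ] && echo resolves || echo broken)
echo "after fix: conf $after"

rm -rf "$root"
```

On a real cluster you would of course inspect the live links first (e.g. with `readlink /etc/hadoop/conf` and `ls -ld /etc/alternatives/hadoop-conf`) before repointing anything.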
Hope this helps and thank you again for your help.
mit Freundlichen Grüßen (with Friendly Greetings),
Jan
Created 05-18-2015 10:40 AM
Hi dlo,
Thank you for getting back to me so quickly. I believe I'd done that Friday night, but that said, I'll retry it. Before doing that, I have a question about the Deploy Client Configuration and Deploy Kerberos Client Configuration commands:
First, I assume I run one of these against the whole cluster and not just the NodeManagers, correct?
Second, do I run both, or just Deploy Kerberos Client Configuration?
Thank you in advance for your advice; I look forward to your reply.
mit Freundlichen Grüßen (with Friendly Greetings),
Jan