NodeManagers fail to start after Kerberos deployment

Hi All,

I've hit a wall again and need the community's help. I had a fully functioning 3-node CDH 5.3.2 (YARN) cluster, and then I configured it for Kerberos.

I've done this successfully before with CDH 4.2.1 and CDH 5.1.2 (MRv1), so this is not my first Kerberos deployment.

The NodeManagers now fail to start with the following log output:

9:31:21.867 AM INFO org.apache.hadoop.yarn.server.nodemanager.NodeManager
registered UNIX signal handlers for [TERM, HUP, INT]
9:31:24.736 AM INFO org.apache.hadoop.yarn.server.nodemanager.recovery.NMLeveldbStateStoreService
Using state database at /var/lib/hadoop-yarn/yarn-nm-recovery/yarn-nm-state for recovery
9:31:24.871 AM INFO org.apache.hadoop.yarn.server.nodemanager.recovery.NMLeveldbStateStoreService$LeveldbLogger
Recovering log #18
9:31:24.894 AM INFO org.apache.hadoop.yarn.server.nodemanager.recovery.NMLeveldbStateStoreService$LeveldbLogger
Delete type=0 #18
9:31:24.894 AM INFO org.apache.hadoop.yarn.server.nodemanager.recovery.NMLeveldbStateStoreService$LeveldbLogger
Delete type=3 #17
9:31:24.910 AM INFO org.apache.hadoop.yarn.server.nodemanager.recovery.NMLeveldbStateStoreService
Loaded NM state version info 1.0
9:31:25.357 AM WARN org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor
Exit code from container executor initialization is : 24
ExitCodeException exitCode=24: Configuration file ../etc/hadoop/container-executor.cfg not found.

	at org.apache.hadoop.util.Shell.runCommand(Shell.java:538)
	at org.apache.hadoop.util.Shell.run(Shell.java:455)
	at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:702)
	at org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.init(LinuxContainerExecutor.java:180)
	at org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceInit(NodeManager.java:209)
	at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
	at org.apache.hadoop.yarn.server.nodemanager.NodeManager.initAndStartNodeManager(NodeManager.java:462)
	at org.apache.hadoop.yarn.server.nodemanager.NodeManager.main(NodeManager.java:509)
9:31:25.372 AM INFO org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor
9:31:25.373 AM INFO org.apache.hadoop.service.AbstractService
Service NodeManager failed in state INITED; cause: org.apache.hadoop.yarn.exceptions.YarnRuntimeException: Failed to initialize container executor
org.apache.hadoop.yarn.exceptions.YarnRuntimeException: Failed to initialize container executor
	at org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceInit(NodeManager.java:211)
	at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
	at org.apache.hadoop.yarn.server.nodemanager.NodeManager.initAndStartNodeManager(NodeManager.java:462)
	at org.apache.hadoop.yarn.server.nodemanager.NodeManager.main(NodeManager.java:509)
Caused by: java.io.IOException: Linux container executor not configured properly (error=24)
	at org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.init(LinuxContainerExecutor.java:186)
	at org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceInit(NodeManager.java:209)
	... 3 more
Caused by: ExitCodeException exitCode=24: Configuration file ../etc/hadoop/container-executor.cfg not found.

	at org.apache.hadoop.util.Shell.runCommand(Shell.java:538)
	at org.apache.hadoop.util.Shell.run(Shell.java:455)
	at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:702)
	at org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.init(LinuxContainerExecutor.java:180)
	... 4 more
9:31:25.391 AM WARN org.apache.hadoop.service.AbstractService
When stopping the service NodeManager : java.lang.NullPointerException
java.lang.NullPointerException
	at org.apache.hadoop.yarn.server.nodemanager.NodeManager.stopRecoveryStore(NodeManager.java:161)
	at org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceStop(NodeManager.java:273)
	at org.apache.hadoop.service.AbstractService.stop(AbstractService.java:221)
	at org.apache.hadoop.service.ServiceOperations.stop(ServiceOperations.java:52)
	at org.apache.hadoop.service.ServiceOperations.stopQuietly(ServiceOperations.java:80)
	at org.apache.hadoop.service.AbstractService.init(AbstractService.java:171)
	at org.apache.hadoop.yarn.server.nodemanager.NodeManager.initAndStartNodeManager(NodeManager.java:462)
	at org.apache.hadoop.yarn.server.nodemanager.NodeManager.main(NodeManager.java:509)
9:31:25.392 AM FATAL org.apache.hadoop.yarn.server.nodemanager.NodeManager
Error starting NodeManager
org.apache.hadoop.yarn.exceptions.YarnRuntimeException: Failed to initialize container executor
	at org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceInit(NodeManager.java:211)
	at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
	at org.apache.hadoop.yarn.server.nodemanager.NodeManager.initAndStartNodeManager(NodeManager.java:462)
	at org.apache.hadoop.yarn.server.nodemanager.NodeManager.main(NodeManager.java:509)
Caused by: java.io.IOException: Linux container executor not configured properly (error=24)
	at org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.init(LinuxContainerExecutor.java:186)
	at org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceInit(NodeManager.java:209)
	... 3 more
Caused by: ExitCodeException exitCode=24: Configuration file ../etc/hadoop/container-executor.cfg not found.

	at org.apache.hadoop.util.Shell.runCommand(Shell.java:538)
	at org.apache.hadoop.util.Shell.run(Shell.java:455)
	at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:702)
	at org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.init(LinuxContainerExecutor.java:180)
	... 4 more

 

There appear to be other postings on this; please see:

1. "enable kerberos,nodemanager can not start Exit code from container executor initialization is : 24"

2. "Issue with starting Yarn after deploying kerberos on Cloudera Quickstart VM (CDH 5.3.0)"

I've tried two solutions:

1. Ensuring /etc and /etc/hadoop have 755 permissions.

2. Stopping the cluster, deleting the /var/lib/hadoop-yarn/yarn-nm-recovery directory, and restarting the cluster.

Neither resolved the issue.
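
For anyone repeating the first check on several nodes, here is a minimal Python sketch that reports the permission bits on a list of directories. The helper names (mode_of, check_modes) are made up for illustration, and the 755 expectation simply mirrors the attempted fix above; it is not an official requirement stated in this thread.

```python
import os
import stat


def mode_of(path):
    """Return the permission bits of path (e.g. 0o755), following symlinks."""
    return stat.S_IMODE(os.stat(path).st_mode)


def check_modes(paths, expected=0o755):
    """Map each path to (actual_mode_or_None, matches_expected)."""
    report = {}
    for p in paths:
        try:
            m = mode_of(p)
        except OSError:
            # Missing or unreadable path: record it rather than abort.
            report[p] = (None, False)
            continue
        report[p] = (m, m == expected)
    return report


if __name__ == "__main__":
    # The two directories mentioned in the post.
    for path, (mode, ok) in check_modes(["/etc", "/etc/hadoop"]).items():
        shown = "missing" if mode is None else oct(mode)
        print(f"{path}: {shown} {'ok' if ok else 'CHECK'}")
```

A clean result here only rules out the directory permissions; it says nothing about where /etc/hadoop/conf actually points.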

 

This appears to be some sort of permissions issue around the Kerberization, so I need help as soon as possible. Thanks in advance for your wisdom and advice.

mit Freundlichen Grüßen (with Friendly Greetings),

     Jan

 

1 ACCEPTED SOLUTION

Hi dlo,

Thank you for your help; you actually led me in another direction, and I determined there were two "linked" (no pun intended) issues. The /etc/hadoop/conf symlink was pointing at another, broken symlink, /etc/alternatives/hadoop-conf, which pointed to a nonexistent directory on the two nodes where the NodeManager was failing. I repointed the /etc/alternatives/hadoop-conf symlink from /etc/hadoop/conf.cloudera.mapred (which doesn't exist) to /etc/hadoop/conf.cloudera.yarn.
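
A broken chain like this can be spotted by following the symlinks one hop at a time. The following is a minimal Python sketch of that idea, not a Cloudera tool; the function name trace_symlink_chain and the max_hops limit are assumptions made for illustration.

```python
import os


def trace_symlink_chain(path, max_hops=16):
    """Follow a symlink chain one hop at a time.

    Returns (hops, ok): hops lists every path visited, and ok is True
    only if the final path actually exists on disk.
    """
    hops = [path]
    current = path
    for _ in range(max_hops):
        if not os.path.islink(current):
            break
        target = os.readlink(current)
        # Relative targets are resolved against the directory holding the link.
        if not os.path.isabs(target):
            target = os.path.join(os.path.dirname(current), target)
        current = os.path.normpath(target)
        hops.append(current)
    return hops, os.path.exists(current)


if __name__ == "__main__":
    hops, ok = trace_symlink_chain("/etc/hadoop/conf")
    print(" -> ".join(hops))
    print("OK" if ok else "BROKEN: final target does not exist")
```

On a node in the state described above, the printed chain would be expected to end at /etc/hadoop/conf.cloudera.mapred and be flagged as broken.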

 

Then I deployed client configuration yet again and restarted the cluster... and voilà! Problem solved. When I checked back through everything, I could see that the timestamps on the /etc/hadoop/conf.cloudera.yarn/topology.map and /etc/hadoop/conf.cloudera.yarn/topology.py files had been updated, which was (at some level) confirmation that the configs had been successfully redeployed.
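
The timestamp check can be scripted as well. This is a rough Python sketch under the assumption that you note the time just before redeploying; modified_since is a hypothetical helper name, and the 15-minute window in the usage section is an arbitrary example value.

```python
import os
import time


def modified_since(paths, since_epoch):
    """Return {path: True/False/None}.

    True means the file's mtime is at or after since_epoch, False means
    it is older, and None means the file could not be read (e.g. missing).
    """
    result = {}
    for p in paths:
        try:
            result[p] = os.path.getmtime(p) >= since_epoch
        except OSError:
            result[p] = None
    return result


if __name__ == "__main__":
    # Example window: anything touched in the last 15 minutes (arbitrary).
    cutoff = time.time() - 15 * 60
    files = [
        "/etc/hadoop/conf.cloudera.yarn/topology.map",
        "/etc/hadoop/conf.cloudera.yarn/topology.py",
    ]
    for path, fresh in modified_since(files, cutoff).items():
        print(path, "missing" if fresh is None else ("updated" if fresh else "stale"))
```

As in the post, a fresh timestamp is only a hint that the redeploy reached the node, not proof that every file is correct.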

 

Hope this helps and thank you again for your help.

mit Freundlichen Grüßen (with Friendly Greetings),

     Jan


4 REPLIES 4

Did you try deploying client configuration, then restarting the node managers?

The container executor interacts in a weird way with alternatives and process restarts. Deploying client configuration followed by a YARN restart clears up these issues, if they're the cause.


Hi dlo,

Thank you for getting back to me so quickly. I believe I did that Friday night, but I'll retry it. Before doing so, I have a question about the Deploy Client Configuration and Deploy Kerberos Client Configuration commands:

First, I assume I run these against the whole cluster and not just the NodeManagers, correct?

Second, do I run both, or just Deploy Kerberos Client Configuration?

Thank you in advance for your advice; I look forward to your reply.

mit Freundlichen Grüßen (with Friendly Greetings),

     Jan

Hi Jan,

Good question. I was talking about deploying client configuration, which updates the symlinks for /etc/hadoop/conf (which is where the container executor looks for its configuration). Deploy Kerberos client configuration just updates /etc/krb5.conf. If that file looks fine, there's no need to run it.

It's important to run the deploy client configuration command before starting / restarting YARN because the symlinks need to be correct before starting YARN. Issues with the symlinks mostly come up when adding new hosts or changing from MR to YARN, since in these cases /etc/hadoop/conf might not be a symlink to the YARN client configuration.

Once you start YARN, on each host with a NodeManager you should see /etc/hadoop/conf/container-executor.cfg get created.
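
To confirm this on each NodeManager host, a short Python sketch along these lines could be run after the restart. The function name container_executor_cfg_status is made up for illustration; the paths are the ones mentioned in this reply.

```python
import os


def container_executor_cfg_status(conf_dir="/etc/hadoop/conf"):
    """Resolve conf_dir through any symlinks and check for container-executor.cfg."""
    resolved = os.path.realpath(conf_dir)
    cfg = os.path.join(resolved, "container-executor.cfg")
    return {
        "conf_dir": conf_dir,
        "resolved": resolved,
        "cfg_present": os.path.isfile(cfg),
    }


if __name__ == "__main__":
    status = container_executor_cfg_status()
    print(f"{status['conf_dir']} resolves to {status['resolved']}")
    print("container-executor.cfg present:", status["cfg_present"])
```

If the resolved directory is not the YARN client configuration, or the file is absent, that would suggest the symlinks were not correct when YARN started.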

Thanks,
Darren
