After adding kerberos changes to the cluster I'm unable to start NodeManager on DataNodes.

Contributor

After adding the Kerberos changes to the cluster, I'm unable to start the NodeManager on any of the DataNodes.

Based on the logs, it looks like Kerberos authentication works.

 

The /etc/hadoop/conf/dfs.hosts.exclude file is empty. I still ran "yarn rmadmin -refreshNodes".

The dfs.hosts file has all the datanodes listed.

 

hadoop version:
Hadoop 2.6.0-cdh5.4.0
Subversion http://github.com/cloudera/hadoop -r c788a14a5de9ecd968d1e2666e8765c5f018c271
Compiled by jenkins on 2015-04-21T19:18Z
Compiled with protoc 2.5.0
From source with checksum cd78f139c66c13ab5cee96e15a629025
This command was run using /usr/lib/hadoop/hadoop-common-2.6.0-cdh5.4.0.ja

 

Yarn NodeManager log:

2015-06-29 15:52:23,113 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Stopping NodeManager metrics system...
2015-06-29 15:52:23,114 INFO org.apache.hadoop.metrics2.impl.MetricsSinkAdapter: file_jvm thread interrupted.
2015-06-29 15:52:23,114 INFO org.apache.hadoop.metrics2.impl.MetricsSinkAdapter: file_mapred thread interrupted.
2015-06-29 15:52:23,114 INFO org.apache.hadoop.metrics2.impl.MetricsSinkAdapter: file_yarn thread interrupted.
2015-06-29 15:52:23,114 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: NodeManager metrics system stopped.
2015-06-29 15:52:23,115 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: NodeManager metrics system shutdown complete.
2015-06-29 15:52:23,115 FATAL org.apache.hadoop.yarn.server.nodemanager.NodeManager: Error starting NodeManager
org.apache.hadoop.yarn.exceptions.YarnRuntimeException: org.apache.hadoop.yarn.exceptions.YarnRuntimeException: Recieved SHUTDOWN signal from Resourcemanager ,Registration of NodeManager failed, Message from ResourceManager: Disallowed NodeManager from 17.bm-hadooph-datanode.sand-08.lax1.adnexus.net, Sending SHUTDOWN signal to the NodeManager.
at org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl.serviceStart(NodeStatusUpdaterImpl.java:197)
at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
at org.apache.hadoop.service.CompositeService.serviceStart(CompositeService.java:120)
at org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceStart(NodeManager.java:264)
at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
at org.apache.hadoop.yarn.server.nodemanager.NodeManager.initAndStartNodeManager(NodeManager.java:463)
at org.apache.hadoop.yarn.server.nodemanager.NodeManager.main(NodeManager.java:509)
Caused by: org.apache.hadoop.yarn.exceptions.YarnRuntimeException: Recieved SHUTDOWN signal from Resourcemanager ,Registration of NodeManager failed, Message from ResourceManager: Disallowed NodeManager from 17.bm-hadooph-datanode.sand-08.lax1.adnexus.net, Sending SHUTDOWN signal to the NodeManager.
at org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl.registerWithRM(NodeStatusUpdaterImpl.java:265)
at org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl.serviceStart(NodeStatusUpdaterImpl.java:191)
... 6 more
2015-06-29 15:52:23,116 INFO org.apache.hadoop.yarn.server.nodemanager.NodeManager: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NodeManager at 17.bm-hadooph-datanode.sand-08.lax1/10.0.89.174

 

ResourceManager log from NameNode:

2015-06-29 15:51:45,866 INFO org.apache.hadoop.yarn.server.resourcemanager.RMAuditLogger: USER=yarn OPERATION=refreshUserToGroupsMappings TARGET=AdminService RESULT=SUCCESS
2015-06-29 15:51:45,870 INFO org.apache.hadoop.yarn.server.resourcemanager.RMAuditLogger: USER=yarn OPERATION=transitionToActive TARGET=RMHAProtocolService RESULT=SUCCESS
2015-06-29 15:52:22,763 INFO SecurityLogger.org.apache.hadoop.ipc.Server: Auth successful for yarn/17.bm-hadooph-datanode.sand-08.lax1.adnexus.net@CORP.APPNEXUS.COM (auth:KERBEROS)
2015-06-29 15:52:22,885 INFO org.apache.hadoop.yarn.server.resourcemanager.ResourceTrackerService: Disallowed NodeManager from 17.bm-hadooph-datanode.sand-08.lax1.adnexus.net, Sending SHUTDOWN signal to the NodeManager.

Re: After adding kerberos changes to the cluster I'm unable to start NodeManager on DataNodes.

Super Collaborator

Check the allowed and excluded hosts lists. Either there is an issue in one of the files, or the IP address of the host does not resolve to a hostname listed in the allowed file.

And yes, Kerberos seems to be working.

 

After startup, the RM log shows which hosts are allowed and which files are read for this: look for "org.apache.hadoop.util.HostsFileReader" entries.
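If the log is long, a short script can pull out just the included hosts. This is only a rough sketch: the pattern is modelled on the "Adding ... to the list of included hosts from ..." lines quoted later in this thread, and the function name `included_hosts` is made up for illustration. Other Hadoop versions may word the message differently.

```python
import re

# Assumption: the message format matches the HostsFileReader lines quoted in this thread.
INCLUDED = re.compile(
    r"HostsFileReader: Adding (\S+) to the list of included hosts from (\S+)"
)


def included_hosts(log_lines):
    """Return (host, source_file) pairs found in ResourceManager log lines."""
    pairs = []
    for line in log_lines:
        match = INCLUDED.search(line)
        if match:
            pairs.append((match.group(1), match.group(2)))
    return pairs
```

Feed it the lines of the ResourceManager log (for example `included_hosts(open(path))`, where `path` is your own log file) and compare the resulting names, character for character, with the name in the "Disallowed NodeManager from ..." message.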

 

Wilfred

Re: After adding kerberos changes to the cluster I'm unable to start NodeManager on DataNodes.

Contributor

Thank you for looking into this.

I've checked the following files and they are ok:

dfs.hosts (all 11 datanodes are here)

dfs.hosts.exclude (empty)

mapred.hosts (all 11 datanodes are here)

mapred.hosts.exclude (empty)

 

grep 'HostsFileReader' yarn-yarn-resourcemanager-*****.log:

2015-06-29 15:51:45,845 INFO org.apache.hadoop.util.HostsFileReader: Setting the includes file to /etc/hadoop/conf/mapred.hosts
2015-06-29 15:51:45,845 INFO org.apache.hadoop.util.HostsFileReader: Setting the excludes file to /etc/hadoop/conf/mapred.hosts.exclude
2015-06-29 15:51:45,845 INFO org.apache.hadoop.util.HostsFileReader: Refreshing hosts (include/exclude) list
2015-06-29 15:51:45,845 INFO org.apache.hadoop.util.HostsFileReader: Adding ***** to the list of included hosts from /etc/hadoop/conf/mapred.hosts
2015-06-29 15:51:45,845 INFO org.apache.hadoop.util.HostsFileReader: Adding ***** to the list of included hosts from /etc/hadoop/conf/mapred.hosts
2015-06-29 15:51:45,845 INFO org.apache.hadoop.util.HostsFileReader: Adding ***** to the list of included hosts from /etc/hadoop/conf/mapred.hosts
2015-06-29 15:51:45,845 INFO org.apache.hadoop.util.HostsFileReader: Adding ***** to the list of included hosts from /etc/hadoop/conf/mapred.hosts
2015-06-29 15:51:45,845 INFO org.apache.hadoop.util.HostsFileReader: Adding ***** to the list of included hosts from /etc/hadoop/conf/mapred.hosts
2015-06-29 15:51:45,845 INFO org.apache.hadoop.util.HostsFileReader: Adding ***** to the list of included hosts from /etc/hadoop/conf/mapred.hosts
2015-06-29 15:51:45,846 INFO org.apache.hadoop.util.HostsFileReader: Adding ***** to the list of included hosts from /etc/hadoop/conf/mapred.hosts
2015-06-29 15:51:45,846 INFO org.apache.hadoop.util.HostsFileReader: Adding ***** to the list of included hosts from /etc/hadoop/conf/mapred.hosts
2015-06-29 15:51:45,846 INFO org.apache.hadoop.util.HostsFileReader: Adding ***** to the list of included hosts from /etc/hadoop/conf/mapred.hosts
2015-06-29 15:51:45,846 INFO org.apache.hadoop.util.HostsFileReader: Adding ***** to the list of included hosts from /etc/hadoop/conf/mapred.hosts
2015-06-29 15:51:45,846 INFO org.apache.hadoop.util.HostsFileReader: Adding ***** to the list of included hosts from /etc/hadoop/conf/mapred.hosts

Re: After adding kerberos changes to the cluster I'm unable to start NodeManager on DataNodes.

Super Collaborator

Does the host resolve correctly to what is there in the file?

I have seen this happen before when the host resolves differently on the node than you expect. Does the allowed file contain just the names, just the IP addresses, or both for each host?
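One quick way to see what a node calls itself is to ask the system resolver directly on that node. The snippet below is a minimal sketch using Python's standard `socket` module; the helper names `local_names` and `reverse_names` are invented for this example. The NodeManager runs in a JVM, which may not resolve names in exactly the same way, so treat the output as a hint rather than proof.

```python
import socket


def local_names():
    """Names this node knows itself by: the plain hostname and the resolver's FQDN."""
    return {"hostname": socket.gethostname(), "fqdn": socket.getfqdn()}


def reverse_names(ip):
    """Names the resolver returns for an IP address (primary name plus aliases)."""
    try:
        primary, aliases, _addresses = socket.gethostbyaddr(ip)
    except OSError:
        # Covers socket.herror and socket.gaierror: no reverse mapping available.
        return []
    return [primary, *aliases]
```

Run `local_names()` on the datanode, and `reverse_names()` on the ResourceManager host with the datanode's IP address. If either returns a different form of the name (short name versus FQDN) than the allowed file contains, that mismatch is a likely suspect.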

 

Wilfred

Re: After adding kerberos changes to the cluster I'm unable to start NodeManager on DataNodes.

Contributor

Yes, it does resolve correctly.

Both mapred.hosts and dfs.hosts only have hostnames in them.

 

Re: After adding kerberos changes to the cluster I'm unable to start NodeManager on DataNodes.

Super Collaborator

In Cloudera Manager managed clusters we normally have both the IP address and the hostname for a node in the configs.

I'm not sure whether that makes a difference, but the rack awareness documentation talks about IP addresses.

 

Wilfred

Re: After adding kerberos changes to the cluster I'm unable to start NodeManager on DataNodes.

Contributor

I got it working by replacing the datanode entries in the mapred.hosts file with their FQDNs (e.g. datanode1.xyz.com).

I was reading the "Cloudera-security 5.4" doc and did not see such a requirement. Can you confirm, or point me to a KB article or document that covers this?

 

Also, if mapred.hosts requires FQDNs, would it make sense to change "dfs.hosts" to use FQDNs as well?

Re: After adding kerberos changes to the cluster I'm unable to start NodeManager on DataNodes.

Super Collaborator

The files should use the same name that the host uses to identify itself (see the book Hadoop Operations by Eric Sammer). The entries can be IP addresses or hostnames, in the form used within the cluster. So if hosts resolve to FQDNs in the cluster, the values in the file must be FQDNs too. It depends on how your cluster is set up.
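To make the "same name" rule concrete, here is a small sketch that compares the name a node reports with the entries of an include file. It assumes the comparison is an exact string match, and that the file holds whitespace-separated entries with "#" starting a comment; both are assumptions for illustration, as are the function names `parse_hosts_file` and `check_node`.

```python
def parse_hosts_file(text):
    """Entries of an include/exclude file (assumed: whitespace-separated, '#' comments)."""
    entries = []
    for line in text.splitlines():
        line = line.split("#", 1)[0]  # drop anything after a comment marker
        entries.extend(line.split())
    return entries


def check_node(reported_name, entries):
    """Classify how a node's self-reported name relates to the include list."""
    if reported_name in entries:
        return "listed"
    short = reported_name.split(".", 1)[0]
    if short in entries:
        # The file has the short name, but the node identifies itself by FQDN.
        return "short-name-only"
    return "missing"
```

With a file that contains only `datanode1`, `check_node("datanode1.xyz.com", entries)` returns `"short-name-only"`, which is the situation described earlier in this thread: the entry exists, but not in the form the node uses for itself.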

 

Wilfred

Re: After adding kerberos changes to the cluster I'm unable to start NodeManager on DataNodes.

Contributor

Understood.

There seems to be another reason why I'm having this issue. In "/etc/nsswitch.conf", hosts are resolved by DNS first:

hosts:      dns files

 

If I switch them around, it works (without changing the /etc/hosts file).
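For anyone who wants to check this across many nodes, the lookup order can be read from the file with a few lines of Python. This is a rough sketch, not an official tool: the function names `hosts_sources` and `files_before_dns` are made up here, and bracketed action items such as `[NOTFOUND=return]` are simply ignored.

```python
import re


def hosts_sources(nsswitch_text):
    """Return the lookup sources on the 'hosts:' line, in the order they are tried."""
    for raw in nsswitch_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # strip comments
        if line.startswith("hosts:"):
            rest = re.sub(r"\[[^\]]*\]", " ", line[len("hosts:"):])
            return rest.split()
    return []


def files_before_dns(sources):
    """True when 'files' (i.e. /etc/hosts) is consulted before 'dns'."""
    if "files" not in sources:
        return False
    if "dns" not in sources:
        return True
    return sources.index("files") < sources.index("dns")
```

For the line shown above, `hosts_sources("hosts:      dns files")` gives `["dns", "files"]` and `files_before_dns` returns `False`; after swapping the two entries it returns `True`.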

Re: After adding kerberos changes to the cluster I'm unable to start NodeManager on DataNodes.

Super Collaborator

Good to hear that you have found the problem.

Hadoop is really picky when it comes to DNS and networking. If you use Cloudera Manager, we perform a number of checks just to make sure that these kinds of things work as expected.

 

Wilfred