Datanode is not connecting to namenode

Hello everyone,


I did a CDH 5.6 install using the Cloudera Manager installer bin on a four-node VM cluster. I was able to bring up the HDFS service with the name node role instance and three data node role instances, but the data nodes are having connectivity issues with the name node. As a result, the name node sees the data nodes as dead and shows 0 bytes available in the cluster.


When I looked at the data node logs, I found the following error:


Problem connecting to server:


Block pool ID needed, but service not yet registered with NN
java.lang.Exception: trace
	at org.apache.hadoop.hdfs.server.datanode.BPOfferService.getBlockPoolId(
	at org.apache.hadoop.hdfs.server.datanode.DataNode.getNamenodeAddresses(
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(
	at java.lang.reflect.Method.invoke(
	at sun.reflect.misc.Trampoline.invoke(
	at sun.reflect.GeneratedMethodAccessor2.invoke(Unknown Source)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(
	at java.lang.reflect.Method.invoke(
	at sun.reflect.misc.MethodUtil.invoke(
	at com.sun.jmx.mbeanserver.ConvertingMethod.invokeWithOpenReturn(
	at com.sun.jmx.mbeanserver.ConvertingMethod.invokeWithOpenReturn(
	at com.sun.jmx.mbeanserver.MXBeanIntrospector.invokeM2(
	at com.sun.jmx.mbeanserver.MXBeanIntrospector.invokeM2(
	at com.sun.jmx.mbeanserver.MBeanIntrospector.invokeM(
	at com.sun.jmx.mbeanserver.PerInterface.getAttribute(
	at com.sun.jmx.mbeanserver.MBeanSupport.getAttribute(
	at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.getAttribute(
	at com.sun.jmx.mbeanserver.JmxMBeanServer.getAttribute(
	at org.apache.hadoop.jmx.JMXJsonServlet.writeAttribute(
	at org.apache.hadoop.jmx.JMXJsonServlet.listBeans(
	at org.apache.hadoop.jmx.JMXJsonServlet.doGet(
	at javax.servlet.http.HttpServlet.service(
	at javax.servlet.http.HttpServlet.service(
	at org.mortbay.jetty.servlet.ServletHolder.handle(
	at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(
	at org.apache.hadoop.http.lib.StaticUserWebFilter$StaticUserFilter.doFilter(
	at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(
	at org.apache.hadoop.http.HttpServer2$QuotingInputFilter.doFilter(
	at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(
	at org.apache.hadoop.http.NoCacheFilter.doFilter(
	at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(
	at org.apache.hadoop.http.NoCacheFilter.doFilter(
	at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(
	at org.mortbay.jetty.servlet.ServletHandler.handle(
	at org.mortbay.jetty.servlet.SessionHandler.handle(
	at org.mortbay.jetty.handler.ContextHandler.handle(
	at org.mortbay.jetty.webapp.WebAppContext.handle(
	at org.mortbay.jetty.handler.ContextHandlerCollection.handle(
	at org.mortbay.jetty.handler.HandlerWrapper.handle(
	at org.mortbay.jetty.Server.handle(
	at org.mortbay.jetty.HttpConnection.handleRequest(
	at org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(
	at org.mortbay.jetty.HttpParser.parseNext(
	at org.mortbay.jetty.HttpParser.parseAvailable(
	at org.mortbay.jetty.HttpConnection.handle(
	at org.mortbay.thread.QueuedThreadPool$


This error is repeated, I guess, every time the data node tries to send a heartbeat.


I checked netstat on the name node host, and it is listening. I am not sure why the data node log shows a problem connecting to this IP and port.


tcp 0 0* LISTEN 989 6824028 -
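A listening socket on the name node host does not by itself show that the data node hosts can reach it, so it may help to test the connection from each data node. Below is a minimal sketch of such a check; the host name in the usage comment is a placeholder, and the port is whatever the data node log reports (8022 is the port that appears in the name node log in this thread).

```python
import socket


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection resolves the name and opens a TCP connection.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False


# Usage, run on a data node host (placeholder host name, not from the thread):
#   can_connect("namenode.example.com", 8022)
```

A result of False from a data node while netstat shows LISTEN on the name node would point toward a firewall, a wrong bind address, or name resolution, rather than HDFS itself.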


Some of the blog posts I read suggested checking whether /etc/hosts maps the host name to the loopback address. Here is the content of my hosts file on the master. I commented out both lines and restarted HDFS through Cloudera Manager, but that didn't fix the issue.


# localhost localhost.localdomain localhost4 localhost4.localdomain4
#::1 localhost localhost.localdomain localhost6 localhost6.localdomain6
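The loopback check those blog posts describe can be sketched as a small script. This is only an illustration: the function names are made up here, and the text passed in would be the content of /etc/hosts on the host being checked.

```python
import ipaddress


def loopback_names(hosts_text):
    """Return {name: address} for every name mapped to a loopback address."""
    found = {}
    for line in hosts_text.splitlines():
        # Drop comments and surrounding whitespace.
        line = line.split("#", 1)[0].strip()
        if not line:
            continue
        fields = line.split()
        try:
            addr = ipaddress.ip_address(fields[0])
        except ValueError:
            continue  # not a parsable IP address, skip the line
        if addr.is_loopback:
            for name in fields[1:]:
                found[name] = fields[0]
    return found


def maps_to_loopback(hosts_text, hostname):
    """True if hostname or its short form is mapped to a loopback address."""
    names = loopback_names(hosts_text)
    short = hostname.split(".", 1)[0]
    return hostname in names or short in names


# Usage on a cluster host:
#   import socket
#   text = open("/etc/hosts").read()
#   maps_to_loopback(text, socket.getfqdn())
```

Only the machine's own host name matters for this check; entries for localhost mapped to the loopback address are normal.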


Any help is much appreciated. I have been stuck on this issue for a long time now. It looks like I am missing something simple, but I couldn't figure it out. Thank you!





I see the following exception in the name node:


PriviledgedActionException as:cloudera-scm (auth:SIMPLE) Access denied for user cloudera-scm. Superuser privilege is required


IPC Server handler 8 on 8022, call org.apache.hadoop.hdfs.server.protocol.DatanodeProtocol.versionRequest from Call#596 Retry#0 Access denied for user cloudera-scm. Superuser privilege is required
	at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkSuperuserPrivilege(
	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkSuperuserPrivilege(
	at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.versionRequest(
	at org.apache.hadoop.hdfs.protocolPB.DatanodeProtocolServerSideTranslatorPB.versionRequest(
	at org.apache.hadoop.hdfs.protocol.proto.DatanodeProtocolProtos$DatanodeProtocolService$2.callBlockingMethod(
	at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$
	at org.apache.hadoop.ipc.RPC$
	at org.apache.hadoop.ipc.Server$Handler$
	at org.apache.hadoop.ipc.Server$Handler$
	at Method)
	at org.apache.hadoop.ipc.Server$

This looks like the root cause. I am not sure why it doesn't have the privilege, though. In fact, cloudera-scm has passwordless sudo on all the hosts (a leftover permission from the last installation in single user mode).
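To see which user the name node is rejecting, and how often, the log can be scanned for this message. The sketch below assumes the message format shown in the exception above; the log path in the usage comment is a placeholder. Note that the check is about the HDFS superuser, so sudo rights on the operating system would not be expected to affect it.

```python
import re
from collections import Counter

# Matches the message text seen in the name node exception above.
DENIED = re.compile(
    r"Access denied for user (\S+?)\. Superuser privilege is required"
)


def denied_users(log_lines):
    """Count 'Access denied' messages per user name across the given lines."""
    counts = Counter()
    for line in log_lines:
        for user in DENIED.findall(line):
            counts[user] += 1
    return counts


# Usage (placeholder path, adjust to the actual name node log file):
#   with open("/path/to/namenode.log") as f:
#       print(denied_users(f))
```

If the user reported here is the account the data node process runs as, comparing it with the account the name node runs as is a reasonable next step.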



I did a complete uninstall, cleaned up, and removed all the folders related to Hadoop and Cloudera. It's working fine now. Earlier, I had done a single user mode install and uninstalled it to do the default installation. It looks like there were lingering files and folders that were causing the issue.

Does anyone have an actual solution to this problem other than re-installing the cluster? I am facing the same error on one of my DataNodes: it becomes unavailable after a restart and throws similar exceptions/errors.


PS: I cannot re-install because it's my production cluster. Looking for help.




I have the same problem now. Can you help, please?


Did you get a solution to this issue? Would be grateful for your response.