Datanode is not connecting to namenode (CDH 5.14.0)

Explorer

Hi there,

The DataNode is not connecting to the NameNode after a restart:

2018-02-22 13:26:21,691 ERROR org.apache.hadoop.jmx.JMXJsonServlet: getting attribute DiskBalancerStatus of Hadoop:service=DataNode,name=DataNodeInfo threw an exception
javax.management.RuntimeMBeanException: java.lang.NullPointerException
        at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.rethrow(DefaultMBeanServerInterceptor.java:839)
        at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.rethrowMaybeMBeanException(DefaultMBeanServerInterceptor.java:852)
        at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.getAttribute(DefaultMBeanServerInterceptor.java:651)
        at com.sun.jmx.mbeanserver.JmxMBeanServer.getAttribute(JmxMBeanServer.java:678)
        at org.apache.hadoop.jmx.JMXJsonServlet.writeAttribute(JMXJsonServlet.java:342)
        at org.apache.hadoop.jmx.JMXJsonServlet.listBeans(JMXJsonServlet.java:320)
        at org.apache.hadoop.jmx.JMXJsonServlet.doGet(JMXJsonServlet.java:210)
        at javax.servlet.http.HttpServlet.service(HttpServlet.java:707)
        at javax.servlet.http.HttpServlet.service(HttpServlet.java:820)
        at org.mortbay.jetty.servlet.ServletHolder.handle(ServletHolder.java:511)
        at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1221)
        at org.apache.hadoop.http.lib.StaticUserWebFilter$StaticUserFilter.doFilter(StaticUserWebFilter.java:109)
        at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
        at org.apache.hadoop.http.HttpServer2$QuotingInputFilter.doFilter(HttpServer2.java:1301)
        at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
        at org.apache.hadoop.http.NoCacheFilter.doFilter(NoCacheFilter.java:45)
        at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
        at org.apache.hadoop.http.NoCacheFilter.doFilter(NoCacheFilter.java:45)
        at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
        at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:399)
        at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
        at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182)
        at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:767)
        at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:450)
        at org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:230)
        at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
        at org.mortbay.jetty.Server.handle(Server.java:326)
        at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:542)
        at org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:928)
        at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:549)
        at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:212)
        at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:404)
        at org.mortbay.io.nio.SelectChannelEndPoint.run(SelectChannelEndPoint.java:410)
        at org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:582)
Caused by: java.lang.NullPointerException
        at org.apache.hadoop.hdfs.server.datanode.DataNode.getDiskBalancerStatus(DataNode.java:2917)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at sun.reflect.misc.Trampoline.invoke(MethodUtil.java:71)
        at sun.reflect.GeneratedMethodAccessor2.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at sun.reflect.misc.MethodUtil.invoke(MethodUtil.java:275)
        at com.sun.jmx.mbeanserver.ConvertingMethod.invokeWithOpenReturn(ConvertingMethod.java:193)
        at com.sun.jmx.mbeanserver.ConvertingMethod.invokeWithOpenReturn(ConvertingMethod.java:175)
        at com.sun.jmx.mbeanserver.MXBeanIntrospector.invokeM2(MXBeanIntrospector.java:117)
        at com.sun.jmx.mbeanserver.MXBeanIntrospector.invokeM2(MXBeanIntrospector.java:54)
        at com.sun.jmx.mbeanserver.MBeanIntrospector.invokeM(MBeanIntrospector.java:237)
        at com.sun.jmx.mbeanserver.PerInterface.getAttribute(PerInterface.java:83)
        at com.sun.jmx.mbeanserver.MBeanSupport.getAttribute(MBeanSupport.java:206)
        at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.getAttribute(DefaultMBeanServerInterceptor.java:647)

Background: after a successful manual CDH upgrade (5.9.3 -> 5.14.0) and one week of normal cluster operation, my DataNodes started losing their connection to the NameNode one by one. I do not know why; my guess is that the HDDs were full. After a while, the connections were restored.

But I stopped one DataNode at that moment and now cannot start it again: the DataNode never connects to the NameNode (see the errors above).

Experts, have you ever encountered this problem? Please share your experience. Thank you very much.

1 ACCEPTED SOLUTION

Explorer

Hi!

I was able to find a solution myself, in hdfs-site.xml:

    <property>
        <name>dfs.disk.balancer.enabled</name>
-        <value>true</value>
+        <value>false</value>
    </property>

This property enables the disk balancer feature on a cluster; according to the documentation, the disk balancer is disabled by default. Setting it to false turns it off.
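
For reference, here is a minimal sketch that checks which value of the property a process actually sees (assuming the Hadoop client jars are on the classpath; /etc/hadoop/conf is the usual config location on an RPM-based install and is an assumption here):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;

public class CheckDiskBalancerSetting {
    public static void main(String[] args) {
        Configuration conf = new Configuration();
        // Load the cluster's hdfs-site.xml explicitly; adjust the path if
        // your configuration lives elsewhere (assumption: RPM-based layout).
        conf.addResource(new Path("/etc/hadoop/conf/hdfs-site.xml"));
        // false matches the upstream default for this property.
        boolean enabled = conf.getBoolean("dfs.disk.balancer.enabled", false);
        System.out.println("dfs.disk.balancer.enabled = " + enabled);
    }
}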

 

I still receive the error message:

2018-03-07 15:05:00,177 ERROR org.apache.hadoop.jmx.JMXJsonServlet: getting attribute DiskBalancerStatus of Hadoop:service=DataNode,name=DataNodeInfo threw an exception
javax.management.RuntimeMBeanException: java.lang.NullPointerException
        at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.rethrow(DefaultMBeanServerInterceptor.java:839)
        at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.rethrowMaybeMBeanException(DefaultMBeanServerInterceptor.java:852)
        at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.getAttribute(DefaultMBeanServerInterceptor.java:651)
...
Caused by: java.lang.NullPointerException
        at org.apache.hadoop.hdfs.server.datanode.DataNode.getDiskBalancerStatus(DataNode.java:2917)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)

But now the DataNode connects to the NameNode and works fine.


7 REPLIES

Contributor

Hi Koc,

This is quite interesting: based on the code, the exception appears to be the result of a race condition. The getDiskBalancerStatus call looks like this in the code:

@Override // DataNodeMXBean
public String getDiskBalancerStatus() {
  try {
    return this.diskBalancer.queryWorkStatus().toJsonString();
  } catch (IOException ex) {
    LOG.debug("Reading diskbalancer Status failed. ex:{}", ex);
    return "";
  }
}

So the NullPointerException can happen either when diskBalancer is null or when queryWorkStatus() returns null.
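
A defensive guard along these lines would avoid the NPE on the JMX path (a hypothetical sketch for illustration only, not the actual Hadoop code):

@Override // DataNodeMXBean
public String getDiskBalancerStatus() {
  // Hypothetical guard: during startup the diskBalancer field may not be
  // initialized yet, so report an empty status instead of dereferencing null.
  if (this.diskBalancer == null) {
    return "";
  }
  try {
    return this.diskBalancer.queryWorkStatus().toJsonString();
  } catch (IOException ex) {
    LOG.debug("Reading diskbalancer Status failed. ex:{}", ex);
    return "";
  }
}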

 

queryWorkStatus() throws an IOException when the disk balancer is not enabled, and that is why disabling the disk balancer works around the issue. Otherwise, queryWorkStatus() appears to always return a valid reference.

 

This is why I suspect a race condition that leaves the diskBalancer reference null in the DataNode object when the getDiskBalancerStatus method is called.

Since getDiskBalancerStatus is exposed through the JMX interface, it is called whenever the DataNode's JMX interface is queried, and that should not prevent the DataNode startup.
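
(You can reproduce the failing attribute read from outside the DataNode process by querying the JMX servlet directly; a sketch, assuming the default DataNode web UI port 50075 and plain HTTP:)

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;

public class QueryDataNodeJmx {
    public static void main(String[] args) throws Exception {
        String host = args.length > 0 ? args[0] : "localhost";
        // 50075 is the default dfs.datanode.http.address port; adjust if
        // your cluster overrides it or uses HTTPS (both are assumptions).
        URL url = new URL("http://" + host + ":50075/jmx"
                + "?qry=Hadoop:service=DataNode,name=DataNodeInfo");
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(conn.getInputStream()))) {
            String line;
            // While the NPE is being hit, DiskBalancerStatus is the attribute
            // whose read fails server-side.
            while ((line = in.readLine()) != null) {
                System.out.println(line);
            }
        }
    }
}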

 

So this seems to be something that should not fail the DataNode startup. Do you still have the startup logs from when the DataNode failed to start? Is there anything else reported as an error or fatal?

If you have the DataNode standard error output for a failed start (on a CDH cluster it is in the /var/run/cloudera-scm-agent/process/xxx-DATANODE/logs folders), it might contain some other traces of the problem. Would you please check it? It would be nice to track this down and, if it is a bug, fix it.

Thanks!

Istvan

Explorer

Hi @pifta,

I should clarify that the DataNode process itself starts successfully (as an OS process). The main issue is that the DataNode does not connect to the NameNode.

I use the RPM-based CDH distribution. The DataNode standard error output (/var/log/hadoop-hdfs/*.out) is clean.
Here is the DataNode log:

1. A working DataNode started losing its connection to the NameNode (before I did anything):

2018-02-22 13:24:52,774 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: opReplaceBlock BP-1891900421-xx.xxx.xx.xx-1410884922197:blk_1702076774_1100453422491 received exception java.io.IOException: Connection reset by peer
2018-02-22 13:24:52,774 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: Error writing reply back to /<my_namenode_ip>:38536
2018-02-22 13:24:52,774 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: <my_datanode_ip>:50010:DataXceiver error processing REPLACE_BLOCK operation  src: /<my_namenode_ip>:38536 dst: /<my_datanode_ip>:50010
java.io.IOException: Connection reset by peer
        at sun.nio.ch.FileDispatcherImpl.read0(Native Method)
        at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39)
        at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223)
        at sun.nio.ch.IOUtil.read(IOUtil.java:197)
        at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:380)
        at org.apache.hadoop.net.SocketInputStream$Reader.performIO(SocketInputStream.java:57)
        at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:142)
        at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:161)
        at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:131)
        at java.io.FilterInputStream.read(FilterInputStream.java:133)
        at java.io.BufferedInputStream.read1(BufferedInputStream.java:284)
        at java.io.BufferedInputStream.read(BufferedInputStream.java:345)
        at java.io.DataInputStream.read(DataInputStream.java:149)
        at org.apache.hadoop.io.IOUtils.readFully(IOUtils.java:201)
        at org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.doReadFully(PacketReceiver.java:213)
        at org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.doRead(PacketReceiver.java:171)
        at org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.receiveNextPacket(PacketReceiver.java:109)
        at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receivePacket(BlockReceiver.java:501)
        at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receiveBlock(BlockReceiver.java:901)
        at org.apache.hadoop.hdfs.server.datanode.DataXceiver.replaceBlock(DataXceiver.java:1149)
        at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opReplaceBlock(Receiver.java:261)
        at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:109)
        at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:246)
        at java.lang.Thread.run(Thread.java:748)

(appears many times)

2. After the DataNode is restarted:

2018-02-22 13:25:03,944 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: STARTUP_MSG:
< as usual, skip >

2018-02-22 13:25:30,454 ERROR org.apache.hadoop.jmx.JMXJsonServlet: getting attribute VolumeInfo of Hadoop:service=DataNode,name=DataNodeInfo threw an exception javax.management.RuntimeMBeanException: java.lang.NullPointerException: Storage not yet initialized
        at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.rethrow(DefaultMBeanServerInterceptor.java:839)
        at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.rethrowMaybeMBeanException(DefaultMBeanServerInterceptor.java:852)
        at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.getAttribute(DefaultMBeanServerInterceptor.java:651)
        at com.sun.jmx.mbeanserver.JmxMBeanServer.getAttribute(JmxMBeanServer.java:678)
        at org.apache.hadoop.jmx.JMXJsonServlet.writeAttribute(JMXJsonServlet.java:342)
...

Caused by: java.lang.NullPointerException: Storage not yet initialized
        at com.google.common.base.Preconditions.checkNotNull(Preconditions.java:204)
        at org.apache.hadoop.hdfs.server.datanode.DataNode.getVolumeInfo(DataNode.java:2905)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
...

2018-02-22 13:26:00,766 ERROR org.apache.hadoop.jmx.JMXJsonServlet: getting attribute VolumeInfo of Hadoop:service=DataNode,name=DataNodeInfo threw an exception javax.management.RuntimeMBeanException: java.lang.NullPointerException: Storage not yet initialized
...

2018-02-22 13:26:03,159 INFO org.apache.hadoop.hdfs.server.common.Storage: Analyzing storage directories for bpid BP-1891900421-xx.xxx.xx.xx-1410884922197

[skip]

2018-02-22 13:26:21,691 ERROR org.apache.hadoop.jmx.JMXJsonServlet: getting attribute DiskBalancerStatus of Hadoop:service=DataNode,name=DataNodeInfo threw an exception javax.management.RuntimeMBeanException: java.lang.NullPointerException
2018-02-22 13:26:21,691 ERROR org.apache.hadoop.jmx.JMXJsonServlet: getting attribute DiskBalancerStatus of Hadoop:service=DataNode,name=DataNodeInfo threw an exception javax.management.RuntimeMBeanException: java.lang.NullPointerException

[skip]

2018-02-22 13:26:21,871 INFO org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Registered FSDatasetState MBean
2018-02-22 13:26:21,872 INFO org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Volume reference is released.
2018-02-22 13:26:21,872 INFO org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Adding block pool BP-1891900421-xx.xxx.xx.xx-1410884922197

[skip]

2018-02-22 13:26:28,178 INFO org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Adding replicas to map for block pool BP-1891900421-xx.xxx.xx.xx-1410884922197 on volume /data/disk12/dfs/dn/current...
2018-02-22 13:26:30,770 ERROR org.apache.hadoop.jmx.JMXJsonServlet: getting attribute DiskBalancerStatus of Hadoop:service=DataNode,name=DataNodeInfo threw an exception javax.management.RuntimeMBeanException: java.lang.NullPointerException
2018-02-22 13:27:00,198 ERROR org.apache.hadoop.jmx.JMXJsonServlet: getting attribute DiskBalancerStatus of Hadoop:service=DataNode,name=DataNodeInfo threw an exception javax.management.RuntimeMBeanException: java.lang.NullPointerException
2018-02-22 13:27:30,098 ERROR org.apache.hadoop.jmx.JMXJsonServlet: getting attribute DiskBalancerStatus of Hadoop:service=DataNode,name=DataNodeInfo threw an exception javax.management.RuntimeMBeanException: java.lang.NullPointerException

< repeated every 30 sec >

2018-02-22 14:03:30,114 ERROR org.apache.hadoop.jmx.JMXJsonServlet: getting attribute DiskBalancerStatus of Hadoop:service=DataNode,name=DataNodeInfo threw an exception javax.management.RuntimeMBeanException: java.lang.NullPointerException
2018-02-22 14:04:00,093 ERROR org.apache.hadoop.jmx.JMXJsonServlet: getting attribute DiskBalancerStatus of Hadoop:service=DataNode,name=DataNodeInfo threw an exception javax.management.RuntimeMBeanException: java.lang.NullPointerException
2018-02-22 14:04:30,102 ERROR org.apache.hadoop.jmx.JMXJsonServlet: getting attribute DiskBalancerStatus of Hadoop:service=DataNode,name=DataNodeInfo threw an exception javax.management.RuntimeMBeanException: java.lang.NullPointerException

< until I restart the Datanode again >

2018-02-22 14:04:39,339 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: RECEIVED SIGNAL 15: SIGTERM

There are no messages at all in the log between the DiskBalancerStatus ERRORs: no INFO, WARN, or CRITICAL entries.

3. After I turned off the disk balancer and restarted the DataNode:

2018-02-22 15:27:04,499 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: STARTUP_MSG:
< as usual, skip >

2018-02-22 15:27:07,349 INFO org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Adding replicas to map for block pool BP-1891900421-xx.xxx.xx.xx-1410884922197 on volume /data/disk12/dfs/dn/current...
2018-02-22 15:27:30,326 ERROR org.apache.hadoop.jmx.JMXJsonServlet: getting attribute DiskBalancerStatus of Hadoop:service=DataNode,name=DataNodeInfo threw an exception javax.management.RuntimeMBeanException: java.lang.NullPointerException
2018-02-22 15:28:00,164 ERROR org.apache.hadoop.jmx.JMXJsonServlet: getting attribute DiskBalancerStatus of Hadoop:service=DataNode,name=DataNodeInfo threw an exception javax.management.RuntimeMBeanException: java.lang.NullPointerException

< and now the startup process continues >

2018-02-22 15:28:22,019 INFO org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Time to add replicas to map for block pool BP-1891900421-xx.xxx.xx.xx-1410884922197 on volume /data/disk6/dfs/dn/current: 74670ms

[skip]

2018-02-22 15:28:22,910 INFO org.apache.hadoop.hdfs.server.datanode.VolumeScanner: Now rescanning bpid BP-1891900421-xx.xxx.xx.xx-1410884922197 on volume /data/disk5/dfs/dn, after more than 504 hour(s)
...
2018-02-22 15:28:23,056 INFO org.apache.hadoop.hdfs.server.datanode.DirectoryScanner: Periodic Directory Tree Verification scan starting at 1519318745056ms with interval of 21600000ms
...
2018-02-22 15:28:23,063 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Block pool BP-1891900421-xx.xxx.xx.xx-1410884922197 (Datanode Uuid 3cf7f365-ab70-4ece-8bcb-5a410ef4a6fd) service to <my_namenode_hostname>/<my_namenode_ip>:8020 beginning handshake with NN

< the DataNode returns to normal operation >

Contributor

Were you able to resolve this issue? What did you do to fix the problem?

Explorer

As I've said before:

    <property>
        <name>dfs.disk.balancer.enabled</name>
-        <value>true</value>
+        <value>false</value>
    </property>

It does not really fix the problem, but it is a kind of workaround.

Contributor

I'm not sure whether it applies to your problem, but maybe this blog post will help you: http://gbif.blogspot.com/2015/05/dont-fill-your-hdfs-disks-upgrading-to.html
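
(If nearly-full disks are the underlying trigger, overall HDFS usage can also be checked programmatically; a sketch, assuming the cluster configuration is on the classpath:)

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.FsStatus;

public class HdfsUsage {
    public static void main(String[] args) throws Exception {
        // Picks up core-site.xml / hdfs-site.xml from the classpath.
        FileSystem fs = FileSystem.get(new Configuration());
        FsStatus status = fs.getStatus();
        long used = status.getUsed();
        long capacity = status.getCapacity();
        System.out.printf("HDFS used: %d of %d bytes (%.1f%%)%n",
                used, capacity, 100.0 * used / capacity);
    }
}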

 

New Contributor

You wrote: "This property enables the disk balancer feature on a cluster; according to the documentation, the disk balancer is disabled by default."

If it is disabled by default, then why does your config set it explicitly to

+        <value>false</value>