Created 02-26-2018 05:32 AM
Hi there,
My DataNode is not connecting to the NameNode after a restart.
2018-02-22 13:26:21,691 ERROR org.apache.hadoop.jmx.JMXJsonServlet: getting attribute DiskBalancerStatus of Hadoop:service=DataNode,name=DataNodeInfo threw an exception
javax.management.RuntimeMBeanException: java.lang.NullPointerException
	at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.rethrow(DefaultMBeanServerInterceptor.java:839)
	at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.rethrowMaybeMBeanException(DefaultMBeanServerInterceptor.java:852)
	at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.getAttribute(DefaultMBeanServerInterceptor.java:651)
	at com.sun.jmx.mbeanserver.JmxMBeanServer.getAttribute(JmxMBeanServer.java:678)
	at org.apache.hadoop.jmx.JMXJsonServlet.writeAttribute(JMXJsonServlet.java:342)
	at org.apache.hadoop.jmx.JMXJsonServlet.listBeans(JMXJsonServlet.java:320)
	at org.apache.hadoop.jmx.JMXJsonServlet.doGet(JMXJsonServlet.java:210)
	at javax.servlet.http.HttpServlet.service(HttpServlet.java:707)
	at javax.servlet.http.HttpServlet.service(HttpServlet.java:820)
	at org.mortbay.jetty.servlet.ServletHolder.handle(ServletHolder.java:511)
	at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1221)
	at org.apache.hadoop.http.lib.StaticUserWebFilter$StaticUserFilter.doFilter(StaticUserWebFilter.java:109)
	at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
	at org.apache.hadoop.http.HttpServer2$QuotingInputFilter.doFilter(HttpServer2.java:1301)
	at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
	at org.apache.hadoop.http.NoCacheFilter.doFilter(NoCacheFilter.java:45)
	at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
	at org.apache.hadoop.http.NoCacheFilter.doFilter(NoCacheFilter.java:45)
	at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
	at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:399)
	at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
	at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182)
	at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:767)
	at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:450)
	at org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:230)
	at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
	at org.mortbay.jetty.Server.handle(Server.java:326)
	at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:542)
	at org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:928)
	at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:549)
	at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:212)
	at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:404)
	at org.mortbay.io.nio.SelectChannelEndPoint.run(SelectChannelEndPoint.java:410)
	at org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:582)
Caused by: java.lang.NullPointerException
	at org.apache.hadoop.hdfs.server.datanode.DataNode.getDiskBalancerStatus(DataNode.java:2917)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at sun.reflect.misc.Trampoline.invoke(MethodUtil.java:71)
	at sun.reflect.GeneratedMethodAccessor2.invoke(Unknown Source)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at sun.reflect.misc.MethodUtil.invoke(MethodUtil.java:275)
	at com.sun.jmx.mbeanserver.ConvertingMethod.invokeWithOpenReturn(ConvertingMethod.java:193)
	at com.sun.jmx.mbeanserver.ConvertingMethod.invokeWithOpenReturn(ConvertingMethod.java:175)
	at com.sun.jmx.mbeanserver.MXBeanIntrospector.invokeM2(MXBeanIntrospector.java:117)
	at com.sun.jmx.mbeanserver.MXBeanIntrospector.invokeM2(MXBeanIntrospector.java:54)
	at com.sun.jmx.mbeanserver.MBeanIntrospector.invokeM(MBeanIntrospector.java:237)
	at com.sun.jmx.mbeanserver.PerInterface.getAttribute(PerInterface.java:83)
	at com.sun.jmx.mbeanserver.MBeanSupport.getAttribute(MBeanSupport.java:206)
	at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.getAttribute(DefaultMBeanServerInterceptor.java:647)
Background: after a successful manual CDH upgrade (5.9.3 -> 5.14.0) and one week of normal cluster operation, my DataNodes started losing their connection to the NameNode one by one. I do not know why; my guess is that the HDDs were full. After a while, the connection was restored.
But I stopped one DataNode at that moment and now cannot start it anymore: the DataNode never connects to the NameNode (see the errors above).
Experts, have you ever encountered this problem? Please share your experience. Thank you very much.
Created 03-07-2018 05:08 AM
Hi!
I was able to find a solution myself, in hdfs-site.xml:
 <property>
   <name>dfs.disk.balancer.enabled</name>
-  <value>true</value>
+  <value>false</value>
 </property>
(The description of this property reads: "This enables the diskbalancer feature on a cluster. By default, disk balancer is disabled.")
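If you want to verify which value a node actually picks up, you can read the key back through the Hadoop Configuration API. A minimal sketch, assuming your hdfs-site.xml is on the JVM classpath; the fallback default of false used below is an assumption for this CDH version:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hdfs.HdfsConfiguration;

public class CheckDiskBalancerFlag {
  public static void main(String[] args) {
    // HdfsConfiguration loads core-site.xml and hdfs-site.xml from the classpath.
    Configuration conf = new HdfsConfiguration();
    // Key is dfs.disk.balancer.enabled; the default (false) here is an
    // assumption and may differ in other Hadoop versions.
    boolean enabled = conf.getBoolean("dfs.disk.balancer.enabled", false);
    System.out.println("dfs.disk.balancer.enabled = " + enabled);
  }
}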
I still receive the error message:
2018-03-07 15:05:00,177 ERROR org.apache.hadoop.jmx.JMXJsonServlet: getting attribute DiskBalancerStatus of Hadoop:service=DataNode,name=DataNodeInfo threw an exception
javax.management.RuntimeMBeanException: java.lang.NullPointerException
	at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.rethrow(DefaultMBeanServerInterceptor.java:839)
	at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.rethrowMaybeMBeanException(DefaultMBeanServerInterceptor.java:852)
	at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.getAttribute(DefaultMBeanServerInterceptor.java:651)
	...
Caused by: java.lang.NullPointerException
	at org.apache.hadoop.hdfs.server.datanode.DataNode.getDiskBalancerStatus(DataNode.java:2917)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
But now the DataNode connects to the NameNode and works fine.
Created 03-07-2018 07:55 AM
Hi Koc,
This is quite interesting: based on the code, the exception seems to be the result of a race condition. The getDiskBalancerStatus method is implemented as follows:
@Override // DataNodeMXBean
public String getDiskBalancerStatus() {
  try {
    return this.diskBalancer.queryWorkStatus().toJsonString();
  } catch (IOException ex) {
    LOG.debug("Reading diskbalancer Status failed. ex:{}", ex);
    return "";
  }
}
So that NullPointerException can happen either when diskBalancer is null or when queryWorkStatus() returns null.
queryWorkStatus() throws an IOException when the disk balancer is not enabled, and that is why disabling the disk balancer fixes the issue. Otherwise, queryWorkStatus() always seems to return a non-null reference.
This is why I suspect a race condition that leaves the diskBalancer reference null in the DataNode object when getDiskBalancerStatus is called.
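To illustrate the suspicion, a defensively null-guarded variant of the method might look like the following. This is only a sketch, not the actual upstream code; it assumes, as in the Hadoop source, that this fragment lives inside the DataNode class and that queryWorkStatus() returns a DiskBalancerWorkStatus:

@Override // DataNodeMXBean
public String getDiskBalancerStatus() {
  // Sketch only: guard against the suspected startup race where
  // diskBalancer has not been assigned yet when JMX polls the bean.
  DiskBalancer balancer = this.diskBalancer;
  if (balancer == null) {
    return "";
  }
  try {
    DiskBalancerWorkStatus status = balancer.queryWorkStatus();
    return status == null ? "" : status.toJsonString();
  } catch (IOException ex) {
    LOG.debug("Reading diskbalancer Status failed. ex:{}", ex);
    return "";
  }
}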
Since getDiskBalancerStatus is exposed through the DataNode's JMX interface, it is invoked whenever that interface is queried, and it should not prevent the DataNode from starting up.
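For reference, the servlet that triggers this call can be queried directly over HTTP. Below is a minimal standalone probe; the host name is hypothetical, and the default CDH5 DataNode web port 50075 is an assumption for your environment:

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;

// Minimal probe for the DataNode's JMX-over-HTTP servlet.
public class DataNodeJmxProbe {
  public static void main(String[] args) throws Exception {
    // datanode.example.com is a placeholder; 50075 is the default
    // DataNode HTTP port in CDH5.
    URL url = new URL("http://datanode.example.com:50075/jmx"
        + "?qry=Hadoop:service=DataNode,name=DataNodeInfo");
    HttpURLConnection conn = (HttpURLConnection) url.openConnection();
    try (BufferedReader in = new BufferedReader(
        new InputStreamReader(conn.getInputStream()))) {
      String line;
      while ((line = in.readLine()) != null) {
        System.out.println(line); // JSON view of the DataNodeInfo MBean
      }
    }
  }
}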
Either way, this error alone should not fail the DataNode startup. Do you still have the startup logs from when the DataNode failed to start? Is there anything else reported as an ERROR or FATAL?
If you have the DataNode standard error output for a failed start (on a CDH cluster it is in the /var/run/cloudera-scm-agent/process/xxx-DATANODE/logs folders), it might contain other traces of the problem. Would you please check it? It would be nice to track this down and, if it is a bug, fix it.
Thanks!
Istvan
Created 03-12-2018 07:18 AM
Hi @pifta,
I should clarify that the DataNode starts successfully (as an OS process). The main issue is that the DataNode does not connect to the NameNode.
I use the RPM-based CDH distribution. The DataNode standard error output (/var/log/hadoop-hdfs/*.out) is clean. Here is the DataNode log:
1. A working DataNode started losing its connection to the NameNode (before I did anything):
2018-02-22 13:24:52,774 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: opReplaceBlock BP-1891900421-xx.xxx.xx.xx-1410884922197:blk_1702076774_1100453422491 received exception java.io.IOException: Connection reset by peer
2018-02-22 13:24:52,774 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: Error writing reply back to /<my_namenode_ip>:38536
2018-02-22 13:24:52,774 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: <my_datanode_ip>:50010:DataXceiver error processing REPLACE_BLOCK operation src: /<my_namenode_ip>:38536 dst: /<my_datanode_ip>:50010
java.io.IOException: Connection reset by peer
	at sun.nio.ch.FileDispatcherImpl.read0(Native Method)
	at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39)
	at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223)
	at sun.nio.ch.IOUtil.read(IOUtil.java:197)
	at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:380)
	at org.apache.hadoop.net.SocketInputStream$Reader.performIO(SocketInputStream.java:57)
	at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:142)
	at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:161)
	at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:131)
	at java.io.FilterInputStream.read(FilterInputStream.java:133)
	at java.io.BufferedInputStream.read1(BufferedInputStream.java:284)
	at java.io.BufferedInputStream.read(BufferedInputStream.java:345)
	at java.io.DataInputStream.read(DataInputStream.java:149)
	at org.apache.hadoop.io.IOUtils.readFully(IOUtils.java:201)
	at org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.doReadFully(PacketReceiver.java:213)
	at org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.doRead(PacketReceiver.java:171)
	at org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.receiveNextPacket(PacketReceiver.java:109)
	at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receivePacket(BlockReceiver.java:501)
	at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receiveBlock(BlockReceiver.java:901)
	at org.apache.hadoop.hdfs.server.datanode.DataXceiver.replaceBlock(DataXceiver.java:1149)
	at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opReplaceBlock(Receiver.java:261)
	at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:109)
	at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:246)
	at java.lang.Thread.run(Thread.java:748)
(appears many times)
2. After the DataNode was restarted:
2018-02-22 13:25:03,944 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: STARTUP_MSG:
< as usual, skip >
2018-02-22 13:25:30,454 ERROR org.apache.hadoop.jmx.JMXJsonServlet: getting attribute VolumeInfo of Hadoop:service=DataNode,name=DataNodeInfo threw an exception
javax.management.RuntimeMBeanException: java.lang.NullPointerException: Storage not yet initialized
	at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.rethrow(DefaultMBeanServerInterceptor.java:839)
	at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.rethrowMaybeMBeanException(DefaultMBeanServerInterceptor.java:852)
	at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.getAttribute(DefaultMBeanServerInterceptor.java:651)
	at com.sun.jmx.mbeanserver.JmxMBeanServer.getAttribute(JmxMBeanServer.java:678)
	at org.apache.hadoop.jmx.JMXJsonServlet.writeAttribute(JMXJsonServlet.java:342)
	...
Caused by: java.lang.NullPointerException: Storage not yet initialized
	at com.google.common.base.Preconditions.checkNotNull(Preconditions.java:204)
	at org.apache.hadoop.hdfs.server.datanode.DataNode.getVolumeInfo(DataNode.java:2905)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	...
2018-02-22 13:26:00,766 ERROR org.apache.hadoop.jmx.JMXJsonServlet: getting attribute VolumeInfo of Hadoop:service=DataNode,name=DataNodeInfo threw an exception javax.management.RuntimeMBeanException: java.lang.NullPointerException: Storage not yet initialized
...
2018-02-22 13:26:03,159 INFO org.apache.hadoop.hdfs.server.common.Storage: Analyzing storage directories for bpid BP-1891900421-xx.xxx.xx.xx-1410884922197
[skip]
2018-02-22 13:26:21,691 ERROR org.apache.hadoop.jmx.JMXJsonServlet: getting attribute DiskBalancerStatus of Hadoop:service=DataNode,name=DataNodeInfo threw an exception javax.management.RuntimeMBeanException: java.lang.NullPointerException
2018-02-22 13:26:21,691 ERROR org.apache.hadoop.jmx.JMXJsonServlet: getting attribute DiskBalancerStatus of Hadoop:service=DataNode,name=DataNodeInfo threw an exception javax.management.RuntimeMBeanException: java.lang.NullPointerException
[skip]
2018-02-22 13:26:21,871 INFO org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Registered FSDatasetState MBean
2018-02-22 13:26:21,872 INFO org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Volume reference is released.
2018-02-22 13:26:21,872 INFO org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Adding block pool BP-1891900421-xx.xxx.xx.xx-1410884922197
[skip]
2018-02-22 13:26:28,178 INFO org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Adding replicas to map for block pool BP-1891900421-xx.xxx.xx.xx-1410884922197 on volume /data/disk12/dfs/dn/current...
2018-02-22 13:26:30,770 ERROR org.apache.hadoop.jmx.JMXJsonServlet: getting attribute DiskBalancerStatus of Hadoop:service=DataNode,name=DataNodeInfo threw an exception javax.management.RuntimeMBeanException: java.lang.NullPointerException
2018-02-22 13:27:00,198 ERROR org.apache.hadoop.jmx.JMXJsonServlet: getting attribute DiskBalancerStatus of Hadoop:service=DataNode,name=DataNodeInfo threw an exception javax.management.RuntimeMBeanException: java.lang.NullPointerException
2018-02-22 13:27:30,098 ERROR org.apache.hadoop.jmx.JMXJsonServlet: getting attribute DiskBalancerStatus of Hadoop:service=DataNode,name=DataNodeInfo threw an exception javax.management.RuntimeMBeanException: java.lang.NullPointerException
< repeated every 30 sec >
2018-02-22 14:03:30,114 ERROR org.apache.hadoop.jmx.JMXJsonServlet: getting attribute DiskBalancerStatus of Hadoop:service=DataNode,name=DataNodeInfo threw an exception javax.management.RuntimeMBeanException: java.lang.NullPointerException
2018-02-22 14:04:00,093 ERROR org.apache.hadoop.jmx.JMXJsonServlet: getting attribute DiskBalancerStatus of Hadoop:service=DataNode,name=DataNodeInfo threw an exception javax.management.RuntimeMBeanException: java.lang.NullPointerException
2018-02-22 14:04:30,102 ERROR org.apache.hadoop.jmx.JMXJsonServlet: getting attribute DiskBalancerStatus of Hadoop:service=DataNode,name=DataNodeInfo threw an exception javax.management.RuntimeMBeanException: java.lang.NullPointerException
< until I restart the Datanode again >
2018-02-22 14:04:39,339 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: RECEIVED SIGNAL 15: SIGTERM
There are no other messages at all in the log between the DiskBalancerStatus ERRORs: no INFO, WARN, or FATAL entries.
3. After I turned off the disk balancer and restarted the DataNode:
2018-02-22 15:27:04,499 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: STARTUP_MSG:
< as usual, skip >
2018-02-22 15:27:07,349 INFO org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Adding replicas to map for block pool BP-1891900421-xx.xxx.xx.xx-1410884922197 on volume /data/disk12/dfs/dn/current...
2018-02-22 15:27:30,326 ERROR org.apache.hadoop.jmx.JMXJsonServlet: getting attribute DiskBalancerStatus of Hadoop:service=DataNode,name=DataNodeInfo threw an exception javax.management.RuntimeMBeanException: java.lang.NullPointerException
2018-02-22 15:28:00,164 ERROR org.apache.hadoop.jmx.JMXJsonServlet: getting attribute DiskBalancerStatus of Hadoop:service=DataNode,name=DataNodeInfo threw an exception javax.management.RuntimeMBeanException: java.lang.NullPointerException
< and now the startup process continues >
2018-02-22 15:28:22,019 INFO org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Time to add replicas to map for block pool BP-1891900421-xx.xxx.xx.xx-1410884922197 on volume /data/disk6/dfs/dn/current: 74670ms
[skip]
2018-02-22 15:28:22,910 INFO org.apache.hadoop.hdfs.server.datanode.VolumeScanner: Now rescanning bpid BP-1891900421-xx.xxx.xx.xx-1410884922197 on volume /data/disk5/dfs/dn, after more than 504 hour(s)
...
2018-02-22 15:28:23,056 INFO org.apache.hadoop.hdfs.server.datanode.DirectoryScanner: Periodic Directory Tree Verification scan starting at 1519318745056ms with interval of 21600000ms
...
2018-02-22 15:28:23,063 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Block pool BP-1891900421-xx.xxx.xx.xx-1410884922197 (Datanode Uuid 3cf7f365-ab70-4ece-8bcb-5a410ef4a6fd) service to <my_namenode_hostname>/<my_namenode_ip>:8020 beginning handshake with NN
< the DataNode operates normally >
Created 10-17-2018 07:26 AM
Were you able to resolve this issue? What did you do to fix the problem?
Created 10-17-2018 02:13 PM
As I said before:
 <property>
   <name>dfs.disk.balancer.enabled</name>
-  <value>true</value>
+  <value>false</value>
 </property>
It does not really fix the problem, but it is a kind of workaround.
Created 03-12-2018 02:29 AM
I'm not sure if it applies to your problem but maybe this blog will help you: http://gbif.blogspot.com/2015/05/dont-fill-your-hdfs-disks-upgrading-to.html
Created on 03-17-2019 09:03 PM - edited 03-17-2019 09:07 PM
"This enables the diskbalancer feature on a cluster. By default, disk balancer is disabled."
Then why does your config set
+ <value>false</value>