Created on 10-10-2019 02:01 AM - edited 09-16-2022 07:33 AM
Hi Cloudera,
I have a similar issue with this https://community.cloudera.com/t5/Support-Questions/Datanode-is-not-connecting-to-namenode-CDH-5-14-..., but in my case the solution by disabling the disk balancer did not helped.
STARTUP_MSG: build = <a href="<a href="http://github.com/cloudera/hadoop" target="_blank">http://github.com/cloudera/hadoop</a>" target="_blank"><a href="http://github.com/cloudera/hadoop</a" target="_blank">http://github.com/cloudera/hadoop</a</a>> -r 2d822203265a2827554b84cbb46c69b86ccca149; compiled by 'jenkins' on 2018-08-09T16:22Z
STARTUP_MSG: java = 1.8.0_161
************************************************************/
2019-10-10 10:32:12,421 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: registered UNIX signal handlers for [TERM, HUP, INT]
2019-10-10 10:32:12,880 INFO org.apache.hadoop.security.UserGroupInformation: Login successful for user hdfs/ip-10-197-27-68.eu-west-1.compute.internal@REALM.LOCAL using keytab file hdfs.keytab
2019-10-10 10:32:13,074 INFO org.apache.hadoop.metrics2.impl.MetricsConfig: loaded properties from hadoop-metrics2.properties
2019-10-10 10:32:13,114 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Scheduled snapshot period at 10 second(s).
2019-10-10 10:32:13,114 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: DataNode metrics system started
2019-10-10 10:32:13,119 INFO org.apache.hadoop.hdfs.server.datanode.BlockScanner: Initialized block scanner with targetBytesPerSec 1048576
2019-10-10 10:32:13,120 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: File descriptor passing is enabled.
2019-10-10 10:32:13,121 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Configured hostname is ip-10-197-27-68.eu-west-1.compute.internal
2019-10-10 10:32:13,151 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Starting DataNode with maxLockedMemory = 8589934592
2019-10-10 10:32:13,170 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Opened streaming server at /10.197.27.68:50010
2019-10-10 10:32:13,172 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Balancing bandwith is 10485760 bytes/s
2019-10-10 10:32:13,172 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Number threads for balancing is 50
2019-10-10 10:32:13,175 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Balancing bandwith is 10485760 bytes/s
2019-10-10 10:32:13,175 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Number threads for balancing is 50
2019-10-10 10:32:13,175 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Listening on UNIX domain socket: /var/run/hdfs-sockets/dn
2019-10-10 10:32:13,219 INFO org.mortbay.log: Logging to org.slf4j.impl.Log4jLoggerAdapter(org.mortbay.log) via org.mortbay.log.Slf4jLog
2019-10-10 10:32:13,224 INFO org.apache.hadoop.security.authentication.server.AuthenticationFilter: Unable to initialize FileSignerSecretProvider, falling back to use random secrets.
2019-10-10 10:32:13,228 INFO org.apache.hadoop.http.HttpRequestLog: Http request log for http.requests.datanode is not defined
2019-10-10 10:32:13,235 INFO org.apache.hadoop.http.HttpServer2: Added global filter 'safety' (class=org.apache.hadoop.http.HttpServer2$QuotingInputFilter)
2019-10-10 10:32:13,236 INFO org.apache.hadoop.http.HttpServer2: Added filter static_user_filter (class=org.apache.hadoop.http.lib.StaticUserWebFilter$StaticUserFilter) to context datanode
2019-10-10 10:32:13,237 INFO org.apache.hadoop.http.HttpServer2: Added filter static_user_filter (class=org.apache.hadoop.http.lib.StaticUserWebFilter$StaticUserFilter) to context static
2019-10-10 10:32:13,237 INFO org.apache.hadoop.http.HttpServer2: Added filter static_user_filter (class=org.apache.hadoop.http.lib.StaticUserWebFilter$StaticUserFilter) to context logs
2019-10-10 10:32:13,248 INFO org.apache.hadoop.http.HttpServer2: Jetty bound to port 44480
2019-10-10 10:32:13,248 INFO org.mortbay.log: jetty-6.1.26.cloudera.4
2019-10-10 10:32:13,435 INFO org.mortbay.log: Started HttpServer2$SelectChannelConnectorWithSafeStartup@localhost:44480
2019-10-10 10:32:13,781 INFO org.apache.hadoop.hdfs.server.datanode.web.DatanodeHttpServer: Listening HTTPS traffic on /10.197.27.68:50475
2019-10-10 10:32:13,786 INFO org.apache.hadoop.util.JvmPauseMonitor: Starting JVM pause monitor
2019-10-10 10:32:13,786 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: dnUserName = hdfs/ip-10-197-27-68.eu-west-1.compute.internal@REALM.LOCAL
2019-10-10 10:32:13,786 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: supergroup = hdfs
2019-10-10 10:32:13,811 INFO org.apache.hadoop.ipc.CallQueueManager: Using callQueue: class java.util.concurrent.LinkedBlockingQueue queueCapacity: 300
2019-10-10 10:32:13,822 INFO org.apache.hadoop.ipc.Server: Starting Socket Reader #1 for port 50020
2019-10-10 10:32:13,936 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Opened IPC server at /10.197.27.68:50020
2019-10-10 10:32:13,966 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Refresh request received for nameservices: hanameservice
2019-10-10 10:32:13,982 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Starting BPOfferServices for nameservices: hanameservice
2019-10-10 10:32:13,988 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Block pool <registering> (Datanode Uuid unassigned) service to ip-10-197-31-86.eu-west-1.compute.internal/10.197.31.86:8022 starting to offer service
2019-10-10 10:32:13,988 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Block pool <registering> (Datanode Uuid unassigned) service to ip-10-197-7-125.eu-west-1.compute.internal/10.197.7.125:8022 starting to offer service
2019-10-10 10:32:13,992 INFO org.apache.hadoop.ipc.Server: IPC Server Responder: starting
2019-10-10 10:32:13,992 INFO org.apache.hadoop.ipc.Server: IPC Server listener on 50020: starting
2019-10-10 10:32:15,042 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: ip-10-197-31-86.eu-west-1.compute.internal/10.197.31.86:8022. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2019-10-10 10:32:15,042 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: ip-10-197-7-125.eu-west-1.compute.internal/10.197.7.125:8022. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2019-10-10 10:32:16,043 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: ip-10-197-7-125.eu-west-1.compute.internal/10.197.7.125:8022. Already tried 1 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2019-10-10 10:32:16,043 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: ip-10-197-31-86.eu-west-1.compute.internal/10.197.31.86:8022. Already tried 1 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2019-10-10 10:32:17,043 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: ip-10-197-7-125.eu-west-1.compute.internal/10.197.7.125:8022. Already tried 2 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2019-10-10 10:32:17,043 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: ip-10-197-31-86.eu-west-1.compute.internal/10.197.31.86:8022. Already tried 2 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2019-10-10 10:32:18,044 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: ip-10-197-7-125.eu-west-1.compute.internal/10.197.7.125:8022. Already tried 3 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2019-10-10 10:32:18,044 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: ip-10-197-31-86.eu-west-1.compute.internal/10.197.31.86:8022. Already tried 3 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2019-10-10 10:32:19,045 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: ip-10-197-7-125.eu-west-1.compute.internal/10.197.7.125:8022. Already tried 4 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2019-10-10 10:32:19,045 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: ip-10-197-31-86.eu-west-1.compute.internal/10.197.31.86:8022. Already tried 4 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2019-10-10 10:32:20,046 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: ip-10-197-7-125.eu-west-1.compute.internal/10.197.7.125:8022. Already tried 5 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2019-10-10 10:32:20,046 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: ip-10-197-31-86.eu-west-1.compute.internal/10.197.31.86:8022. Already tried 5 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2019-10-10 10:32:21,046 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: ip-10-197-7-125.eu-west-1.compute.internal/10.197.7.125:8022. Already tried 6 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2019-10-10 10:32:21,046 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: ip-10-197-31-86.eu-west-1.compute.internal/10.197.31.86:8022. Already tried 6 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2019-10-10 10:32:22,047 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: ip-10-197-7-125.eu-west-1.compute.internal/10.197.7.125:8022. Already tried 7 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2019-10-10 10:32:22,047 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: ip-10-197-31-86.eu-west-1.compute.internal/10.197.31.86:8022. Already tried 7 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2019-10-10 10:32:23,048 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: ip-10-197-7-125.eu-west-1.compute.internal/10.197.7.125:8022. Already tried 8 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2019-10-10 10:32:23,048 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: ip-10-197-31-86.eu-west-1.compute.internal/10.197.31.86:8022. Already tried 8 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2019-10-10 10:32:24,048 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: ip-10-197-7-125.eu-west-1.compute.internal/10.197.7.125:8022. Already tried 9 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2019-10-10 10:32:24,049 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: ip-10-197-31-86.eu-west-1.compute.internal/10.197.31.86:8022. Already tried 9 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2019-10-10 10:32:24,050 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: Problem connecting to server: ip-10-197-31-86.eu-west-1.compute.internal/10.197.31.86:8022
2019-10-10 10:32:24,050 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: Problem connecting to server: ip-10-197-7-125.eu-west-1.compute.internal/10.197.7.125:8022
2019-10-10 10:32:29,154 INFO org.apache.hadoop.hdfs.server.common.Storage: Using 6 threads to upgrade data directories (dfs.datanode.parallel.volumes.load.threads.num=6, dataDirs=6)
2019-10-10 10:32:29,169 INFO org.apache.hadoop.hdfs.server.common.Storage: Lock on /data/data1/cdh/in_use.lock acquired by nodename 4465@ip-10-197-27-68.eu-west-1.compute.internal
2019-10-10 10:32:29,192 INFO org.apache.hadoop.hdfs.server.common.Storage: Lock on /data/data2/cdh/in_use.lock acquired by nodename 4465@ip-10-197-27-68.eu-west-1.compute.internal
2019-10-10 10:32:29,195 INFO org.apache.hadoop.hdfs.server.common.Storage: Lock on /data/data3/cdh/in_use.lock acquired by nodename 4465@ip-10-197-27-68.eu-west-1.compute.internal
2019-10-10 10:32:29,221 INFO org.apache.hadoop.hdfs.server.common.Storage: Lock on /data/data8/cdh/in_use.lock acquired by nodename 4465@ip-10-197-27-68.eu-west-1.compute.internal
2019-10-10 10:32:29,240 INFO org.apache.hadoop.hdfs.server.common.Storage: Lock on /data/data9/cdh/in_use.lock acquired by nodename 4465@ip-10-197-27-68.eu-west-1.compute.internal
2019-10-10 10:32:29,255 INFO org.apache.hadoop.hdfs.server.common.Storage: Lock on /data/data11/cdh/in_use.lock acquired by nodename 4465@ip-10-197-27-68.eu-west-1.compute.internal
2019-10-10 10:32:29,276 INFO org.apache.hadoop.hdfs.server.common.Storage: Analyzing storage directories for bpid BP-76826636-10.197.31.86-1501521881839
2019-10-10 10:32:29,276 INFO org.apache.hadoop.hdfs.server.common.Storage: Locking is disabled for /data/data1/cdh/current/BP-76826636-10.197.31.86-1501521881839
2019-10-10 10:32:29,295 INFO org.apache.hadoop.hdfs.server.common.Storage: Analyzing storage directories for bpid BP-76826636-10.197.31.86-1501521881839
2019-10-10 10:32:29,295 INFO org.apache.hadoop.hdfs.server.common.Storage: Locking is disabled for /data/data2/cdh/current/BP-76826636-10.197.31.86-1501521881839
2019-10-10 10:32:29,313 INFO org.apache.hadoop.hdfs.server.common.Storage: Analyzing storage directories for bpid BP-76826636-10.197.31.86-1501521881839
2019-10-10 10:32:29,314 INFO org.apache.hadoop.hdfs.server.common.Storage: Locking is disabled for /data/data3/cdh/current/BP-76826636-10.197.31.86-1501521881839
2019-10-10 10:32:29,331 INFO org.apache.hadoop.hdfs.server.common.Storage: Analyzing storage directories for bpid BP-76826636-10.197.31.86-1501521881839
2019-10-10 10:32:29,331 INFO org.apache.hadoop.hdfs.server.common.Storage: Locking is disabled for /data/data8/cdh/current/BP-76826636-10.197.31.86-1501521881839
2019-10-10 10:32:29,347 INFO org.apache.hadoop.hdfs.server.common.Storage: Analyzing storage directories for bpid BP-76826636-10.197.31.86-1501521881839
2019-10-10 10:32:29,348 INFO org.apache.hadoop.hdfs.server.common.Storage: Locking is disabled for /data/data9/cdh/current/BP-76826636-10.197.31.86-1501521881839
2019-10-10 10:32:29,363 INFO org.apache.hadoop.hdfs.server.common.Storage: Analyzing storage directories for bpid BP-76826636-10.197.31.86-1501521881839
2019-10-10 10:32:29,363 INFO org.apache.hadoop.hdfs.server.common.Storage: Locking is disabled for /data/data11/cdh/current/BP-76826636-10.197.31.86-1501521881839
2019-10-10 10:32:29,364 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Setting up storage: nsid=1710566395;bpid=BP-76826636-10.197.31.86-1501521881839;lv=-56;nsInfo=lv=-60;cid=cluster2;nsid=1710566395;c=0;bpid=BP-76826636-10.197.31.86-1501521881839;dnuuid=2de9411f-0f62-431e-bdfb-c1bbc7c20655
2019-10-10 10:32:29,384 INFO org.apache.hadoop.hdfs.server.datanode.fsdataset.AvailableSpaceVolumeChoosingPolicy: Available space volume choosing policy initialized: dfs.datanode.available-space-volume-choosing-policy.balanced-space-threshold = 10737418240, dfs.datanode.available-space-volume-choosing-policy.balanced-space-preference-fraction = 0.75
2019-10-10 10:32:29,392 INFO org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Added new volume: DS-37471644-43b9-4631-be36-b72215d9c152
2019-10-10 10:32:29,393 INFO org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Added volume - /data/data1/cdh/current, StorageType: DISK
2019-10-10 10:32:29,393 INFO org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Added new volume: DS-1e368637-4201-4558-99c1-25d7ab6bb6d4
2019-10-10 10:32:29,393 INFO org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Added volume - /data/data2/cdh/current, StorageType: DISK
2019-10-10 10:32:29,393 INFO org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Added new volume: DS-6d2daa74-6042-4e3e-a91f-1c91393777f4
2019-10-10 10:32:29,394 INFO org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Added volume - /data/data3/cdh/current, StorageType: DISK
2019-10-10 10:32:29,394 INFO org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Added new volume: DS-3605e8a7-240c-4f46-bd94-fb9a76240925
2019-10-10 10:32:29,394 INFO org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Added volume - /data/data8/cdh/current, StorageType: DISK
2019-10-10 10:32:29,394 INFO org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Added new volume: DS-190d4a84-4811-4186-9fda-a6cfe07008ec
2019-10-10 10:32:29,395 INFO org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Added volume - /data/data9/cdh/current, StorageType: DISK
2019-10-10 10:32:29,395 INFO org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Added new volume: DS-709f30d4-d700-48f9-972d-6def31844ab7
2019-10-10 10:32:29,395 INFO org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Added volume - /data/data11/cdh/current, StorageType: DISK
2019-10-10 10:32:29,398 INFO org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Registered FSDatasetState MBean
2019-10-10 10:32:29,399 INFO org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Volume reference is released.
2019-10-10 10:32:29,399 INFO org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Adding block pool BP-76826636-10.197.31.86-1501521881839
2019-10-10 10:32:29,399 INFO org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Scanning block pool BP-76826636-10.197.31.86-1501521881839 on volume /data/data1/cdh/current...
2019-10-10 10:32:29,399 INFO org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Scanning block pool BP-76826636-10.197.31.86-1501521881839 on volume /data/data2/cdh/current...
2019-10-10 10:32:29,399 INFO org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Scanning block pool BP-76826636-10.197.31.86-1501521881839 on volume /data/data3/cdh/current...
2019-10-10 10:32:29,400 INFO org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Scanning block pool BP-76826636-10.197.31.86-1501521881839 on volume /data/data8/cdh/current...
2019-10-10 10:32:29,400 INFO org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Scanning block pool BP-76826636-10.197.31.86-1501521881839 on volume /data/data9/cdh/current...
2019-10-10 10:32:29,400 INFO org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Scanning block pool BP-76826636-10.197.31.86-1501521881839 on volume /data/data11/cdh/current...
2019-10-10 10:32:29,407 INFO org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Cached dfsUsed found for /data/data1/cdh/current/BP-76826636-10.197.31.86-1501521881839/current: 863853522944
2019-10-10 10:32:29,407 INFO org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Cached dfsUsed found for /data/data2/cdh/current/BP-76826636-10.197.31.86-1501521881839/current: 865228335039
2019-10-10 10:32:29,407 INFO org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Cached dfsUsed found for /data/data8/cdh/current/BP-76826636-10.197.31.86-1501521881839/current: 865263112192
2019-10-10 10:32:29,407 INFO org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Cached dfsUsed found for /data/data3/cdh/current/BP-76826636-10.197.31.86-1501521881839/current: 860011779047
2019-10-10 10:32:29,408 INFO org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Cached dfsUsed found for /data/data9/cdh/current/BP-76826636-10.197.31.86-1501521881839/current: 735340314624
2019-10-10 10:32:29,408 INFO org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Cached dfsUsed found for /data/data11/cdh/current/BP-76826636-10.197.31.86-1501521881839/current: 1472138194944
2019-10-10 10:32:29,411 INFO org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Time taken to scan block pool BP-76826636-10.197.31.86-1501521881839 on /data/data8/cdh/current: 11ms
2019-10-10 10:32:29,411 INFO org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Time taken to scan block pool BP-76826636-10.197.31.86-1501521881839 on /data/data3/cdh/current: 11ms
2019-10-10 10:32:29,411 INFO org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Time taken to scan block pool BP-76826636-10.197.31.86-1501521881839 on /data/data2/cdh/current: 12ms
2019-10-10 10:32:29,411 INFO org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Time taken to scan block pool BP-76826636-10.197.31.86-1501521881839 on /data/data11/cdh/current: 11ms
2019-10-10 10:32:29,411 INFO org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Time taken to scan block pool BP-76826636-10.197.31.86-1501521881839 on /data/data9/cdh/current: 11ms
2019-10-10 10:32:29,411 INFO org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Time taken to scan block pool BP-76826636-10.197.31.86-1501521881839 on /data/data1/cdh/current: 12ms
2019-10-10 10:32:29,412 INFO org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Total time to scan all replicas for block pool BP-76826636-10.197.31.86-1501521881839: 13ms
2019-10-10 10:32:29,414 INFO org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Adding replicas to map for block pool BP-76826636-10.197.31.86-1501521881839 on volume /data/data1/cdh/current...
2019-10-10 10:32:29,414 INFO org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Adding replicas to map for block pool BP-76826636-10.197.31.86-1501521881839 on volume /data/data2/cdh/current...
2019-10-10 10:32:29,414 INFO org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Adding replicas to map for block pool BP-76826636-10.197.31.86-1501521881839 on volume /data/data3/cdh/current...
2019-10-10 10:32:29,415 INFO org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Adding replicas to map for block pool BP-76826636-10.197.31.86-1501521881839 on volume /data/data9/cdh/current...
2019-10-10 10:32:29,415 INFO org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Adding replicas to map for block pool BP-76826636-10.197.31.86-1501521881839 on volume /data/data11/cdh/current...
2019-10-10 10:32:29,415 INFO org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Adding replicas to map for block pool BP-76826636-10.197.31.86-1501521881839 on volume /data/data8/cdh/current...
2019-10-10 10:32:30,051 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: ip-10-197-7-125.eu-west-1.compute.internal/10.197.7.125:8022. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2019-10-10 10:32:31,936 INFO org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Time to add replicas to map for block pool BP-76826636-10.197.31.86-1501521881839 on volume /data/data9/cdh/current: 2521ms
2019-10-10 10:32:32,156 INFO org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Time to add replicas to map for block pool BP-76826636-10.197.31.86-1501521881839 on volume /data/data8/cdh/current: 2741ms
2019-10-10 10:32:33,780 INFO org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Time to add replicas to map for block pool BP-76826636-10.197.31.86-1501521881839 on volume /data/data11/cdh/current: 4365ms
2019-10-10 10:32:33,824 ERROR org.apache.hadoop.jmx.JMXJsonServlet: getting attribute DiskBalancerStatus of Hadoop:service=DataNode,name=DataNodeInfo threw an exception
javax.management.RuntimeMBeanException: java.lang.NullPointerException
I verfied the override for this particular data node:
</property>
<!--'dfs.disk.balancer.enabled', originally set to 'true' (non-final), is overridden below by a safety valve-->
<property>
<name>dfs.disk.balancer.enabled</name>
<value>false</value>
</property>
@pifta mentioned that there should be some error in the startup about race condition, but I dont see any kind of error except the NPE there. Also there was a suggestion that filled disks can cause this issue, but every data volume has at least 80GB available.
Thanks
Created 10-10-2019 02:10 AM
And after 2 minutes (it was writing the same NPE error) suddenly "fixes" itself, and the DN starts..
2019-10-10 11:04:04,846 ERROR org.apache.hadoop.jmx.JMXJsonServlet: getting attribute DiskBalancerStatus of Hadoop:service=DataNode,name=DataNodeInfo threw an exception
javax.management.RuntimeMBeanException: java.lang.NullPointerException
at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.rethrow(DefaultMBeanServerInterceptor.java:839)
at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.rethrowMaybeMBeanException(DefaultMBeanServerInterceptor.java:852)
at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.getAttribute(DefaultMBeanServerInterceptor.java:651)
at com.sun.jmx.mbeanserver.JmxMBeanServer.getAttribute(JmxMBeanServer.java:678)
at org.apache.hadoop.jmx.JMXJsonServlet.writeAttribute(JMXJsonServlet.java:342)
at org.apache.hadoop.jmx.JMXJsonServlet.listBeans(JMXJsonServlet.java:320)
at org.apache.hadoop.jmx.JMXJsonServlet.doGet(JMXJsonServlet.java:210)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:707)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:820)
at org.mortbay.jetty.servlet.ServletHolder.handle(ServletHolder.java:511)
at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1221)
at org.apache.hadoop.http.lib.StaticUserWebFilter$StaticUserFilter.doFilter(StaticUserWebFilter.java:109)
at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
at org.apache.hadoop.http.HttpServer2$QuotingInputFilter.doFilter(HttpServer2.java:1301)
at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
at org.apache.hadoop.http.NoCacheFilter.doFilter(NoCacheFilter.java:45)
at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
at org.apache.hadoop.http.NoCacheFilter.doFilter(NoCacheFilter.java:45)
at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:399)
at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182)
at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:767)
at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:450)
at org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:230)
at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
at org.mortbay.jetty.Server.handle(Server.java:326)
at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:542)
at org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:928)
at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:549)
at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:212)
at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:404)
at org.mortbay.io.nio.SelectChannelEndPoint.run(SelectChannelEndPoint.java:410)
at org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:582)
Caused by: java.lang.NullPointerException
at org.apache.hadoop.hdfs.server.datanode.DataNode.getDiskBalancerStatus(DataNode.java:2917)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at sun.reflect.misc.Trampoline.invoke(MethodUtil.java:71)
at sun.reflect.GeneratedMethodAccessor2.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at sun.reflect.misc.MethodUtil.invoke(MethodUtil.java:275)
at com.sun.jmx.mbeanserver.ConvertingMethod.invokeWithOpenReturn(ConvertingMethod.java:193)
at com.sun.jmx.mbeanserver.ConvertingMethod.invokeWithOpenReturn(ConvertingMethod.java:175)
at com.sun.jmx.mbeanserver.MXBeanIntrospector.invokeM2(MXBeanIntrospector.java:117)
at com.sun.jmx.mbeanserver.MXBeanIntrospector.invokeM2(MXBeanIntrospector.java:54)
at com.sun.jmx.mbeanserver.MBeanIntrospector.invokeM(MBeanIntrospector.java:237)
at com.sun.jmx.mbeanserver.PerInterface.getAttribute(PerInterface.java:83)
at com.sun.jmx.mbeanserver.MBeanSupport.getAttribute(MBeanSupport.java:206)
at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.getAttribute(DefaultMBeanServerInterceptor.java:647)
... 31 more
2019-10-10 11:04:05,054 INFO org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Time to add replicas to map for block pool BP-76826636-10.197.31.86-1501521881839 on volume /data/data3/cdh/current: 56073ms
2019-10-10 11:04:42,106 INFO org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Time to add replicas to map for block pool BP-76826636-10.197.31.86-1501521881839 on volume /data/data2/cdh/current: 93125ms
2019-10-10 11:04:42,106 INFO org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Total time to add all replicas to map: 93126ms
2019-10-10 11:04:42,170 INFO org.apache.hadoop.hdfs.server.datanode.VolumeScanner: VolumeScanner(/data/data9/cdh, DS-190d4a84-4811-4186-9fda-a6cfe07008ec): no suitable block pools found to scan. Waiting 551660352 ms.
2019-10-10 11:04:42,184 INFO org.apache.hadoop.hdfs.server.datanode.VolumeScanner: VolumeScanner(/data/data2/cdh, DS-1e368637-4201-4558-99c1-25d7ab6bb6d4): no suitable block pools found to scan. Waiting 551660354 ms.
2019-10-10 11:04:42,200 INFO org.apache.hadoop.hdfs.server.datanode.VolumeScanner: Now rescanning bpid BP-76826636-10.197.31.86-1501521881839 on volume /data/data8/cdh, after more than 504 hour(s)
2019-10-10 11:04:42,205 INFO org.apache.hadoop.hdfs.server.datanode.VolumeScanner: Now rescanning bpid BP-76826636-10.197.31.86-1501521881839 on volume /data/data1/cdh, after more than 504 hour(s)
2019-10-10 11:04:42,227 INFO org.apache.hadoop.hdfs.server.datanode.VolumeScanner: VolumeScanner(/data/data3/cdh, DS-6d2daa74-6042-4e3e-a91f-1c91393777f4): no suitable block pools found to scan. Waiting 551660336 ms.
2019-10-10 11:04:42,276 INFO org.apache.hadoop.hdfs.server.datanode.VolumeScanner: Now rescanning bpid BP-76826636-10.197.31.86-1501521881839 on volume /data/data11/cdh, after more than 504 hour(s)
So I am not sure now, is this because of the disable - or something else?
Created 10-10-2019 02:28 AM
Removed the balancer override (so it is now true) and the DN is still ok. So I dont know what is the reason, but it is definitely not solved. I think under some conditions this can happen to anybody running on CDH5