
DataNodes green in CM, Dead for NameNode - only in CDH 6.1.1


New Contributor

Hi Guys!

 

My DataNodes are dead and alive at the same time.

For Cloudera Manager they are alive and green, but my active NameNode does not see them. The NN goes into Safe Mode because "The number of live datanodes 6 has reached the minimum number 1".

I tried with and without Kerberos, with no difference. (Enabling HA is not possible while the NN is in Safe Mode.)

 

This problem only occurs during the rollout of CDH 6.1.1. With 6.1.0 I do not face any issues. Do you know of any bug in CDH 6.1.1?

 

The NameNode log does not give me a hint:

 

2019-03-08 09:42:41,297 INFO org.apache.hadoop.net.NetworkTopology: Adding a new node: /default/22.60.88.24:9866
2019-03-08 09:42:41,298 INFO org.apache.hadoop.hdfs.server.blockmanagement.BlockReportLeaseManager: Registered DN 982fa39e-d709-403a-52878251e4c7 (22.60.88.24:9866).
2019-03-08 09:42:41,330 INFO org.apache.hadoop.hdfs.server.blockmanagement.DatanodeDescriptor: Adding new storage ID DS-9c1e2c9d-9dd8-47ed-b0f2-44a74062cccd for DN 22.60.88.24:9866
2019-03-08 09:42:41,330 INFO org.apache.hadoop.hdfs.server.blockmanagement.DatanodeDescriptor: Adding new storage ID DS-0ced16b7-061d-4485-9c04-118164e74e0b for DN 22.60.88.24:9866
2019-03-08 09:42:41,347 INFO BlockStateChange: BLOCK* processReport 0xdea010725ba0b897: Processing first storage report for DS-9c1e2c9d-9dd8-47ed-b0f2-44a74062cccd from datanode 982fa39e-d709-403a-52878251e4c7
2019-03-08 09:42:41,347 INFO BlockStateChange: BLOCK* processReport 0xdea010725ba0b897: from storage DS-9c1e2c9d-9dd8-47ed-b0f2-44a74062cccd node DatanodeRegistration(22.60.88.24:9866, datanodeUuid=982fa39e-d709-403a-52878251e4c7, infoPort=0, infoSecurePort=9865, ipcPort=9867, storageInfo=lv=-57;cid=cluster3;nsid=740970845;c=1551098500107), blocks: 0, hasStaleStorage: true, processing time: 0 msecs, invalidatedBlocks: 0
2019-03-08 09:42:41,348 INFO BlockStateChange: BLOCK* processReport 0xdea010725ba0b897: Processing first storage report for DS-0ced16b7-061d-4485-9c04-118164e74e0b from datanode 982fa39e-d709-403a-52878251e4c7
2019-03-08 09:42:41,348 INFO BlockStateChange: BLOCK* processReport 0xdea010725ba0b897: from storage DS-0ced16b7-061d-4485-9c04-118164e74e0b node DatanodeRegistration(22.60.88.24:9866, datanodeUuid=982fa39e-d709-403a-52878251e4c7, infoPort=0, infoSecurePort=9865, ipcPort=9867, storageInfo=lv=-57;cid=cluster3;nsid=740970845;c=1551098500007), blocks: 0, hasStaleStorage: false, processing time: 0 msecs, invalidatedBlocks: 0
2019-03-08 09:42:50,978 INFO org.apache.hadoop.ipc.Server: IPC Server handler 3 on 8020, call Call#5434 Retry#0 org.apache.hadoop.hdfs.protocol.ClientProtocol.create from 22.60.92.123:49716: org.apache.hadoop.hdfs.server.namenode.SafeModeException: Cannot create file/tmp/.cloudera_health_monitoring_canary_files/.canary_file_2019_03_08-09_42_50. Name node is in safe mode.
The reported blocks 1 needs additional 1393 blocks to reach the threshold 0.9990 of total blocks 1396.
The number of live datanodes 6 has reached the minimum number 1. Safe mode will be turned off automatically once the thresholds have been reached. 
2019-03-08 09:43:30,939 INFO org.apache.hadoop.ipc.Server: IPC Server handler 16 on 8022, call Call#1 Retry#0 org.apache.hadoop.hdfs.server.protocol.NamenodeProtocol.rollEditLog from 22.60.92.118:49452
org.apache.hadoop.hdfs.server.namenode.SafeModeException: Log not rolled. Name node is in safe mode.
The reported blocks 1 needs additional 1393 blocks to reach the threshold 0.9990 of total blocks 1396.
The number of live datanodes 6 has reached the minimum number 1. Safe mode will be turned off automatically once the thresholds have been reached. 

 


[Screenshot attached: Datanode is Dead and Alive.png]

Re: DataNodes green in CM, Dead for NameNode - only in CDH 6.1.1

Master Guru
The concerning bit is this part of the message:

> The reported blocks 1 needs additional 1393 blocks to reach the threshold 0.9990 of total blocks 1396.

This indicates that while your DataNodes have come up and begun reporting in, they are not finding any of their locally stored block files to send as part of the reports. The NameNode is waiting for enough (99.9%) of the data to be available to users before it opens itself for full access, but it's stuck in a never-ending loop because no DNs are reporting availability of those blocks.
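For reference, the "additional 1393 blocks" figure in the log follows directly from the safe-mode threshold arithmetic. A minimal sketch, assuming the NameNode truncates total * threshold to an integer when evaluating dfs.namenode.safemode.threshold-pct (the helper name below is illustrative, not an HDFS API):

```python
# Sketch of the safe-mode arithmetic behind the NameNode log message.
# Assumption: the block threshold is computed by truncating
# total_blocks * threshold_pct to an integer.
def safe_mode_gap(total_blocks, reported_blocks, threshold_pct=0.999):
    # Number of reported blocks required before safe mode can end.
    needed = int(total_blocks * threshold_pct)
    # How many more block reports the NameNode is still waiting for.
    return max(0, needed - reported_blocks)

# Matches the log: "The reported blocks 1 needs additional 1393 blocks
# to reach the threshold 0.9990 of total blocks 1396."
print(safe_mode_gap(1396, 1))  # → 1393
```

With only 1 of 1396 blocks reported, the NameNode will hold safe mode indefinitely until the DNs start reporting the missing blocks.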

The overall number of blocks seems low - is this a test/demo setup? If so, was the block data on the DNs ever wiped out or removed as part of the upgrade/install attempts? Or were all DNs replaced with new ones at any point during the test?

If the data is not of concern at this stage (and ONLY if so), you can force your NameNode out of safe mode manually with the 'hdfs dfsadmin -safemode leave' command (as the 'hdfs' user or any granted HDFS superuser).
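For completeness, a sketch of the relevant commands (run on a host with the hdfs CLI configured; note that leaving safe mode discards the block-threshold protection, so only do this if the missing data truly doesn't matter):

```shell
# Check the current safe mode state first.
sudo -u hdfs hdfs dfsadmin -safemode get

# Force the NameNode out of safe mode - ONLY if the unreported blocks are expendable.
sudo -u hdfs hdfs dfsadmin -safemode leave
```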

If you'd like to investigate the blocks' disappearance further, check out the DataNode logs on the hosts where these blocks resided in the past.
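As a starting point, something like the following could surface what the DataNodes did with their block directories around the upgrade window (the log path and filename pattern below are hypothetical - adjust them to your cluster's log layout):

```shell
# Hypothetical CM-managed DataNode log location - adjust for your cluster.
cd /var/log/hadoop-hdfs

# Scan for block pool and block report activity around the 6.1.1 rollout.
grep -iE "block pool|block report|deleting" hadoop-cmf-hdfs-DATANODE-*.log* | less
```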