New Contributor
Posts: 5
Registered: ‎12-14-2018

Why doesn't the HDFS replication mechanism work when a DataNode's disk sectors go bad?

Hi.

We have encountered issues on our cluster that seem to be caused by bad disks.

One consequence we have seen is that when users read data stored on a DataNode with bad disks, the read is not redirected to another copy of the same data on a healthy DataNode (in our case the reads come through HBase). HDFS always keeps 3 replicas of each block on different DataNodes, so when one replica is affected by a bad disk, why doesn't HDFS serve one of the other two replicas to the reader?
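For context, below is a rough sketch of the failover behavior one would expect from an HDFS client read: try each replica in turn and fall back to the next DataNode on an I/O error. All names here are illustrative, not the real DFSClient API:

```python
# Illustrative sketch of replica failover on read; hypothetical names,
# not the real org.apache.hadoop.hdfs.DFSClient implementation.

class ReadError(Exception):
    pass

def read_block(replicas, fetch):
    """Try each replica in turn; return data from the first healthy one.

    replicas: list of DataNode identifiers holding a copy of the block
    fetch:    callable(datanode) -> bytes, raising ReadError on bad disks
    """
    failed = []
    for datanode in replicas:
        try:
            return fetch(datanode)
        except ReadError:
            failed.append(datanode)   # remember the bad node, try the next copy
    raise ReadError(f"all replicas failed: {failed}")

# Example: the first replica sits on a node with a bad disk, the second succeeds.
def fetch(dn):
    if dn == "dn1":
        raise ReadError("Unrecovered read error")   # simulated medium error
    return b"block-data"

print(read_block(["dn1", "dn2", "dn3"], fetch))  # b'block-data'
```

If reads are failing outright instead of falling back like this, it suggests the problem is elsewhere (e.g. the other replicas are also missing or the client cannot reach them), which is worth verifying before blaming the single bad disk.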

 

Below are the details about the bad disks.

We see warning messages such as the following on the affected DataNode host:

...

 sd 1:0:20:0: [sdb] Add. Sense: Unrecovered read error

 sd 1:0:20:0: [sdb] CDB: Read(10): 28 00 2f 80 08 08 00 00 08 00

 end_request: critical medium error, dev sdb, sector 696254847

EXT4-fs error (device sdb1): __ext4_get_inode_loc: unable to read inode block - inode=21758022, block=87031844

...

 

In the datanode logs we see warnings such as:

...WARN util.Shell (DU.java:run(126)) - Could not get disk usage information

ExitCodeException exitCode=1: du: cannot access `/mnt/data21/hadoop/hdfs/data/current/BP-1356445971-x.x.x.x-1430142563027/current/finalized/subdir58/subdir162': Input/output error

...at org.apache.hadoop.util.Shell.runCommand(Shell.java:538)

at org.apache.hadoop.util.Shell.run(Shell.java:455)

at org.apache.hadoop.fs.DU.run(DU.java:190)

at org.apache.hadoop.fs.DU$DURefreshThread.run(DU.java:119)

at java.lang.Thread.run(Thread.java:745)
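The `du` failure above means the DataNode cannot even stat that storage directory. As I understand it, the DataNode runs a volume check in such cases, and whether it stays up depends on `dfs.datanode.failed.volumes.tolerated` (default 0). A minimal sketch of that idea, with illustrative names:

```python
# Sketch of the DataNode volume-check idea: if more volumes fail than
# dfs.datanode.failed.volumes.tolerated allows, the DataNode shuts down
# and the NameNode re-replicates its blocks elsewhere. Illustrative only.

def check_volumes(volumes, is_healthy, tolerated=0):
    """Return the failed volumes; raise if more fail than tolerated."""
    failed = [v for v in volumes if not is_healthy(v)]
    if len(failed) > tolerated:
        raise RuntimeError(f"too many failed volumes: {failed}")
    return failed

# Example: /mnt/data21 fails its check, but one failed volume is tolerated,
# so the DataNode keeps running on its remaining disks.
healthy = lambda v: v != "/mnt/data21"
print(check_volumes(["/mnt/data1", "/mnt/data21"], healthy, tolerated=1))  # ['/mnt/data21']
```

With the default of 0 tolerated failures, a single bad disk like this should take the whole DataNode offline, after which the NameNode would schedule re-replication from the surviving copies.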

....ERROR datanode.DataNode (DataXceiver.java:run(253)) - datavault-prod-data8.internal.machines:1019:DataXceiver error processing READ_BLOCK operation src: /x.x.x.x:55220 dst: /x.x.x7.x:1019

org.apache.hadoop.hdfs.server.datanode.ReplicaNotFoundException: Replica not found for BP-1356445971-x.x.x.x-1430142563027:blk_1367398616_293808003

at org.apache.hadoop.hdfs.server.datanode.BlockSender.getReplica(BlockSender.java:431)

at org.apache.hadoop.hdfs.server.datanode.BlockSender.<init>(BlockSender.java:229)

at org.apache.hadoop.hdfs.server.datanode.DataXceiver.readBlock(DataXceiver.java:493)

at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opReadBlock(Receiver.java:116)

at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:71)

at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:235)

at java.lang.Thread.run(Thread.java:745)
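One way to check whether the other two replicas of the failing block actually exist is `hdfs fsck <path> -files -blocks`, which reports under-replicated and corrupt blocks. Below is a small sketch that scans fsck-style output for under-replicated blocks; the sample output string is illustrative, not captured from our cluster:

```python
# Sketch: scan `hdfs fsck <path> -files -blocks` output for blocks whose
# replica count is below target. SAMPLE_FSCK_OUTPUT is illustrative only.
import re

SAMPLE_FSCK_OUTPUT = """\
/data/file1 134217728 bytes, 1 block(s):  Under replicated BP-1356445971-x.x.x.x-1430142563027:blk_1367398616_293808003. Target Replicas is 3 but found 2 replica(s).
/data/file2 134217728 bytes, 1 block(s):  OK
"""

def under_replicated_blocks(fsck_output):
    """Return (block_id, found, target) for each under-replicated block."""
    pattern = re.compile(
        r"Under replicated \S*?(blk_\d+_\d+)\."
        r" Target Replicas is (\d+) but found (\d+) replica"
    )
    return [(m.group(1), int(m.group(3)), int(m.group(2)))
            for m in pattern.finditer(fsck_output)]

print(under_replicated_blocks(SAMPLE_FSCK_OUTPUT))
# [('blk_1367398616_293808003', 2, 3)]
```

If fsck shows the block still has healthy replicas elsewhere, then the `ReplicaNotFoundException` should only affect reads that happen to hit this DataNode, and the question becomes why the client did not retry another node.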

 

Best Regards

/Richardw

 
