Created on 03-29-2017 04:12 AM - edited 09-16-2022 04:21 AM
Hi everyone,
we have been running into some problems lately writing Avro files on HDFS.
Just to give an idea, we have a Storm cluster that writes Avro files directly to HDFS, and sometimes it stops because all the DataNodes claim to be out of disk space:
2017-03-29 00:00:12,456 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: cdh5-impala-worker-01.c.feisty-gasket-100715.internal:50010:DataXceiver error processing WRITE_BLOCK operation src: /10.240.0.48:51432 dst: /10.240.0.58:50010
org.apache.hadoop.util.DiskChecker$DiskOutOfSpaceException: Out of space: The volume with the most available space (=33840904 B) is less than the block size (=134217728 B).
    at org.apache.hadoop.hdfs.server.datanode.fsdataset.RoundRobinVolumeChoosingPolicy.chooseVolume(RoundRobinVolumeChoosingPolicy.java:95)
    at org.apache.hadoop.hdfs.server.datanode.fsdataset.RoundRobinVolumeChoosingPolicy.chooseVolume(RoundRobinVolumeChoosingPolicy.java:67)
    at org.apache.hadoop.hdfs.server.datanode.fsdataset.AvailableSpaceVolumeChoosingPolicy.doChooseVolume(AvailableSpaceVolumeChoosingPolicy.java:140)
    at org.apache.hadoop.hdfs.server.datanode.fsdataset.AvailableSpaceVolumeChoosingPolicy.chooseVolume(AvailableSpaceVolumeChoosingPolicy.java:128)
    at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsVolumeList.chooseVolume(FsVolumeList.java:80)
    at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsVolumeList.getNextVolume(FsVolumeList.java:107)
    at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.createRbw(FsDatasetImpl.java:1316)
    at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.<init>(BlockReceiver.java:199)
    at org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:667)
    at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opWriteBlock(Receiver.java:169)
    at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:106)
    at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:246)
    at java.lang.Thread.run(Thread.java:745)
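For context, the topology writes the files roughly along these lines; this is only a minimal sketch, not our real bolt code, and the schema, class and path names are placeholders:

import org.apache.avro.Schema;
import org.apache.avro.file.DataFileWriter;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericDatumWriter;
import org.apache.avro.generic.GenericRecord;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class AvroHdfsWriterSketch {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);

        // Hypothetical one-field schema, just for illustration.
        Schema schema = new Schema.Parser().parse(
            "{\"type\":\"record\",\"name\":\"Event\","
            + "\"fields\":[{\"name\":\"payload\",\"type\":\"string\"}]}");

        // Each file kept open for writing has a block under construction on some
        // DataNode, and the volume chooser rejects a volume whose free space is
        // below one full block size (128 MB in our case).
        FSDataOutputStream out = fs.create(new Path("/data/events/part-0000.avro"));
        DataFileWriter<GenericRecord> writer =
            new DataFileWriter<GenericRecord>(new GenericDatumWriter<GenericRecord>(schema));
        writer.create(schema, out);

        GenericRecord record = new GenericData.Record(schema);
        record.put("payload", "example tuple");
        writer.append(record); // in the real bolt this happens per incoming tuple
        writer.close();        // the real files are only closed on rotation
    }
}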
Now the interesting part is that the DataNode still has space:
/dev/sdd  1.5T  811G  619G  57%  /dfs/dn/archive-03
/dev/sde  1.5T  804G  626G  57%  /dfs/dn/archive-04
/dev/sdb  1.5T  1.1T  384G  74%  /dfs/dn/archive-01
/dev/sdc  1.5T  802G  628G  57%  /dfs/dn/archive-02
/dev/sdg  1.5T  802G  627G  57%  /dfs/dn/archive-06
/dev/sdf  1.5T  804G  625G  57%  /dfs/dn/archive-05
and if we restart the DataNode the problem seems to go away.
The other strange part is that the cluster is configured to use
AvailableSpaceVolumeChoosingPolicy
but the exception explicitly shows that it is still using
RoundRobinVolumeChoosingPolicy.
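For reference, the policy is controlled by the dfs.datanode.fsdataset.volume.choosing.policy property in hdfs-site.xml; the snippet below is just a quick sketch (not our deployment code, and the config path is an assumption about our nodes) of how we double-checked what value the config resolves to:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;

public class VolumePolicyCheck {
    public static void main(String[] args) {
        Configuration conf = new Configuration();
        // The hdfs-site.xml location below is an assumption about our nodes.
        conf.addResource(new Path("/etc/hadoop/conf/hdfs-site.xml"));
        // HDFS falls back to RoundRobinVolumeChoosingPolicy when the property is unset.
        System.out.println(conf.get(
            "dfs.datanode.fsdataset.volume.choosing.policy",
            "org.apache.hadoop.hdfs.server.datanode.fsdataset.RoundRobinVolumeChoosingPolicy"));
    }
}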
Best
Saverio Mucci
instal.com
Created 03-29-2017 05:03 AM
Hi Harsh,
yes, I thought the same about the RoundRobinVolumeChoosingPolicy.
Storm currently keeps only a few files open; I would expect at least 9 files open concurrently, but that should not be the reason, because as you can see there are at least 300 GB available per disk.
At the moment the cluster is running CDH 5.7.1, and the data distribution reported by the NameNode interface is:
DataNodes usages% (Min/Median/Max/stdDev): 40.62% / 56.37% / 70.12% / 9.97%
The only doubt I have is that these files are supposed to be on SSD storage, but from the NameNode interface I get:
Storage Type  Configured Capacity  Capacity Used      Capacity Remaining  Block Pool Used  Nodes In Service
DISK          33.41 TB             20.34 TB (60.88%)  6.53 TB (19.55%)    20.34 TB         11
SSD           30.81 TB             16.38 TB (53.14%)  13 TB (42.2%)       16.38 TB         6
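Assuming the intent here is an HDFS storage policy such as ONE_SSD or ALL_SSD on the target directory (that is my understanding of our setup, not something I have verified yet), applying it would look roughly like this sketch with a placeholder path:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hdfs.DistributedFileSystem;

public class SetSsdPolicySketch {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);
        if (fs instanceof DistributedFileSystem) {
            // Built-in policies include HOT (the default), ONE_SSD and ALL_SSD;
            // the path and the ALL_SSD choice here are assumptions for illustration.
            ((DistributedFileSystem) fs).setStoragePolicy(new Path("/data/events"), "ALL_SSD");
        }
        fs.close();
    }
}

For the replicas to actually land on SSD, the SSD directories also have to be tagged with [SSD] in dfs.datanode.data.dir, which is the part I still need to verify on our side.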