Created on 03-29-2017 04:12 AM - edited 09-16-2022 04:21 AM
Hi everyone,
we have been running into some problems lately writing Avro files on HDFS.
Just to give an idea, we have a Storm cluster that writes Avro files directly to HDFS, and sometimes it stops because all the DataNodes claim to be out of disk space:
2017-03-29 00:00:12,456 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: cdh5-impala-worker-01.c.feisty-gasket-100715.internal:50010:DataXceiver error processing WRITE_BLOCK operation src: /10.240.0.48:51432 dst: /10.240.0.58:50010
org.apache.hadoop.util.DiskChecker$DiskOutOfSpaceException: Out of space: The volume with the most available space (=33840904 B) is less than the block size (=134217728 B).
    at org.apache.hadoop.hdfs.server.datanode.fsdataset.RoundRobinVolumeChoosingPolicy.chooseVolume(RoundRobinVolumeChoosingPolicy.java:95)
    at org.apache.hadoop.hdfs.server.datanode.fsdataset.RoundRobinVolumeChoosingPolicy.chooseVolume(RoundRobinVolumeChoosingPolicy.java:67)
    at org.apache.hadoop.hdfs.server.datanode.fsdataset.AvailableSpaceVolumeChoosingPolicy.doChooseVolume(AvailableSpaceVolumeChoosingPolicy.java:140)
    at org.apache.hadoop.hdfs.server.datanode.fsdataset.AvailableSpaceVolumeChoosingPolicy.chooseVolume(AvailableSpaceVolumeChoosingPolicy.java:128)
    at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsVolumeList.chooseVolume(FsVolumeList.java:80)
    at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsVolumeList.getNextVolume(FsVolumeList.java:107)
    at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.createRbw(FsDatasetImpl.java:1316)
    at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.<init>(BlockReceiver.java:199)
    at org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:667)
    at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opWriteBlock(Receiver.java:169)
    at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:106)
    at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:246)
    at java.lang.Thread.run(Thread.java:745)
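For context, the topology writes the files roughly along these lines; this is only a minimal sketch, not our real bolt code, and the schema, class and path names are placeholders:

import org.apache.avro.Schema;
import org.apache.avro.file.DataFileWriter;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericDatumWriter;
import org.apache.avro.generic.GenericRecord;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class AvroHdfsWriterSketch {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);

        // Hypothetical one-field schema, just for illustration.
        Schema schema = new Schema.Parser().parse(
            "{\"type\":\"record\",\"name\":\"Event\","
            + "\"fields\":[{\"name\":\"payload\",\"type\":\"string\"}]}");

        // Each file kept open for writing has a block under construction on some
        // DataNode, and the volume chooser rejects a volume whose free space is
        // below one full block size (128 MB in our case).
        FSDataOutputStream out = fs.create(new Path("/data/events/part-0000.avro"));
        DataFileWriter<GenericRecord> writer =
            new DataFileWriter<GenericRecord>(new GenericDatumWriter<GenericRecord>(schema));
        writer.create(schema, out);

        GenericRecord record = new GenericData.Record(schema);
        record.put("payload", "example tuple");
        writer.append(record); // in the real bolt this happens per incoming tuple
        writer.close();        // the real files are only closed on rotation
    }
}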
Now the interesting part is that the DataNode still has space:
/dev/sdd  1.5T  811G  619G  57%  /dfs/dn/archive-03
/dev/sde  1.5T  804G  626G  57%  /dfs/dn/archive-04
/dev/sdb  1.5T  1.1T  384G  74%  /dfs/dn/archive-01
/dev/sdc  1.5T  802G  628G  57%  /dfs/dn/archive-02
/dev/sdg  1.5T  802G  627G  57%  /dfs/dn/archive-06
/dev/sdf  1.5T  804G  625G  57%  /dfs/dn/archive-05
and if we restart the DataNode the problem seems to go away.
The other strange part is that the cluster is configured to use
AvailableSpaceVolumeChoosingPolicy
but the exception explicitly shows that it is still using
RoundRobinVolumeChoosingPolicy.
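For reference, the policy is controlled by the dfs.datanode.fsdataset.volume.choosing.policy property in hdfs-site.xml; the snippet below is just a quick sketch (not our deployment code, and the config path is an assumption about our nodes) of how we double-checked what value the config resolves to:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;

public class VolumePolicyCheck {
    public static void main(String[] args) {
        Configuration conf = new Configuration();
        // The hdfs-site.xml location below is an assumption about our nodes.
        conf.addResource(new Path("/etc/hadoop/conf/hdfs-site.xml"));
        // HDFS falls back to RoundRobinVolumeChoosingPolicy when the property is unset.
        System.out.println(conf.get(
            "dfs.datanode.fsdataset.volume.choosing.policy",
            "org.apache.hadoop.hdfs.server.datanode.fsdataset.RoundRobinVolumeChoosingPolicy"));
    }
}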
Best
Saverio Mucci
instal.com
Created 03-29-2017 05:03 AM
Hi Harsh,
yes, I thought the same about the RoundRobinVolumeChoosingPolicy.
Storm currently keeps only a few files open; I would expect at least 9 files open concurrently, but that should not be the reason, because as you can see there are at least 300 GB available per disk.
At the moment the cluster is running CDH 5.7.1, and the data distribution reported by the NameNode interface is:
DataNodes usages% (Min/Median/Max/stdDev): 40.62% / 56.37% / 70.12% / 9.97%
The only doubt I have is that these files are supposed to be on SSD storage, but from the NameNode interface I get:
Storage Type  Configured Capacity  Capacity Used      Capacity Remaining  Block Pool Used  Nodes In Service
DISK          33.41 TB             20.34 TB (60.88%)  6.53 TB (19.55%)    20.34 TB         11
SSD           30.81 TB             16.38 TB (53.14%)  13 TB (42.2%)       16.38 TB         6
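Assuming the intent here is an HDFS storage policy such as ONE_SSD or ALL_SSD on the target directory (that is my understanding of our setup, not something I have verified yet), applying it would look roughly like this sketch with a placeholder path:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hdfs.DistributedFileSystem;

public class SetSsdPolicySketch {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);
        if (fs instanceof DistributedFileSystem) {
            // Built-in policies include HOT (the default), ONE_SSD and ALL_SSD;
            // the path and the ALL_SSD choice here are assumptions for illustration.
            ((DistributedFileSystem) fs).setStoragePolicy(new Path("/data/events"), "ALL_SSD");
        }
        fs.close();
    }
}

For the replicas to actually land on SSD, the SSD directories also have to be tagged with [SSD] in dfs.datanode.data.dir, which is the part I still need to verify on our side.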