Expert Contributor
Posts: 63
Registered: 11-17-2016

how to clean dfs/dn/current/<blockpool>/current/finalized | too many subdirs generated

Hi All,

I have Cloudera 5.9 running on 3 nodes. Recently, however, I have been getting HDFS disk space alerts very frequently.

My total cluster size is 1.3 TB.

I noticed that the size of my dfs/dn/current/<blockpool>/current/finalized directory is very large. I am aware that finalized contains blocks that have been completed and are no longer being written to by a client. However, whenever I move some of the subdirs to another mount, they are replaced within a couple of days by many new files (subdirs).

I have these questions:

  1. Can I delete old subdirs, since they only contain the info of files that were already written and completed?
  2. Does the auto-generation of so many files in a day mean that the connectivity of that particular node to the NameNode is frequently going up and down, and that this is why so many subdirs are being created?

Size of dfs/dn/current/<blockpool>/current/finalized on the 3 nodes:

[hdfs@MasterNode1 current]$ du -sh finalized/
639G    finalized/

[root@DataNode1 current]# du -sh finalized/
435G    finalized/

[root@DataNode2 current]# du -sh finalized
426G    finalized

For Nov 29 and 30 alone you can see many subdirs created, each roughly 800 MB to 3 GB in size:

drwxr-xr-x  20 hdfs hdfs  4096 Nov 29 10:07 subdir41
drwxr-xr-x  13 hdfs hdfs  4096 Nov 29 10:09 subdir42
drwxr-xr-x  31 hdfs hdfs  4096 Nov 29 10:12 subdir43
drwxr-xr-x  24 hdfs hdfs  4096 Nov 29 10:17 subdir44
drwxr-xr-x  26 hdfs hdfs  4096 Nov 29 10:20 subdir45
drwxr-xr-x  17 hdfs hdfs  4096 Nov 29 10:24 subdir46
drwxr-xr-x  10 hdfs hdfs  4096 Nov 29 10:25 subdir47
drwxr-xr-x  29 hdfs hdfs  4096 Nov 29 10:32 subdir48
drwxr-xr-x  21 hdfs hdfs  4096 Nov 29 10:40 subdir51
drwxr-xr-x  12 hdfs hdfs  4096 Nov 29 10:40 subdir52
drwxr-xr-x  13 hdfs hdfs  4096 Nov 29 11:30 subdir53
drwxr-xr-x  27 hdfs hdfs  4096 Nov 29 11:30 subdir54
drwxr-xr-x  15 hdfs hdfs  4096 Nov 29 11:32 subdir55
drwxr-xr-x 117 hdfs hdfs  4096 Nov 29 13:48 subdir69
drwxr-xr-x 119 hdfs hdfs  4096 Nov 29 14:36 subdir71
drwxr-xr-x 136 hdfs hdfs  4096 Nov 29 15:18 subdir79
drwxr-xr-x 258 hdfs hdfs 12288 Nov 29 15:46 subdir193
drwxr-xr-x  89 hdfs hdfs  4096 Nov 29 16:06 subdir33
drwxr-xr-x 129 hdfs hdfs  4096 Nov 30 05:34 subdir72
drwxr-xr-x 122 hdfs hdfs  4096 Nov 30 06:21 subdir75
drwxr-xr-x 124 hdfs hdfs  4096 Nov 30 07:55 subdir77
drwxr-xr-x  95 hdfs hdfs  4096 Nov 30 08:32 subdir78
drwxr-xr-x 126 hdfs hdfs  4096 Nov 30 11:32 subdir85
drwxr-xr-x 124 hdfs hdfs  4096 Nov 30 12:08 subdir86
drwxr-xr-x 112 hdfs hdfs  4096 Nov 30 13:25 subdir88
drwxr-xr-x 130 hdfs hdfs  4096 Nov 30 14:25 subdir90
drwxr-xr-x 112 hdfs hdfs  4096 Nov 30 15:00 subdir91
drwxr-xr-x  57 hdfs hdfs  4096 Nov 30 18:23 subdir26
drwxr-xr-x 173 hdfs hdfs  4096 Nov 30 19:01 subdir34
drwxr-xr-x  30 hdfs hdfs  4096 Nov 30 19:03 subdir49
drwxr-xr-x  11 hdfs hdfs  4096 Nov 30 19:03 subdir50
drwxr-xr-x  27 hdfs hdfs  4096 Nov 30 19:06 subdir56
drwxr-xr-x  79 hdfs hdfs  4096 Nov 30 19:08 subdir57
drwxr-xr-x 141 hdfs hdfs  4096 Nov 30 19:49 subdir61
drwxr-xr-x 109 hdfs hdfs  4096 Nov 30 21:53 subdir64
drwxr-xr-x 126 hdfs hdfs  4096 Nov 30 22:08 subdir65
drwxr-xr-x 136 hdfs hdfs  4096 Nov 30 23:08 subdir68

Please advise.

Thanks,

Shilpa

Posts: 1,884
Kudos: 422
Solutions: 297
Registered: 07-31-2013

Re: how to clean dfs/dn/current/<blockpool>/current/finalized | too many subdirs generated

The subdirs carry the actual block data; deleting them would be fatal for
your HDFS data. If you have a space problem, clear out files on HDFS by
issuing regular deletes (fs -rm, etc.), not by tampering with the internal
storage format on individual DataNodes. Also check whether stale HDFS
snapshots are retaining older files.
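A minimal sketch of that advice as shell commands, assuming the standard hdfs CLI is on the PATH and you run it as a user allowed to read the paths involved. The function names are hypothetical and /path/to/old/data is a placeholder. One thing worth checking: if trash is enabled (fs.trash.interval > 0), a plain fs -rm moves files into the user's .Trash directory, so their blocks are only freed once the trash retention interval has passed; -skipTrash deletes immediately.

```shell
#!/usr/bin/env bash
# Sketch only: find and free HDFS space through the NameNode, never by
# touching DataNode block directories. Function names are hypothetical.

hdfs_space_check() {
  # Space used by each top-level HDFS directory
  hdfs dfs -du -h /
  # Size of users' trash; plain "fs -rm" moves files here when trash is enabled
  hdfs dfs -du -s -h '/user/*/.Trash'
  # Snapshottable directories; their snapshots can keep deleted files' blocks alive
  hdfs lsSnapshottableDir
}

hdfs_delete_now() {
  # -skipTrash removes the files immediately instead of moving them to .Trash
  hdfs dfs -rm -r -skipTrash "$1"
}

# Usage (placeholder path):
#   hdfs_space_check
#   hdfs_delete_now /path/to/old/data
```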

The reason DNs use a subdirectory structure is mostly to avoid hitting the
limits of their underlying filesystem (ext4, xfs, etc.), and to make certain
scanning operations (such as for block reports) more efficient.
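With the block-ID-based layout, the subdir a block file (blk_<id>) lands in is computed from its block ID alone. New block IDs are allocated sequentially, so new subdirs appear simply because new blocks are being written, not because of NameNode connectivity. A minimal sketch of that mapping, assuming the 256x256 layout introduced around Hadoop 2.6 (HDFS-6482), which is consistent with names like subdir193 and subdir211 in the listings above; later Hadoop releases (HDFS-8791) mask with 0x1F for a 32x32 layout instead. block_subdir is a hypothetical helper, not a Hadoop tool.

```shell
#!/usr/bin/env bash
# Sketch: map an HDFS block ID to its finalized/ subdirectory path.
# ASSUMPTION: 256x256 block-ID-based layout (Hadoop 2.6 era);
# newer releases use a 0x1F mask (32x32) instead of 0xFF.
# block_subdir is a hypothetical helper name.
block_subdir() {
  local id=$1
  local d1=$(( (id >> 16) & 0xFF ))   # first level: bits 16-23 of the ID
  local d2=$(( (id >> 8) & 0xFF ))    # second level: bits 8-15 of the ID
  echo "subdir${d1}/subdir${d2}"
}

# Usage: where would the file blk_1073741825 live?
block_subdir 1073741825   # prints subdir0/subdir0
```

Under this assumption, each first-level subdir covers 65,536 consecutive block IDs before the next one starts filling, which is why a busy cluster keeps touching new subdirs day after day.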

Re: how to clean dfs/dn/current/<blockpool>/current/finalized | too many subdirs generated

Thanks for the explanation. Can you suggest any libraries I could use to compress HDFS directories?

Re: how to clean dfs/dn/current/<blockpool>/current/finalized | too many subdirs generated

Hi @Harsh J,

Yesterday I deleted about 50 GB worth of files from HDFS using fs -rm. The daily incoming data on HDFS is roughly 13-15 GB (including replication), yet today DFS usage has again grown by about 30-55 GB. I don't understand why.

On just one DataNode, almost 15 GB of DFS files were generated.

[root@DataNode1 finalized]# ls -lrt| grep "Dec 13"
drwxr-xr-x 208 hdfs hdfs 4096 Dec 13 05:59 subdir130
drwxr-xr-x 195 hdfs hdfs 4096 Dec 13 06:52 subdir132
drwxr-xr-x 210 hdfs hdfs 4096 Dec 13 07:24 subdir134
drwxr-xr-x 188 hdfs hdfs 4096 Dec 13 07:32 subdir135
drwxr-xr-x 187 hdfs hdfs 4096 Dec 13 08:30 subdir138
drwxr-xr-x 210 hdfs hdfs 4096 Dec 13 09:09 subdir139
drwxr-xr-x 234 hdfs hdfs 12288 Dec 13 09:46 subdir173
drwxr-xr-x 234 hdfs hdfs 12288 Dec 13 10:07 subdir174
drwxr-xr-x 258 hdfs hdfs 12288 Dec 13 15:30 subdir211
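Note that an ls -lrt | grep listing shows directory modification times, not sizes: a directory's mtime changes whenever an entry directly inside it is created or removed, so it does not say how much data arrived that day. A rough way to measure that is to sum the sizes of block files by modification date. This is a sketch assuming GNU find and GNU date, run inside .../current/finalized on a DataNode; it counts only the replicas stored on that DataNode and excludes the .meta checksum files. bytes_written_on is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Sketch: total bytes of block files on this DataNode whose mtime falls on a
# given day (YYYY-MM-DD). Hypothetical helper; requires GNU find and GNU date.
bytes_written_on() {
  local dir=$1 day=$2 next
  next=$(date -d "$day + 1 day" +%F)
  # blk_<id> files hold block data; blk_<id>_<genstamp>.meta hold checksums
  find "$dir" -type f -name 'blk_*' ! -name '*.meta' \
       -newermt "$day" ! -newermt "$next" -printf '%s\n' \
    | awk '{ total += $1 } END { printf "%.0f\n", total }'
}

# Usage, from inside .../current/finalized:
#   bytes_written_on . 2016-12-13
```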
