Expert Contributor
Posts: 63
Registered: ‎11-17-2016

how to clean dfs/dn/current/<blockpool>/current/finalized | too many subdirs generated

Hi All,

I have Cloudera 5.9 running on 3 nodes. Recently I have been getting HDFS disk space alerts very frequently.

My total cluster size is 1.3 TB.

I noticed that my dfs/dn/current/<blockpool>/current/finalized directory is very large. I am aware that finalized contains completed blocks that are no longer being written to by a client. However, whenever I move some of the subdirs to another mount, they are replaced within a couple of days by many new files (subdirs).

I have these questions:

  1. Can I delete old subdirs, since they only contain information about files that were written and completed?
  2. Does the creation of so many files in a day mean that this node's connectivity to the NameNode is frequently going up and down, and that this is what creates so many subdirs?

Size of dfs/dn/current/<blockpool>/current/finalized on the 3 nodes:

[hdfs@MasterNode1 current]$ du -sh finalized/
639G    finalized/

[root@DataNode1 current]# du -sh finalized/
435G    finalized/

[root@DataNode2 current]# du -sh finalized
426G    finalized

Just for Nov 29 and 30 you can see how many subdirs were created, each roughly 800 MB to 3 GB in size:

drwxr-xr-x  20 hdfs hdfs  4096 Nov 29 10:07 subdir41
drwxr-xr-x  13 hdfs hdfs  4096 Nov 29 10:09 subdir42
drwxr-xr-x  31 hdfs hdfs  4096 Nov 29 10:12 subdir43
drwxr-xr-x  24 hdfs hdfs  4096 Nov 29 10:17 subdir44
drwxr-xr-x  26 hdfs hdfs  4096 Nov 29 10:20 subdir45
drwxr-xr-x  17 hdfs hdfs  4096 Nov 29 10:24 subdir46
drwxr-xr-x  10 hdfs hdfs  4096 Nov 29 10:25 subdir47
drwxr-xr-x  29 hdfs hdfs  4096 Nov 29 10:32 subdir48
drwxr-xr-x  21 hdfs hdfs  4096 Nov 29 10:40 subdir51
drwxr-xr-x  12 hdfs hdfs  4096 Nov 29 10:40 subdir52
drwxr-xr-x  13 hdfs hdfs  4096 Nov 29 11:30 subdir53
drwxr-xr-x  27 hdfs hdfs  4096 Nov 29 11:30 subdir54
drwxr-xr-x  15 hdfs hdfs  4096 Nov 29 11:32 subdir55
drwxr-xr-x 117 hdfs hdfs  4096 Nov 29 13:48 subdir69
drwxr-xr-x 119 hdfs hdfs  4096 Nov 29 14:36 subdir71
drwxr-xr-x 136 hdfs hdfs  4096 Nov 29 15:18 subdir79
drwxr-xr-x 258 hdfs hdfs 12288 Nov 29 15:46 subdir193
drwxr-xr-x  89 hdfs hdfs  4096 Nov 29 16:06 subdir33
drwxr-xr-x 129 hdfs hdfs  4096 Nov 30 05:34 subdir72
drwxr-xr-x 122 hdfs hdfs  4096 Nov 30 06:21 subdir75
drwxr-xr-x 124 hdfs hdfs  4096 Nov 30 07:55 subdir77
drwxr-xr-x  95 hdfs hdfs  4096 Nov 30 08:32 subdir78
drwxr-xr-x 126 hdfs hdfs  4096 Nov 30 11:32 subdir85
drwxr-xr-x 124 hdfs hdfs  4096 Nov 30 12:08 subdir86
drwxr-xr-x 112 hdfs hdfs  4096 Nov 30 13:25 subdir88
drwxr-xr-x 130 hdfs hdfs  4096 Nov 30 14:25 subdir90
drwxr-xr-x 112 hdfs hdfs  4096 Nov 30 15:00 subdir91
drwxr-xr-x  57 hdfs hdfs  4096 Nov 30 18:23 subdir26
drwxr-xr-x 173 hdfs hdfs  4096 Nov 30 19:01 subdir34
drwxr-xr-x  30 hdfs hdfs  4096 Nov 30 19:03 subdir49
drwxr-xr-x  11 hdfs hdfs  4096 Nov 30 19:03 subdir50
drwxr-xr-x  27 hdfs hdfs  4096 Nov 30 19:06 subdir56
drwxr-xr-x  79 hdfs hdfs  4096 Nov 30 19:08 subdir57
drwxr-xr-x 141 hdfs hdfs  4096 Nov 30 19:49 subdir61
drwxr-xr-x 109 hdfs hdfs  4096 Nov 30 21:53 subdir64
drwxr-xr-x 126 hdfs hdfs  4096 Nov 30 22:08 subdir65
drwxr-xr-x 136 hdfs hdfs  4096 Nov 30 23:08 subdir68

 

Please advise.

Thanks,

Shilpa

Posts: 1,664
Kudos: 325
Solutions: 262
Registered: ‎07-31-2013

Re: how to clean dfs/dn/current/<blockpool>/current/finalized | too many subdirs generated

The subdirs carry actual block data - deleting these would be fatal for
your actual HDFS data. If you have a space problem, clear out files on HDFS
by issuing regular deletes (fs -rm, etc.), not by messing around with the
internal storage format on independent DataNodes. Be sure to also check if
you have stale HDFS snapshots retaining older files.

The reason DNs use a subdirectory structure is mostly to avoid hitting the
underlying filesystem's (ext4, xfs, etc.) per-directory limits, and to make
certain scanning operations (such as for block reports) more efficient.
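
A minimal sketch of the checks suggested above, assuming the `hdfs` CLI is on the PATH and the user is allowed to list snapshottable directories. The `run` helper, the `DRY_RUN` variable and the `hdfs_space_report` function are illustrative names, not part of Hadoop. Note that when HDFS trash is enabled, `fs -rm` moves files into the user's `.Trash` directory instead of freeing the space right away, unless `-skipTrash` is passed.

```shell
# Sketch only: inspect space usage through the HDFS CLI rather than
# touching DataNode directories directly.
# Set DRY_RUN=1 to print the commands instead of running them.

run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

hdfs_space_report() {
  report_path="${1:-/}"                  # HDFS path to inspect, defaults to /
  run hdfs dfs -df -h "$report_path"     # capacity, used and available space
  run hdfs dfs -du -h "$report_path"     # usage per child of the path
  run hdfs lsSnapshottableDir            # directories that may hold snapshots
}
```

Calling `DRY_RUN=1 hdfs_space_report /` only prints the three commands, which is a safe way to review them before running anything against the cluster.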
Expert Contributor
Posts: 63
Registered: ‎11-17-2016

Re: how to clean dfs/dn/current/<blockpool>/current/finalized | too many subdirs generated

Thanks for the explanation. Can you suggest a way to compress HDFS directories by using any libraries?
Expert Contributor
Posts: 63
Registered: ‎11-17-2016

Re: how to clean dfs/dn/current/<blockpool>/current/finalized | too many subdirs generated

Hi @Harsh J,

Yesterday I cleaned up 50 GB worth of files from HDFS using fs -rm. The daily incoming data on HDFS is about 13-15 GB (including replication), yet today the size of dfs has again grown by roughly 30-55 GB. I don't understand why.

On one DataNode alone, almost 15 GB of dfs files were generated:

[root@DataNode1 finalized]# ls -lrt| grep "Dec 13"
drwxr-xr-x 208 hdfs hdfs 4096 Dec 13 05:59 subdir130
drwxr-xr-x 195 hdfs hdfs 4096 Dec 13 06:52 subdir132
drwxr-xr-x 210 hdfs hdfs 4096 Dec 13 07:24 subdir134
drwxr-xr-x 188 hdfs hdfs 4096 Dec 13 07:32 subdir135
drwxr-xr-x 187 hdfs hdfs 4096 Dec 13 08:30 subdir138
drwxr-xr-x 210 hdfs hdfs 4096 Dec 13 09:09 subdir139
drwxr-xr-x 234 hdfs hdfs 12288 Dec 13 09:46 subdir173
drwxr-xr-x 234 hdfs hdfs 12288 Dec 13 10:07 subdir174
drwxr-xr-x 258 hdfs hdfs 12288 Dec 13 15:30 subdir211
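
One caveat when reading a listing like the one above: a directory's modification time only changes when entries are added to or removed from it directly, so a subdir dated Dec 13 may still hold mostly older blocks, and running `du` on it can overstate what was written that day. A rough sketch for counting only the files modified on a given day follows, assuming GNU `find` and `date` (as on typical Linux DataNodes). `bytes_written_on` is a hypothetical helper name, not an HDFS tool.

```shell
# bytes_written_on DIR DAY
# Print the total size in bytes of regular files under DIR whose
# modification time falls on DAY (YYYY-MM-DD, local time, with the
# boundaries at local midnight). Read-only: nothing is moved or deleted.
bytes_written_on() {
  dir="$1"
  day="$2"
  # Assumes GNU date: compute the following calendar day.
  next="$(date -d "$day +1 day" +%Y-%m-%d)"
  # Assumes GNU find: -newermt compares mtime against a timestamp,
  # -printf '%s\n' prints each file size in bytes.
  find "$dir" -type f -newermt "$day" ! -newermt "$next" -printf '%s\n' |
    awk '{ total += $1 } END { print total + 0 }'
}
```

For example, `bytes_written_on finalized "$(date +%F)"`, run from the block pool's `current` directory, would report the bytes in block and meta files modified so far today on that DataNode.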
