Created on 12-11-2017 12:21 PM - edited 09-16-2022 05:37 AM
Hi All,
I have Cloudera 5.9 running on 3 nodes. Recently, however, I have been getting HDFS disk space alerts very frequently.
My total cluster size is 1.3 TB.
I noticed that my dfs/dn/current/<blockpool>/current/finalized directory is very large. I am aware that finalized contains blocks that have been completed and are no longer being written to by a client. However, whenever I move some of the subdirs to another mount, they are replaced within a couple of days by many new files (subdirs).
I have these questions:
Size of dfs/dn/current/<blockpool>/current/finalized on 3 Nodes:
[hdfs@MasterNode1 current]$ du -sh finalized/
639G finalized/
[root@DataNode1 current]# du -sh finalized/
435G finalized/
[root@DataNode2 current]# du -sh finalized
426G finalized
For Nov 29 and 30 alone, you can see how many subdirs were created, each roughly 800 MB to 3 GB in size:
drwxr-xr-x 20 hdfs hdfs 4096 Nov 29 10:07 subdir41
drwxr-xr-x 13 hdfs hdfs 4096 Nov 29 10:09 subdir42
drwxr-xr-x 31 hdfs hdfs 4096 Nov 29 10:12 subdir43
drwxr-xr-x 24 hdfs hdfs 4096 Nov 29 10:17 subdir44
drwxr-xr-x 26 hdfs hdfs 4096 Nov 29 10:20 subdir45
drwxr-xr-x 17 hdfs hdfs 4096 Nov 29 10:24 subdir46
drwxr-xr-x 10 hdfs hdfs 4096 Nov 29 10:25 subdir47
drwxr-xr-x 29 hdfs hdfs 4096 Nov 29 10:32 subdir48
drwxr-xr-x 21 hdfs hdfs 4096 Nov 29 10:40 subdir51
drwxr-xr-x 12 hdfs hdfs 4096 Nov 29 10:40 subdir52
drwxr-xr-x 13 hdfs hdfs 4096 Nov 29 11:30 subdir53
drwxr-xr-x 27 hdfs hdfs 4096 Nov 29 11:30 subdir54
drwxr-xr-x 15 hdfs hdfs 4096 Nov 29 11:32 subdir55
drwxr-xr-x 117 hdfs hdfs 4096 Nov 29 13:48 subdir69
drwxr-xr-x 119 hdfs hdfs 4096 Nov 29 14:36 subdir71
drwxr-xr-x 136 hdfs hdfs 4096 Nov 29 15:18 subdir79
drwxr-xr-x 258 hdfs hdfs 12288 Nov 29 15:46 subdir193
drwxr-xr-x 89 hdfs hdfs 4096 Nov 29 16:06 subdir33
drwxr-xr-x 129 hdfs hdfs 4096 Nov 30 05:34 subdir72
drwxr-xr-x 122 hdfs hdfs 4096 Nov 30 06:21 subdir75
drwxr-xr-x 124 hdfs hdfs 4096 Nov 30 07:55 subdir77
drwxr-xr-x 95 hdfs hdfs 4096 Nov 30 08:32 subdir78
drwxr-xr-x 126 hdfs hdfs 4096 Nov 30 11:32 subdir85
drwxr-xr-x 124 hdfs hdfs 4096 Nov 30 12:08 subdir86
drwxr-xr-x 112 hdfs hdfs 4096 Nov 30 13:25 subdir88
drwxr-xr-x 130 hdfs hdfs 4096 Nov 30 14:25 subdir90
drwxr-xr-x 112 hdfs hdfs 4096 Nov 30 15:00 subdir91
drwxr-xr-x 57 hdfs hdfs 4096 Nov 30 18:23 subdir26
drwxr-xr-x 173 hdfs hdfs 4096 Nov 30 19:01 subdir34
drwxr-xr-x 30 hdfs hdfs 4096 Nov 30 19:03 subdir49
drwxr-xr-x 11 hdfs hdfs 4096 Nov 30 19:03 subdir50
drwxr-xr-x 27 hdfs hdfs 4096 Nov 30 19:06 subdir56
drwxr-xr-x 79 hdfs hdfs 4096 Nov 30 19:08 subdir57
drwxr-xr-x 141 hdfs hdfs 4096 Nov 30 19:49 subdir61
drwxr-xr-x 109 hdfs hdfs 4096 Nov 30 21:53 subdir64
drwxr-xr-x 126 hdfs hdfs 4096 Nov 30 22:08 subdir65
drwxr-xr-x 136 hdfs hdfs 4096 Nov 30 23:08 subdir68
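The per-subdir sizes quoted for these dates could be checked with something like the sketch below. It assumes GNU find, du, and date are available on the DataNode; the function name subdirs_modified_on is hypothetical and not part of any Hadoop tooling.

```shell
# Sketch only: print the size of each top-level subdir under DIR whose
# modification time falls on DAY (YYYY-MM-DD, local time).
# Assumes GNU findutils and coreutils; the function name is hypothetical.
subdirs_modified_on() {
    dir=$1
    day=$2
    # First instant of the following day, used as the upper bound.
    next=$(date -d "$day + 1 day" +%F)
    # -newermt compares each directory's mtime against a timestamp;
    # du -sh prints one human-readable total per matching subdir.
    find "$dir" -mindepth 1 -maxdepth 1 -type d \
        -newermt "$day" ! -newermt "$next" \
        -exec du -sh {} +
}

# Example call, run from dfs/dn/current/<blockpool>/current:
#   subdirs_modified_on finalized 2017-11-29
```

Note that a directory's modification time only shows when entries were last added to or removed from it, so each reported size covers everything in that subdir, including older blocks.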
Please advise.
Thanks,
Shilpa
Created 12-11-2017 05:19 PM
Created 12-11-2017 05:26 PM
Created on 12-14-2017 11:15 AM - edited 12-14-2017 11:22 AM
Hi @Harsh J,
Yesterday, I deleted 50 GB worth of files from HDFS using fs -rm. The daily incoming data on HDFS is about 13-15 GB (including replication); however, today the size of dfs has again increased by roughly 30-55 GB. I don't understand why.
On one DataNode alone, almost 15 GB of dfs files were generated:
[root@DataNode1 finalized]# ls -lrt| grep "Dec 13"
drwxr-xr-x 208 hdfs hdfs 4096 Dec 13 05:59 subdir130
drwxr-xr-x 195 hdfs hdfs 4096 Dec 13 06:52 subdir132
drwxr-xr-x 210 hdfs hdfs 4096 Dec 13 07:24 subdir134
drwxr-xr-x 188 hdfs hdfs 4096 Dec 13 07:32 subdir135
drwxr-xr-x 187 hdfs hdfs 4096 Dec 13 08:30 subdir138
drwxr-xr-x 210 hdfs hdfs 4096 Dec 13 09:09 subdir139
drwxr-xr-x 234 hdfs hdfs 12288 Dec 13 09:46 subdir173
drwxr-xr-x 234 hdfs hdfs 12288 Dec 13 10:07 subdir174
drwxr-xr-x 258 hdfs hdfs 12288 Dec 13 15:30 subdir211
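To compare a listing like this against the expected daily ingest, the subdirs touched on one day could be totalled in bytes. This is a rough sketch assuming GNU find, du, and date; the function name total_bytes_on is hypothetical.

```shell
# Sketch only: print the combined size in bytes of all top-level subdirs
# under DIR whose modification time falls on DAY (YYYY-MM-DD, local time).
# Assumes GNU findutils and coreutils; the function name is hypothetical.
total_bytes_on() {
    dir=$1
    day=$2
    next=$(date -d "$day + 1 day" +%F)
    # du -sb prints "<bytes><TAB><path>" per subdir; awk adds up column 1.
    # printf "%.0f" keeps large totals out of exponent notation.
    find "$dir" -mindepth 1 -maxdepth 1 -type d \
        -newermt "$day" ! -newermt "$next" \
        -exec du -sb {} + |
        awk '{ sum += $1 } END { printf "%.0f\n", sum }'
}

# Example call, run from the directory that contains finalized:
#   total_bytes_on finalized 2017-12-13
```

The result is an upper bound on what was written that day, because each matching subdir is counted in full, including blocks stored in it earlier.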