
Where did my disk space go?


I have a cluster of 8 DataNodes with 42 TB of total capacity. The replication factor is set to 2. After loading ~10 TB of data, I found that only ~7 TB is left. I then deleted 1 TB of data with the "-skipTrash" option, but did not see any disk space freed. Here is my disk usage:

hdfs@msl-dpe-perf87:/home/$ hdfs dfs -df -h
Filesystem                            Size    Used  Available  Use%
hdfs://msl-dpe-perf88.msl.lab:8020  42.4 T  32.7 T      7.3 T   77%
hdfs@msl-dpe-perf87:/home/$ hdfs dfs -du -h /
9.0 T    /TPCDS
979.2 M  /app-logs
477.8 G  /apps
0        /ats
918.2 M  /hdp
0        /mapred
8.4 M    /mr-history
0        /spark-history
5.7 M    /spark2-history
2.4 K    /tmp
105.7 G  /user


1. The math here does not seem to add up. With ~10 TB of data (replication factor 2), about 20 TB should be used, so I should still have at least 20 TB left. Why do I have only 7 TB left?

2. Why did deleting the data not free up any disk space?
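
The mismatch can be made concrete with a rough calculation from the two outputs above. This is only a sketch: it assumes the single `-du` column is the logical size before replication, that `-h` uses binary units (1 T = 1024 G), and that every file is stored with replication factor 2.

```python
# Rough check of the numbers in the `hdfs dfs -df -h` and `hdfs dfs -du -h /` output.
# Assumptions: -du shows size before replication, units are binary,
# and every file uses replication factor 2.
UNITS = {"K": 1024, "M": 1024**2, "G": 1024**3, "T": 1024**4}

# Non-zero entries copied from the -du output.
du_entries = ["9.0 T", "979.2 M", "477.8 G", "918.2 M",
              "8.4 M", "5.7 M", "2.4 K", "105.7 G"]


def to_bytes(text):
    """Convert a string such as '477.8 G' to a number of bytes."""
    value, unit = text.split()
    return float(value) * UNITS[unit]


def to_tb(num_bytes):
    """Convert bytes to binary terabytes."""
    return num_bytes / UNITS["T"]


REPLICATION = 2

# Values copied from the -df output, in TB.
size_tb, used_tb, available_tb = 42.4, 32.7, 7.3

logical_tb = to_tb(sum(to_bytes(e) for e in du_entries))
expected_raw_tb = logical_tb * REPLICATION
unexplained_tb = used_tb - expected_raw_tb
non_dfs_tb = size_tb - used_tb - available_tb

print(f"logical data:          {logical_tb:.1f} TB")
print(f"expected raw usage:    {expected_raw_tb:.1f} TB")
print(f"reported DFS used:     {used_tb:.1f} TB")
print(f"unexplained DFS usage: {unexplained_tb:.1f} TB")
print(f"size - used - avail:   {non_dfs_tb:.1f} TB")
```

Under these assumptions the listed directories hold about 9.6 TB, which should occupy about 19.1 TB of raw space, so roughly 13.6 TB of the reported 32.7 TB "Used" is not explained by the directories shown.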



@Harry Li: Space used depends on several factors, such as the file size, block size, and replication factor. What was the average file size of the data? It is possible that your configured block size is too large compared to the files you are adding. Can you post an HDFS listing of the TPCDS directory?
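
Once a recursive listing is available (for example from `hdfs dfs -ls -R /TPCDS`), the average file size and the replication factors actually in use can be summarized with a small script. The sketch below assumes the usual eight-column listing format (permissions, replication, owner, group, size, date, time, path); the function name `summarize_listing` and the sample lines are made up for illustration and are not taken from this cluster.

```python
from collections import Counter


def summarize_listing(lines):
    """Summarize the file entries in `hdfs dfs -ls -R` output lines."""
    sizes = []
    replication = Counter()
    for line in lines:
        fields = line.split(None, 7)
        # Skip "Found N items" headers, malformed lines, and directories
        # (directories show "-" in the replication column).
        if len(fields) < 8 or fields[0].startswith("d"):
            continue
        replication[int(fields[1])] += 1   # second column: replication factor
        sizes.append(int(fields[4]))       # fifth column: file size in bytes
    count = len(sizes)
    return {
        "files": count,
        "total_bytes": sum(sizes),
        "avg_bytes": sum(sizes) / count if count else 0.0,
        "replication_counts": dict(replication),
    }


# Hypothetical sample lines, for illustration only.
sample = [
    "Found 3 items",
    "drwxr-xr-x   - hdfs hdfs          0 2017-01-01 10:00 /TPCDS/example_dir",
    "-rw-r--r--   2 hdfs hdfs  134217728 2017-01-01 10:00 /TPCDS/example_dir/part-00000",
    "-rw-r--r--   3 hdfs hdfs   67108864 2017-01-01 10:01 /TPCDS/example_dir/part-00001",
]

summary = summarize_listing(sample)
print(summary)
```

In practice the listing could be saved to a file and its lines passed to the function. A replication count other than 2 would mean some files are not stored with the replication factor assumed in the question.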
