I have a cluster of 8 DataNodes with 42 TB of total capacity. Replication is set to 2. After loading ~10 TB of data, I found that only ~7 TB was left. I then deleted 1 TB of data with the "-skipTrash" option, but no extra disk space appears to have been freed. Here is my disk usage:
hdfs@msl-dpe-perf87:/home/harry.li/tpcds_5.db$ hdfs dfs -df -h
Filesystem                            Size    Used  Available  Use%
hdfs://msl-dpe-perf88.msl.lab:8020  42.4 T  32.7 T      7.3 T   77%
hdfs@msl-dpe-perf87:/home/harry.li/tpcds_5.db$ hdfs dfs -du -h /
9.0 T /TPCDS
979.2 M /app-logs
477.8 G /apps
0 /ats
918.2 M /hdp
0 /mapred
8.4 M /mr-history
0 /spark-history
5.7 M /spark2-history
2.4 K /tmp
105.7 G /user
Questions:
1. The math does not seem to add up. With ~10 TB of data at replication factor 2, only about 20 TB should be in use, so I should still have at least 20 TB left. Why do I have only 7 TB left?
2. Why did deleting the data not free up any disk space?
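
To make the arithmetic behind question 1 concrete, here is a rough sketch in Python that adds up the figures pasted above. It rests on several assumptions of mine: that the single-column "-du" sizes are logical sizes before replication, that the K/M/G/T suffixes are 1024-based, and that every file really uses replication factor 2. The variable names are only for illustration.

```python
# Unit multipliers relative to 1 T, assuming 1024-based units in the hdfs output.
UNIT_TO_T = {"K": 1 / 1024**3, "M": 1 / 1024**2, "G": 1 / 1024, "T": 1.0}

# Non-empty directories copied from the `hdfs dfs -du -h /` output above.
du_entries = {
    "/TPCDS": (9.0, "T"),
    "/app-logs": (979.2, "M"),
    "/apps": (477.8, "G"),
    "/hdp": (918.2, "M"),
    "/mr-history": (8.4, "M"),
    "/spark2-history": (5.7, "M"),
    "/tmp": (2.4, "K"),
    "/user": (105.7, "G"),
}

REPLICATION = 2  # assumption: all files use the configured factor of 2

# Logical data size, and the raw disk space it should occupy after replication.
logical_t = sum(size * UNIT_TO_T[unit] for size, unit in du_entries.values())
expected_raw_t = logical_t * REPLICATION

# Figures copied from the `hdfs dfs -df -h` output above.
size_t, used_t, available_t = 42.4, 32.7, 7.3

# Capacity that is neither DFS-used nor available (presumably non-DFS usage).
other_t = size_t - used_t - available_t

# Reported DFS usage that replication of the visible data does not explain.
unexplained_t = used_t - expected_raw_t

print(f"logical data:          {logical_t:.2f} T")
print(f"expected raw (x{REPLICATION}):     {expected_raw_t:.2f} T")
print(f"size - used - avail:   {other_t:.2f} T")
print(f"unexplained DFS usage: {unexplained_t:.2f} T")
```

Under those assumptions the visible data should take roughly 19 T of raw space, yet "Used" reports 32.7 T, so about 13.5 T of DFS usage is unaccounted for by the directories listed.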