Where is my disk space?

I have a cluster of 8 DataNodes with 42 TB in total, and replication is set to 2. After loading ~10 TB of data, I found that only ~7 TB was left. I then deleted 1 TB of data with the "-skipTrash" option, but no extra disk space was freed. Here is my disk usage:

hdfs@msl-dpe-perf87:/home/harry.li/tpcds_5.db$ hdfs dfs -df -h
Filesystem                            Size    Used  Available  Use%
hdfs://msl-dpe-perf88.msl.lab:8020  42.4 T  32.7 T      7.3 T   77%
hdfs@msl-dpe-perf87:/home/harry.li/tpcds_5.db$ hdfs dfs -du -h /
9.0 T    /TPCDS
979.2 M  /app-logs
477.8 G  /apps
0        /ats
918.2 M  /hdp
0        /mapred
8.4 M    /mr-history
0        /spark-history
5.7 M    /spark2-history
2.4 K    /tmp
105.7 G  /user


Questions:

1. The math doesn't seem to add up. With ~10 TB of data at replication 2 (about 20 TB of raw storage), I should still have at least 20 TB left. Why do I have only 7 TB left?

2. Why didn't deleting data free up any disk space?
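
The arithmetic behind question 1 can be checked with a minimal Python sketch that sums the sizes printed by "hdfs dfs -du -h /" above, multiplies them by the replication factor of 2, and compares the result with the 32.7 T "Used" figure from "hdfs dfs -df -h". It assumes that this single-column -du output reports logical (pre-replication) file size, that every file has replication 2, and that -h uses 1024-based units; the helper names parse_size and logical_total are illustrative, not part of any Hadoop tool.

```python
import re

# Assumption: "-h" output uses 1024-based units (K = 1024 bytes, and so on).
UNITS = {"": 1, "K": 1024, "M": 1024**2, "G": 1024**3, "T": 1024**4, "P": 1024**5}
TIB = 1024**4

def parse_size(text: str) -> float:
    """Convert a human-readable size such as '9.0 T' or '0' to bytes."""
    match = re.fullmatch(r"\s*([\d.]+)\s*([KMGTP]?)\s*", text)
    if match is None:
        raise ValueError(f"unrecognised size: {text!r}")
    value, unit = match.groups()
    return float(value) * UNITS[unit]

def logical_total(du_output: str) -> float:
    """Sum the size column of single-column 'hdfs dfs -du -h /' output."""
    total = 0.0
    for line in du_output.strip().splitlines():
        size_text, _path = line.rsplit(None, 1)  # size first, path last
        total += parse_size(size_text)
    return total

# The -du output posted in the question.
DU_OUTPUT = """
9.0 T    /TPCDS
979.2 M  /app-logs
477.8 G  /apps
0        /ats
918.2 M  /hdp
0        /mapred
8.4 M    /mr-history
0        /spark-history
5.7 M    /spark2-history
2.4 K    /tmp
105.7 G  /user
"""

REPLICATION = 2                       # replication factor stated in the question
logical = logical_total(DU_OUTPUT)    # assumed to be pre-replication size
expected_raw = logical * REPLICATION  # raw storage the listed files should need
reported_used = parse_size("32.7 T")  # "Used" column of 'hdfs dfs -df -h'

print(f"logical data: {logical / TIB:.2f} TiB")            # 9.57 TiB
print(f"expected raw usage: {expected_raw / TIB:.2f} TiB")  # 19.14 TiB
print(f"reported used: {reported_used / TIB:.2f} TiB")      # 32.70 TiB
print(f"unaccounted for: {(reported_used - expected_raw) / TIB:.2f} TiB")  # 13.56 TiB
```

Under these assumptions, the listed files explain only about 19.14 TiB of the 32.7 TiB reported as used, leaving roughly 13.56 TiB unaccounted for by the -du output.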

Re: Where is my disk space?

@Harry Li: The space used depends on several factors, such as file size, block size, and replication factor. What is the average file size of your data? It is possible that your block size is too large compared to the files you are adding. Can you run a Hadoop listing on the TPCDS directory?
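
The average file size asked about above could be estimated with a sketch like the following, assuming the "hdfs dfs -count" command, whose output columns (without -q or -h) are DIR_COUNT, FILE_COUNT, CONTENT_SIZE and PATHNAME, with CONTENT_SIZE being the logical size in bytes. The helper names run_count and average_file_size are hypothetical, and run_count assumes the hdfs client is available on PATH.

```python
import subprocess

def run_count(path: str) -> str:
    """Return the output line of 'hdfs dfs -count <path>'.

    Assumes the hdfs client is installed and on PATH (e.g. on a cluster node).
    """
    result = subprocess.run(
        ["hdfs", "dfs", "-count", path],
        capture_output=True, text=True, check=True,
    )
    return result.stdout.strip()

def average_file_size(count_line: str) -> float:
    """Average logical file size in bytes from one 'hdfs dfs -count' line.

    Expected columns (no -q or -h flag): DIR_COUNT FILE_COUNT CONTENT_SIZE PATHNAME.
    """
    _dirs, files, content_size, _path = count_line.split(None, 3)
    if int(files) == 0:
        return 0.0  # empty directory: avoid division by zero
    return int(content_size) / int(files)
```

On a node with the HDFS client, average_file_size(run_count("/TPCDS")) / 1024**2 would give the average file size in MiB, which can then be compared with the configured block size (for example, as printed by "hdfs getconf -confKey dfs.blocksize").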