Space used in HDFS is different from free space


Hi, guys.


I don't know why the disk usage comes out different when I run du and df.

'hdfs dfs -du -h -s /' gives around 2 TB in total,

while 'hadoop fs -df' shows

Filesystem        Size    Used    Available  Use%
hdfs://ip:8020    15.8 T  12.6 T  2.3 T      80%

and 'sudo -u hdfs hdfs fsck /' gives me this:

Total size:    2158294971710 B (Total open files size: 341851 B)
Total dirs:    627169
Total files:   59276
Total symlinks:    0 (Files currently being written: 13)
Total blocks (validated):    23879 (avg. block size 90384646 B) (Total open file blocks (not validated): 13)
Minimally replicated blocks:    23879 (100.0 %)
Over-replicated blocks:    0 (0.0 %)
Under-replicated blocks:    8 (0.03350224 %)
Mis-replicated blocks:    0 (0.0 %)
Default replication factor:    2
Average block replication:    2.0228233
Corrupt blocks:    0
Missing replicas:    32 (0.066204615 %)
Number of data-nodes:    6
Number of racks:    1
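
By my rough arithmetic, the raw footprint implied by those fsck numbers should be about the logical size times the average block replication:

2158294971710 B * 2.0228233 ≈ 4.37e12 B, i.e. roughly 4.4 TB

which is nowhere near the 12.6 T that df reports as used.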


Can anyone explain why this happens, and how I can reclaim the mysteriously used space?

Thanks.


Super Guru

@Punit kumar

Let's assume your HDFS has a single block available, with a block size of 128 MB. If you write a 1 MB file to that block, the block is no longer available for another write; so while your used space is practically less than 1%, your available space is 0 blocks and 0 bytes for new writes. I hope you see the difference. This happens when you deal with large blocks: files smaller than the block size can lead to wasted space. Instead of using df to show bytes available or used, you should look at the blocks used and available, and multiply those by the block size.
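
For example (a rough sketch using the numbers from your fsck output above, and assuming the default 128 MB block size, which your cluster may override), you can pull the block count and multiply it out yourself:

hdfs getconf -confKey dfs.blocksize                 # configured block size, e.g. 134217728 (128 MB)
sudo -u hdfs hdfs fsck / | grep 'Total blocks'      # 23879 validated blocks in your output
echo "23879 * 134217728 * 2" | bc                   # blocks x block size x replication factor ≈ 6.4e12 B

That gives a rough upper bound on the raw space those blocks could occupy if every one of them were a full 128 MB.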

I responded to a similar question last year. Let me find it.

Super Guru