I don't understand why the disk usage reported by du and df differs.
'hdfs dfs -du -h -s /' gives around 2 TB in total,
while 'hadoop fs -df' gives
Filesystem        Size    Used  Available  Use%
hdfs://ip:8020  15.8 T  12.6 T      2.3 T   80%
and 'sudo -u hdfs hdfs fsck /' gives me this
Total size: 2158294971710 B (Total open files size: 341851 B)
Total dirs: 627169 Total files: 59276
Total symlinks: 0 (Files currently being written: 13)
Total blocks (validated): 23879 (avg. block size 90384646 B) (Total open file blocks (not validated): 13)
Minimally replicated blocks: 23879 (100.0 %)
Over-replicated blocks: 0 (0.0 %)
Under-replicated blocks: 8 (0.03350224 %)
Mis-replicated blocks: 0 (0.0 %)
Default replication factor: 2
Average block replication: 2.0228233
Corrupt blocks: 0
Missing replicas: 32 (0.066204615 %)
Number of data-nodes: 6
Number of racks: 1
Can anyone explain the difference, and how I can reclaim the mysteriously used space?
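One sanity check on the numbers above: 'hdfs dfs -du' reports logical (pre-replication) bytes, while 'hadoop fs -df' reports bytes occupied across all DataNodes. Multiplying the fsck total by the average block replication estimates how much of the reported "Used" should actually be HDFS block data; this sketch uses the figures from the question and assumes the "T" in the df output means TiB (2**40 bytes):

```python
# Reconcile 'du' vs 'df' using the fsck figures from the question.
# Assumption: the "T" unit printed by df is TiB (2**40 bytes).

du_bytes = 2_158_294_971_710     # Total size from fsck / du (logical, pre-replication)
avg_replication = 2.0228233      # Average block replication from fsck

# Bytes HDFS block data should occupy across all DataNodes:
hdfs_raw = du_bytes * avg_replication

df_used = 12.6 * 2**40           # "12.6 T" Used from 'hadoop fs -df'

print(f"HDFS data on disk: {hdfs_raw / 2**40:.2f} TiB")              # ~3.97 TiB
print(f"df reports used:   {df_used / 2**40:.2f} TiB")               # 12.60 TiB
print(f"unexplained gap:   {(df_used - hdfs_raw) / 2**40:.2f} TiB")  # ~8.63 TiB
```

If the gap is real, common culprits worth checking are snapshots (data kept alive by snapshots is not counted by a plain du of the live tree), blocks still pending deletion, and per-DataNode disk usage as shown by 'hdfs dfsadmin -report'.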
Let's assume your HDFS has a single 128 MB block available. If you write a 1 MB file into that block, the block is no longer available for another write. So although your used space is less than 1% of the block, practically you have 0 blocks and 0 bytes available for new writes. I hope that makes the difference clear. This matters when you deal with large blocks: files smaller than the block size can lead to wasted space. Instead of using df to show bytes used or available, look at blocks used and available, and multiply by the block size.
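A quick sketch of the calculation suggested above, using the block count from the fsck output and assuming the default 128 MB block size (the actual dfs.blocksize on this cluster may differ):

```python
# Estimate allocated space from block counts instead of byte counts.
# Assumption: dfs.blocksize = 128 MB (the Hadoop default);
# block count and replication are from the fsck output in the question.

block_size = 128 * 1024 * 1024   # 134217728 bytes, assumed default
total_blocks = 23879             # "Total blocks (validated)" from fsck
avg_replication = 2.0228233      # "Average block replication" from fsck

# Upper bound: every block reserved at full block size, on every replica
allocated = total_blocks * block_size * avg_replication

print(f"fully-allocated estimate: {allocated / 2**40:.2f} TiB")  # ~5.90 TiB
```

Note that even this fully-allocated upper bound comes out well below the 12.6 T that df reports (and the fsck average block size is ~90 MB, so the blocks here are not tiny), so it is worth checking snapshots and non-DFS disk usage as well.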