Created 02-19-2017 05:30 AM
hdfs dfs -du -h -s /
221.1 T 637.9 T /
====================
hdfs dfs -du -h -s .
204.2 M 1.2 G .
=================
But in the UI I see it's 670 TB.
I'm sure I'm missing something, but I can't find it.
Configured Capacity: 1.02 PB
DFS Used:            670.54 TB
Non DFS Used:        283.37 GB
DFS Remaining:       368.96 TB
DFS Used%:           64.49%
DFS Remaining%:      35.48%
Block Pool Used:     670.54 TB
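As a quick sanity check, the two columns of hdfs dfs -du -s are the logical size and the size consumed on disk across all replicas, so their ratio approximates the effective replication factor. A short sketch using only the figures quoted above:

```python
# Figures copied from the du output and NameNode UI table in this thread.
logical_tb = 221.1        # first du column (logical size)
with_replicas_tb = 637.9  # second du column (size including all replicas)
ui_dfs_used_tb = 670.54   # "DFS Used" from the NameNode UI

# Ratio of the two du columns approximates the effective replication factor.
effective_replication = with_replicas_tb / logical_tb
print(f"effective replication ~ {effective_replication:.2f}")  # ~ 2.89

# Remaining unexplained gap between the UI and du, in TB.
gap_tb = ui_dfs_used_tb - with_replicas_tb
print(f"UI reports {gap_tb:.1f} TB more than du")              # ~ 32.6 TB
```

So roughly 33 TB of the UI's "DFS Used" is not accounted for by du on /.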
Created on 02-20-2017 09:06 AM - edited 02-20-2017 09:22 AM
Could you run the commands below and post the results?
I am curious, what is your replication factor?
hadoop fsck <path to directory>
hadoop fs -du -s <path to directory>
The two commands above should give us the same result:
both calculate the raw HDFS data size without considering the replication factor.
The command below reports space usage across the nodes (hard disks), taking the replication factor into account in its space-quota columns.
hadoop fs -count -q /path/to/directory
We can then compare how much HDFS space has been consumed against the NameNode UI results.
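For reference, the column layout of the -count -q output (per the Hadoop FileSystem shell documentation) is worth keeping at hand, since the raw numbers are hard to read otherwise:

```shell
# hadoop fs -count -q prints, in order:
#   QUOTA  REMAINING_QUOTA  SPACE_QUOTA  REMAINING_SPACE_QUOTA
#   DIR_COUNT  FILE_COUNT  CONTENT_SIZE  PATHNAME
# CONTENT_SIZE is the logical size; the space-quota columns are the ones
# that account for replicas. "/path/to/directory" is a placeholder.
hadoop fs -count -q /path/to/directory
```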
Created 02-20-2017 12:25 PM
Created on 02-20-2017 06:37 PM - edited 02-20-2017 06:42 PM
Total size: 253714473531851 B (Total open files size: 11409372739 B)
Total dirs: 1028908
Total files: 7639121
Total symlinks: 0 (Files currently being written: 107)
Total blocks (validated): 8781147 (avg. block size 28893090 B) (Total open file blocks (not validated): 149)
Minimally replicated blocks: 8781147 (100.0 %)
Over-replicated blocks: 0 (0.0 %)
Under-replicated blocks: 0 (0.0 %)
Mis-replicated blocks: 0 (0.0 %)
Default replication factor: 3
Average block replication: 2.8528664
Corrupt blocks: 0
Missing replicas: 0 (0.0 %)
Number of data-nodes: 30
Number of racks: 1
FSCK ended at Mon Feb 20 21:33:23 EST 2017 in 190136 milliseconds
The filesystem under path '/' is HEALTHY
hadoop fs -du -s /
244412682417174 708603392967605 /
hadoop fs -count -q /
9223372036854775807 9223372036846392726 none inf 987886 7395195 244417466380498 /
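Cross-checking the fsck and du figures above: fsck reports the logical size and the average block replication, so multiplying them approximates the physical bytes fsck can account for, which can then be set against du and the UI. A sketch, using only numbers quoted in this thread:

```python
TIB = 1024 ** 4  # the NameNode UI reports binary terabytes

fsck_logical = 253714473531851  # "Total size" from fsck, in bytes
avg_replication = 2.8528664     # "Average block replication" from fsck
du_physical = 708603392967605   # second column of `hadoop fs -du -s /`
ui_dfs_used_tib = 670.54        # "DFS Used" from the NameNode UI

# Physical bytes implied by fsck = logical size x average replication.
fsck_physical_tib = fsck_logical * avg_replication / TIB
print(f"fsck implies ~{fsck_physical_tib:.1f} TiB on disk")  # ~ 658.3 TiB
print(f"du reports   ~{du_physical / TIB:.1f} TiB on disk")  # ~ 644.5 TiB
print(f"UI reports    {ui_dfs_used_tib} TiB used")
```

The three figures land within about 2% of each other, so the remaining gap is real data that du is not seeing rather than a units problem.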
The non-HDFS reserved space is 10 GB per node, and with 30 nodes it should not exceed 1 TB even with replication factor 3.
It's really annoying.
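The reserved-space arithmetic can be checked directly against the UI's "Non DFS Used" figure. Note that non-DFS space is local disk usage on each node, so the HDFS replication factor does not multiply it. A sketch, assuming 10 GB reserved per node as stated above:

```python
reserved_per_node_gb = 10   # stated reserved (non-HDFS) space per node
nodes = 30                  # "Number of data-nodes" from the fsck output
ui_non_dfs_used_gb = 283.37  # "Non DFS Used" from the NameNode UI

# Non-DFS space is per-node local usage; replication does not apply.
expected_gb = reserved_per_node_gb * nodes
print(f"expected ~{expected_gb} GB vs UI {ui_non_dfs_used_gb} GB")
```

The UI's 283.37 GB is close to the expected 300 GB, so non-DFS/reserved space cannot explain a multi-TB gap.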
Created 02-22-2017 03:22 PM
So I shouldn't search for the missing 40 TB, and the correct storage figure is what fsck shows?
Created 08-17-2017 01:53 AM
It is becoming really annoying, since the difference between the UI (or hdfs dfsadmin -report) and hdfs dfs -du -h -s is now 150 TB. I deleted all the HDFS snapshots and disallowed them, but I still get the same results.
Created 08-17-2017 10:41 AM
I figured out the issue.
The difference comes from /tmp/logs.
It's weird that hdfs dfs -du -h -s / is not counting /tmp/logs.
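One possible explanation (an assumption, not confirmed in this thread) is permissions: if the account running du cannot read a directory such as /tmp/logs, its contents may be missing from the total. A way to check, and to see which top-level directories account for the space, is to run du per directory as the HDFS superuser:

```shell
# Per-directory usage of the root, run as the hdfs superuser so that
# permission-restricted paths such as /tmp/logs are included (assumption:
# the superuser account is named "hdfs", as in a default installation).
sudo -u hdfs hdfs dfs -du -h /
sudo -u hdfs hdfs dfs -du -h -s /tmp/logs
```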