HDFS storage check shows different values
Labels: HDFS
Created 02-19-2017 05:30 AM
hdfs dfs -du -h -s /
221.1 T  637.9 T  /

hdfs dfs -du -h -s .
204.2 M  1.2 G  .

But in the UI I see 670 TB. I'm sure I'm missing something, but I can't find it.
Configured Capacity:  1.02 PB
DFS Used:             670.54 TB
Non DFS Used:         283.37 GB
DFS Remaining:        368.96 TB
DFS Used%:            64.49%
DFS Remaining%:       35.48%
Block Pool Used:      670.54 TB
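For reference, here is how I read the -du output (if I understand the docs correctly, the second column already includes replication), plus the CLI way to pull the same counters the UI shows:

hdfs dfs -du -h -s /
# first column  = logical file size (221.1 T)
# second column = raw space consumed across all replicas (637.9 T)

hdfs dfsadmin -report | head -20
# prints the NameNode counters behind the UI:
# Configured Capacity, DFS Used, Non DFS Used, DFS Remaining, ...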
Created on 02-20-2017 09:06 AM - edited 02-20-2017 09:22 AM
Could you run the commands below and post the results? I am curious, what's your replication factor?

hadoop fsck /path/to/directory
hadoop fs -du -s /path/to/directory

The two commands above should give us the same results; both calculate raw HDFS data without considering the replication factor.

The command below calculates the file size across the nodes (hard disks), taking the replication factor into account:

hadoop fs -count -q /path/to/directory

We can then compare the results, in terms of how much HDFS space has been consumed, against the NameNode UI results.
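For reading the -count -q output, the columns should be (from the Hadoop FileSystem shell docs, if I remember right):

hadoop fs -count -q /path/to/directory
# QUOTA  REMAINING_QUOTA  SPACE_QUOTA  REMAINING_SPACE_QUOTA
# DIR_COUNT  FILE_COUNT  CONTENT_SIZE  PATHNAME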
Created 02-20-2017 12:25 PM
The UI, and even CM, do a different calculation, and it is annoying because it isn't what I would call accurate. In the last few days I saw a JIRA related to how Non-DFS Used and the reserved space are used in that calculation.
I don't have the current formula in front of me, but it is different. It becomes obvious when you tally up the space used (including non-DFS), the space unused, and even the percentages: they never add up to 100%, and they never equal your raw disk availability.
I may get this wrong, but it is related to the amount you have reserved for non-DFS data. That gets lopped off the configured capacity, but the system also uses it to calculate Non-DFS Used in a weird way that always reports more used space than there actually is.
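If I remember the discussion correctly, the accounting works roughly like this (a rough sketch of my reading of the JIRA, not the actual NameNode code):

# per DataNode volume:
#   configured_capacity = total_disk_space - reserved   (dfs.datanode.du.reserved)
#   non_dfs_used        = configured_capacity - dfs_used - dfs_remaining
# The reserved bytes are cut out of capacity, yet the same subtraction feeds
# Non DFS Used, so reserved space surfaces as "non-DFS usage" that isn't
# really there and the totals never tally to 100%.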
Created on 02-20-2017 06:37 PM - edited 02-20-2017 06:42 PM
Total size: 253714473531851 B (Total open files size: 11409372739 B)
Total dirs: 1028908
Total files: 7639121
Total symlinks: 0 (Files currently being written: 107)
Total blocks (validated): 8781147 (avg. block size 28893090 B) (Total open file blocks (not validated): 149)
Minimally replicated blocks: 8781147 (100.0 %)
Over-replicated blocks: 0 (0.0 %)
Under-replicated blocks: 0 (0.0 %)
Mis-replicated blocks: 0 (0.0 %)
Default replication factor: 3
Average block replication: 2.8528664
Corrupt blocks: 0
Missing replicas: 0 (0.0 %)
Number of data-nodes: 30
Number of racks: 1
FSCK ended at Mon Feb 20 21:33:23 EST 2017 in 190136 milliseconds
The filesystem under path '/' is HEALTHY
hadoop fs -du -s /
244412682417174 708603392967605 /
hadoop fs -count -q /
9223372036854775807 9223372036846392726 none inf 987886 7395195 244417466380498 /
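As a sanity check, the ratio of the two -du columns gives the effective replication, and it lines up with what fsck reports:

# 708603392967605 / 244412682417174 ≈ 2.90   (fsck says avg. block replication 2.85)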
The non-HDFS reserved space is 10 GB, so across 30 nodes it should not exceed 1 TB, even with a replication factor of 3.
It's really annoying.
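One caveat on my own math: if that 10 GB is dfs.datanode.du.reserved, it applies per disk volume rather than per node, so the real total is larger (the 12 disks per node below is just an example):

# total reserved = 10 GB x volumes_per_node x 30 nodes
# e.g. 10 GB x 12 x 30 = 3.6 TB -- still nowhere near the ~40 TB gap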
Created 02-22-2017 03:22 PM
So I shouldn't search for the missing 40 TB, and the right storage figure is what fsck shows?
Created 08-17-2017 01:53 AM
It is becoming really annoying, since the difference between the UI (or hdfs dfsadmin -report) and hdfs dfs -du -h -s is now 150 TB. I deleted all the HDFS snapshots and disallowed further ones, but I still get the same results.
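For completeness, these are the commands I used to confirm nothing snapshot-related is left (/some/dir is just a placeholder):

hdfs lsSnapshottableDir
# lists directories that still allow snapshots
hdfs dfs -ls /some/dir/.snapshot
# lists the snapshots under one snapshottable directory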
Created 08-17-2017 10:41 AM
I figured out the issue.
The difference comes from /tmp/logs.
It's weird that hdfs dfs -du -h -s / does not account for /tmp/logs.
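For anyone else hitting this: /tmp/logs is the default target for aggregated YARN application logs (yarn.nodemanager.remote-app-log-dir), so it can grow huge quietly. A per-directory breakdown plus a retention setting keeps it in check (the 7-day value is just an example):

hdfs dfs -du -h /tmp/logs
# shows how much the aggregated logs occupy, per user directory

# yarn-site.xml:
# yarn.log-aggregation.retain-seconds = 604800   (7 days)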
