How is "DFS Remaining" in "hdfs dfsadmin -report" computed?
Name: ***.***.***.***:*** (***)
Hostname: ***
Decommission Status : Normal
Configured Capacity: 83476791296 (77.74 GB)
DFS Used: 1003606016 (957.11 MB)
Non DFS Used: 11966496768 (11.14 GB)
DFS Remaining: 70506688512 (65.66 GB)
DFS Used%: 1.20%
DFS Remaining%: 84.46%
Configured Cache Capacity: 0 (0 B)
Cache Used: 0 (0 B)
Cache Remaining: 0 (0 B)
Cache Used%: 100.00%
Cache Remaining%: 0.00%
Xceivers: 2
Last contact: ***
The filesystems that hold the datanode directories do not necessarily contain only HDFS data. For example, you may have /data01 as the mount point for your datanode, with some other files in /data01/temp or something like that. The files in /data01/datanode will be the portion that is "DFS Used"; the portion in other directories on /data01 will be "Non DFS Used". The "DFS Remaining" will be the balance:
DFS Remaining = FS Size - DFS Used - Non DFS Used
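As a quick sanity check, the formula can be applied to the numbers in the report from the question. This is only a sketch: it assumes "FS Size" corresponds to the "Configured Capacity" line, which is what the figures in this particular report bear out, and the variable names are just for illustration.

```python
# Byte values copied from the datanode report in the question.
# Assumption: "FS Size" in the formula is the "Configured Capacity" line.
configured_capacity = 83476791296
dfs_used = 1003606016
non_dfs_used = 11966496768

# DFS Remaining = FS Size - DFS Used - Non DFS Used
dfs_remaining = configured_capacity - dfs_used - non_dfs_used

# Percentage of the configured capacity that is still available to HDFS.
remaining_pct = 100 * dfs_remaining / configured_capacity

print(dfs_remaining)            # 70506688512
print(f"{remaining_pct:.2f}%")  # 84.46%
```

Both results agree with the "DFS Remaining" and "DFS Remaining%" lines of that report.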
@emaxwell In the above formula, FS Size, DFS Used and Non DFS Used are known based on physical disk usage. However, I notice that when a datanode is restarted, "Non DFS Used" goes down and "DFS Remaining" goes up. How often are FS Size, DFS Used and Non DFS Used recorded?
When you run "dfsadmin -report", it gathers the information. There may be temp directories on the disk where jobs are storing data, or there could be temp files within HDFS that are getting removed on a restart. The amount of space is fluid and collected when you ask for the report.
There is a constant exchange of heartbeats, block reports, and other information between the namenode and the datanodes to keep track of where blocks are located, available space, under-replicated blocks, etc. When you run "dfsadmin -report", it uses the current information that the namenode has. This information is updated regularly. If you restart HDFS, each datanode takes an inventory and reports back to the namenode. If temporary files have been removed on restart, this will be reflected in the block reports back to the namenode.
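If you want to see which figures move across a restart, one option is to save the report text before and after and compare the byte counts. The following is a rough sketch, not an official tool: `parse_report_bytes` and `changed_fields` are hypothetical helper names, the parsing assumes lines shaped like the report in the question ("Field: <bytes> (<human readable>)"), and the "after" snapshot is made up purely for illustration.

```python
import re

def parse_report_bytes(report_text):
    """Return {field name: bytes} for lines like 'DFS Used: 1003606016 (957.11 MB)'."""
    pattern = r"^\s*([A-Za-z ]+?)\s*:\s*(\d+) \("
    return {name: int(value)
            for name, value in re.findall(pattern, report_text, re.MULTILINE)}

def changed_fields(before, after):
    """Return {field name: after - before} for fields whose value changed."""
    return {name: after[name] - before[name]
            for name in before
            if name in after and after[name] != before[name]}

# "Before" snapshot: lines taken from the report in the question.
before = parse_report_bytes("""\
Configured Capacity: 83476791296 (77.74 GB)
DFS Used: 1003606016 (957.11 MB)
Non DFS Used: 11966496768 (11.14 GB)
DFS Remaining: 70506688512 (65.66 GB)
DFS Used%: 1.20%
""")

# Made-up "after restart" snapshot (NOT real data): pretend 1 GiB of
# temporary non-HDFS files was cleaned up while the datanode restarted.
after = dict(before)
after["Non DFS Used"] -= 1024**3
after["DFS Remaining"] += 1024**3

print(changed_fields(before, after))
# {'Non DFS Used': -1073741824, 'DFS Remaining': 1073741824}
```

In this illustration the space freed from "Non DFS Used" shows up one-for-one in "DFS Remaining", which is the pattern described in the follow-up question.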