hdfs dfsadmin -report outputs a brief report on the overall HDFS filesystem. It is a quick way to see how much disk space is available, how many DataNodes are running, how many blocks are corrupt or missing, and so on.

Note: This article explains the disk space calculations as seen by HDFS.

Command: Run the command as the hdfs user (prefix it with sudo -u hdfs) to avoid a permission denied error.

sudo -u hdfs hdfs dfsadmin -report

You will see output similar to:

Configured Capacity: 270082531328 (251.53 GB)
Present Capacity: 190246318080 (177.18 GB)
DFS Remaining: 143504465920 (133.65 GB)
DFS Used: 46741852160 (43.53 GB)
DFS Used%: 24.57%
Under replicated blocks: 0
Blocks with corrupt replicas: 0
Missing blocks: 0
Missing blocks (with replication factor 1): 0
 
-------------------------------------------------
Live datanodes (4):
 
Name: 123.45.678.910:50010 (kharearpit4.local)
Hostname: kharearpit4.local
Rack: /rack4
Decommission Status : Normal
Configured Capacity: 20063055872 (18.69 GB)
DFS Used: 40960 (40 KB)
Non DFS Used: 5971144704 (5.56 GB)
DFS Remaining: 14091870208 (13.12 GB)
DFS Used%: 0.00%
DFS Remaining%: 70.24%
Configured Cache Capacity: 0 (0 B)
Cache Used: 0 (0 B)
Cache Remaining: 0 (0 B)
Cache Used%: 100.00%
Cache Remaining%: 0.00%
Xceivers: 2
Last contact: Sun Apr 23 19:57:56 UTC 2017
 
 
Name: 123.45.678.909:50010 (kharearpit3.local)
Hostname: kharearpit3.local
Rack: /rack3
Decommission Status : Normal
Configured Capacity: 83339825152 (77.62 GB)
DFS Used: 15580618752 (14.51 GB)
Non DFS Used: 22774845440 (21.21 GB)
DFS Remaining: 44984360960 (41.89 GB)
DFS Used%: 18.70%
DFS Remaining%: 53.98%
Configured Cache Capacity: 0 (0 B)
Cache Used: 0 (0 B)
Cache Remaining: 0 (0 B)
Cache Used%: 100.00%
Cache Remaining%: 0.00%
Xceivers: 2
Last contact: Sun Apr 23 19:57:58 UTC 2017
 
 
Name: 123.45.678.908:50010 (kharearpit1.local)
Hostname: kharearpit1.local
Rack: /rack1
Decommission Status : Normal
Configured Capacity: 83339825152 (77.62 GB)
DFS Used: 15580672000 (14.51 GB)
Non DFS Used: 31497687040 (29.33 GB)
DFS Remaining: 36261466112 (33.77 GB)
DFS Used%: 18.70%
DFS Remaining%: 43.51%
Configured Cache Capacity: 0 (0 B)
Cache Used: 0 (0 B)
Cache Remaining: 0 (0 B)
Cache Used%: 100.00%
Cache Remaining%: 0.00%
Xceivers: 2
Last contact: Sun Apr 23 19:57:58 UTC 2017
 
 
Name: 123.45.678.907:50010 (kharearpit2.local)
Hostname: kharearpit2.local
Rack: /rack2
Decommission Status : Normal
Configured Capacity: 83339825152 (77.62 GB)
DFS Used: 15580520448 (14.51 GB)
Non DFS Used: 19592536064 (18.25 GB)
DFS Remaining: 48166768640 (44.86 GB)
DFS Used%: 18.70%
DFS Remaining%: 57.80%
Configured Cache Capacity: 0 (0 B)
Cache Used: 0 (0 B)
Cache Remaining: 0 (0 B)
Cache Used%: 100.00%
Cache Remaining%: 0.00%
Xceivers: 2
Last contact: Sun Apr 23 19:57:58 UTC 2017
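The byte counts in the report are easier to check with a script than by hand. Below is a minimal Python sketch, not part of the original article, that pulls the byte-valued fields out of the cluster summary. The names SAMPLE_REPORT, BYTE_FIELD and parse_byte_fields are made up for illustration, and the sketch assumes the "Name: bytes (human readable)" line layout shown above. It is meant for the summary section only: the per-DataNode sections reuse the same field names, so feeding the whole report in would overwrite the summary values.

```python
import re

# Cluster summary copied from the report above.
SAMPLE_REPORT = """\
Configured Capacity: 270082531328 (251.53 GB)
Present Capacity: 190246318080 (177.18 GB)
DFS Remaining: 143504465920 (133.65 GB)
DFS Used: 46741852160 (43.53 GB)
DFS Used%: 24.57%
"""

# Matches lines such as "DFS Used: 46741852160 (43.53 GB)".
# Percentage lines ("DFS Used%: 24.57%") do not match and are skipped.
BYTE_FIELD = re.compile(r"^\s*([A-Za-z ]+?)\s*:\s*(\d+)\s*\(")


def parse_byte_fields(report_text):
    """Return {field name: size in bytes} for the byte-valued lines."""
    fields = {}
    for line in report_text.splitlines():
        match = BYTE_FIELD.match(line)
        if match:
            fields[match.group(1)] = int(match.group(2))
    return fields


summary = parse_byte_fields(SAMPLE_REPORT)
for name, size in summary.items():
    print(f"{name}: {size}")
```

Working in bytes avoids the small rounding differences that appear when the two-decimal GB figures are added up.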

This article explains the concepts of Configured Capacity, Present Capacity, DFS Used, DFS Remaining, and Non DFS Used in HDFS. The diagram below shows how these space parameters relate to one another, treating HDFS as a single disk.

[Diagram: HDFS space parameters, with HDFS shown as a single disk (14883-untitled-diagram.jpg)]

A detailed explanation of these parameters is as follows:

1. Configured Capacity

It is the total capacity available to HDFS for storage. It is calculated as follows:

Configured Capacity = Total Disk Space - Reserved Space

Reserved space is the space set aside for non-HDFS use, such as OS-level operations. It is configured with the parameter dfs.datanode.du.reserved in hdfs-site.xml; the value is given in bytes and applies to each volume. The replication factor is irrelevant to Configured Capacity.
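As a rough illustration of the formula above, the following Python sketch applies the reservation to each volume and adds up the results. It is a simplified model rather than the DataNode's actual code; the function name and the volume sizes are hypothetical.

```python
GIB = 1024 ** 3


def configured_capacity(volume_sizes, reserved_per_volume):
    """Sum of (volume size - reserved space) over all data volumes, in bytes.

    reserved_per_volume plays the role of dfs.datanode.du.reserved.
    A volume smaller than the reservation contributes nothing.
    """
    return sum(max(size - reserved_per_volume, 0) for size in volume_sizes)


# Hypothetical DataNode: two 100 GiB volumes, 10 GiB reserved on each.
volumes = [100 * GIB, 100 * GIB]
capacity = configured_capacity(volumes, 10 * GIB)
print(capacity // GIB, "GiB")  # 180 GiB
```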

2. Present Capacity

It is the amount of storage space actually available for storing files after setting aside the Non DFS Used space (space taken by metadata, open blocks, and other non-HDFS data). The difference between Configured Capacity and Present Capacity is therefore the space used for file system metadata and other non-HDFS information. Each DataNode includes its Present Capacity in the reports it sends to the NameNode; the NameNode tracks these values, aggregates them across all DataNodes, and displays the total when hdfs dfsadmin -report is run. Present Capacity therefore varies with the usage of non-HDFS directories, whereas Configured Capacity stays the same until you add or remove volumes/disks from HDFS.

3. DFS Used

It is the storage space that has been used up by HDFS, counting every replica. To estimate the actual size of the files stored in HDFS, divide 'DFS Used' by the replication factor; this assumes all files use the same replication factor. The default replication factor is set by the dfs.replication parameter in hdfs-site.xml. So if DFS Used is 90 GB and your replication factor is 3, the actual size of your files in HDFS is about 90/3 = 30 GB.
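The division can be written as a one-line helper. This is only a sketch: the function name is hypothetical, and the result is an estimate that holds when every file uses the same replication factor.

```python
def estimated_file_size(dfs_used, replication_factor):
    """Estimate the un-replicated size of the data stored in HDFS.

    Assumes every file was written with the same replication factor.
    """
    if replication_factor < 1:
        raise ValueError("replication factor must be at least 1")
    return dfs_used / replication_factor


# The example from the text: 90 GB used with a replication factor of 3.
print(estimated_file_size(90, 3))  # 30.0
```

The same division applies to DFS Remaining when estimating how much more file data will fit.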

4. DFS Remaining

It is the amount of storage space still available to HDFS for storing more files. If you have 90 GB of remaining storage space and a replication factor of 3, you can still store up to 90/3 = 30 GB of files without exceeding your Configured Capacity. Having defined DFS Used and DFS Remaining, we can say that:

Present Capacity = DFS Used + DFS Remaining

5. Non DFS Used

Non DFS Used is any data on the DataNodes' filesystems that is not stored under dfs.datanode.data.dir. In other words, it answers the question "How much of the Configured Capacity is occupied by non-DFS use?"

Non DFS Used = Configured Capacity - DFS Remaining - DFS Used
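The formula can be checked against a single DataNode from the report above. The Python sketch below uses the byte values listed for kharearpit3.local; the function name is made up for illustration.

```python
def non_dfs_used(configured, dfs_used, dfs_remaining):
    """Non DFS Used = Configured Capacity - DFS Remaining - DFS Used (bytes)."""
    return configured - dfs_remaining - dfs_used


# Values reported for kharearpit3.local, in bytes.
configured = 83339825152
dfs_used = 15580618752
dfs_remaining = 44984360960
reported_non_dfs = 22774845440

computed = non_dfs_used(configured, dfs_used, dfs_remaining)
print(computed, computed == reported_non_dfs)  # 22774845440 True
```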

VALIDATING THE OUTPUT

Present Capacity = Sum of [ DFS Used + DFS Remaining ] for all the Data Nodes

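The output shared above lists 4 DataNodes:

Present Capacity = [ 40 KB + 13.12 GB ] + [ 14.51 GB + 41.89 GB ] + [ 14.51 GB + 33.77 GB ] + [ 14.51 GB + 44.86 GB ]

≈ 177.18 GB

This matches the Present Capacity reported by the command. The rounded GB figures add up to 177.17 GB; the underlying byte values add up to exactly 190246318080 bytes, which is the reported 177.18 GB.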

Configured Capacity = Sum of Configured Capacity for all the Data Nodes

= 18.69 GB + 77.62 GB + 77.62 GB + 77.62 GB

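= 251.55 GB

The report shows 251.53 GB; the 0.02 GB difference comes from adding rounded figures. The per-node byte values add up to exactly the reported 270082531328 bytes.

Another way to check the Configured Capacity is: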

Configured Capacity = Present Capacity + Non DFS Used on all the Data Nodes

= 177.18 GB + [ 5.56 GB + 21.21 GB + 29.33 GB + 18.25 GB ]

= 251.53 GB
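The whole validation can be repeated in bytes, which removes the rounding noise. The following Python sketch hard-codes the per-DataNode values from the report above; the variable and function names are chosen here for illustration, and GB is taken to mean 1024^3 bytes, which reproduces the figures in the report.

```python
GIB = 1024 ** 3

# Per-DataNode values from the report, in bytes:
# (configured capacity, DFS used, non DFS used, DFS remaining)
DATANODES = {
    "kharearpit4.local": (20063055872, 40960, 5971144704, 14091870208),
    "kharearpit3.local": (83339825152, 15580618752, 22774845440, 44984360960),
    "kharearpit1.local": (83339825152, 15580672000, 31497687040, 36261466112),
    "kharearpit2.local": (83339825152, 15580520448, 19592536064, 48166768640),
}


def column_total(index):
    """Add up one column of DATANODES across all nodes."""
    return sum(row[index] for row in DATANODES.values())


def as_gb(size_in_bytes):
    """Format a byte count the way the report does (two decimals)."""
    return f"{size_in_bytes / GIB:.2f} GB"


configured_total = column_total(0)
dfs_used_total = column_total(1)
non_dfs_total = column_total(2)
remaining_total = column_total(3)

# Present Capacity = sum of (DFS Used + DFS Remaining) over all DataNodes
present_total = dfs_used_total + remaining_total

print("Present Capacity:", present_total, as_gb(present_total))
print("Configured Capacity:", configured_total, as_gb(configured_total))
print("Present + Non DFS Used:", present_total + non_dfs_total)
```

Both totals agree with the cluster summary (190246318080 and 270082531328 bytes), and Present Capacity plus Non DFS Used equals Configured Capacity exactly.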

Comments

Nice work @Arpit Khare ! This is going to be quite useful for the folks around. Thank you and keep it up !!