HDFS is almost full (90%) but DataNode disks are only around 50% used

[Screenshot: 87531-capture.png]

Hi all,

We have an Ambari 2.6.1 cluster with HDP 2.6.4.

On the dashboard, HDFS Disk Usage is almost 90%, but all the DataNode disks are only around 50% used.

So why does HDFS show 90% when the DataNode disks are only at about 50%? For example, the disks on one DataNode look like this:

Filesystem                 Size  Used Avail Use% Mounted on
/dev/sdc                   20G   11G  8.7G  56% /data/sdc
/dev/sde                   20G   11G  8.7G  56% /data/sde
/dev/sdd                   20G   11G  9.0G  55% /data/sdd
/dev/sdb                   20G  8.9G   11G  46% /data/sdb
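To put the disk listing above into a single number, here is a minimal Python sketch that totals raw usage across the four disks on this DataNode. The helper names `gib` and `raw_usage_percent` are hypothetical, and the parser assumes every Size/Used value carries a `G` suffix, as in these four lines.

```python
# Minimal sketch: total raw disk usage from df -h style lines.
# Assumption: every Size/Used value ends in "G", as in the listing above.
DF_LINES = """\
/dev/sdc                   20G   11G  8.7G  56% /data/sdc
/dev/sde                   20G   11G  8.7G  56% /data/sde
/dev/sdd                   20G   11G  9.0G  55% /data/sdd
/dev/sdb                   20G  8.9G   11G  46% /data/sdb
"""


def gib(field: str) -> float:
    """Convert a df -h size such as '8.9G' into a float number of GiB."""
    if not field.endswith("G"):
        raise ValueError(f"unexpected unit in {field!r}")
    return float(field[:-1])


def raw_usage_percent(df_text: str) -> float:
    """Return total Used / total Size across all listed filesystems, in percent."""
    total_size = 0.0
    total_used = 0.0
    for line in df_text.splitlines():
        fields = line.split()
        # Skip blank lines and the "Filesystem Size Used ..." header line.
        if len(fields) < 6 or fields[0] == "Filesystem":
            continue
        # Columns: Filesystem, Size, Used, Avail, Use%, Mounted on
        total_size += gib(fields[1])
        total_used += gib(fields[2])
    return 100.0 * total_used / total_size


if __name__ == "__main__":
    print(f"raw usage on this DataNode: {raw_usage_percent(DF_LINES):.1f}%")
```

For these four disks the total is 41.9G used out of 80G, roughly 52%, which is the figure being compared against the ~88% that HDFS reports.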

Is this a tuning problem, or something else?

We also ran a rebalance from the Ambari GUI, but it didn't help.

Michael-Bronson
1 ACCEPTED SOLUTION

Master Mentor

@Michael Bronson

As the NameNode report and UI (including the Ambari UI) show, your DFS Used is reaching almost 87% to 90%, so it would be a good idea to increase the DFS capacity.

To understand Non DFS Used in detail, note that:

Non DFS Used = Configured Capacity - DFS Remaining - DFS Used
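The Non DFS Used formula can be written as a tiny Python function. This is only a sketch: the function name is hypothetical, and the byte counts in the example are made up for illustration rather than taken from this cluster.

```python
GIB = 1024 ** 3


def non_dfs_used(configured_capacity: int, dfs_remaining: int, dfs_used: int) -> int:
    """Non DFS Used = Configured Capacity - DFS Remaining - DFS Used (all in bytes)."""
    return configured_capacity - dfs_remaining - dfs_used


# Hypothetical figures for a single 20 GiB volume, for illustration only.
example = non_dfs_used(configured_capacity=20 * GIB,
                       dfs_remaining=6 * GIB,
                       dfs_used=11 * GIB)
print(example / GIB)  # 3.0 (GiB counted as non-DFS)
```

A result of 0 means every byte of configured capacity is accounted for by either HDFS blocks or free HDFS space.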

You can refer to the following article, which explains the concepts of Configured Capacity, Present Capacity, DFS Used, DFS Remaining, and Non DFS Used in HDFS. The diagram below shows how these space parameters relate to each other, treating HDFS as a single disk.

https://community.hortonworks.com/articles/98936/details-of-the-output-hdfs-dfsadmin-report.html

[Diagram: 88475-hdfs-diagram.jpg]
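To make the relationships in the article concrete, here is a minimal Python model of one volume. It assumes, following the article's definitions, that Configured Capacity is the raw disk size minus reserved space, and that Present Capacity is DFS Used plus DFS Remaining. The class name and all numbers are hypothetical; the point is only that the same amount of block data gives a different percentage depending on which denominator is used.

```python
from dataclasses import dataclass


@dataclass
class VolumeSpace:
    """Hypothetical model of the HDFS space parameters for one volume (GiB)."""
    total_disk: float     # raw size of the disk
    reserved: float       # space held back from HDFS (assumed to be configured)
    dfs_used: float       # space taken by HDFS block files
    dfs_remaining: float  # space HDFS can still write to

    @property
    def configured_capacity(self) -> float:
        # Assumption from the article: Configured Capacity = Total Disk Space - Reserved Space
        return self.total_disk - self.reserved

    @property
    def non_dfs_used(self) -> float:
        # Non DFS Used = Configured Capacity - DFS Remaining - DFS Used
        return self.configured_capacity - self.dfs_remaining - self.dfs_used

    @property
    def present_capacity(self) -> float:
        # Present Capacity = DFS Used + DFS Remaining
        return self.dfs_used + self.dfs_remaining

    def used_share_of_configured(self) -> float:
        """(DFS Used + Non DFS Used) as a percentage of Configured Capacity."""
        return 100.0 * (self.dfs_used + self.non_dfs_used) / self.configured_capacity

    def used_share_of_disk(self) -> float:
        """DFS Used as a percentage of the raw disk size."""
        return 100.0 * self.dfs_used / self.total_disk


if __name__ == "__main__":
    # Made-up numbers, not measured on this cluster.
    vol = VolumeSpace(total_disk=20, reserved=8, dfs_used=10.5, dfs_remaining=1.5)
    print(vol.used_share_of_configured(), vol.used_share_of_disk())
```

With these made-up numbers, the same 10.5 GiB of blocks is 87.5% of Configured Capacity but only 52.5% of the raw disk.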

The article above is one of the best resources for understanding the DFS and Non-DFS calculations and the remedy.

You add capacity by giving dfs.datanode.data.dir more mount points or directories. In Ambari, I believe this setting appears either on the right-hand side of the HDFS configs page or in the Advanced section, depending on the Ambari version; the property itself lives in hdfs-site.xml. The more new disks you add to this comma-separated list, the more capacity you will have. Preferably, every machine should have the same disk and mount-point structure.
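The property value is just a comma-separated list of directories, one per disk. Here is a minimal Python sketch that builds that value and the matching hdfs-site.xml <property> element. It assumes each disk is mounted under /data/<device>, as in the df listing in the question, and uses a hadoop/hdfs/data subdirectory on each mount; that subdirectory and the extra /data/sdf disk are hypothetical. On an Ambari-managed cluster you would paste the value into the Ambari config field rather than edit hdfs-site.xml by hand.

```python
import xml.etree.ElementTree as ET


def data_dir_value(mount_points: list[str], subdir: str = "hadoop/hdfs/data") -> str:
    """Join one HDFS data directory per mount point into a comma-separated list."""
    return ",".join(f"{mp.rstrip('/')}/{subdir}" for mp in mount_points)


def data_dir_property(mount_points: list[str], subdir: str = "hadoop/hdfs/data") -> str:
    """Render the hdfs-site.xml <property> element for dfs.datanode.data.dir."""
    prop = ET.Element("property")
    ET.SubElement(prop, "name").text = "dfs.datanode.data.dir"
    ET.SubElement(prop, "value").text = data_dir_value(mount_points, subdir)
    return ET.tostring(prop, encoding="unicode")


if __name__ == "__main__":
    # /data/sdf stands in for a hypothetical newly added disk.
    mounts = ["/data/sdb", "/data/sdc", "/data/sdd", "/data/sde", "/data/sdf"]
    print(data_dir_property(mounts))
```

Keeping the same mount list on every DataNode lets one config value serve the whole host group.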

Master Mentor

@Michael Bronson

The HDFS dashboard metrics widget "HDFS Disk Usage" shows: "The percentage of distributed file system (DFS) used, which is a combination of DFS and non-DFS used."
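Read literally, that description means the widget adds DFS Used and Non DFS Used before dividing. Here is a minimal sketch, assuming the denominator is Configured Capacity (the widget description does not say so explicitly) and using hypothetical values; the function name is also hypothetical.

```python
def hdfs_disk_usage_percent(dfs_used: float, non_dfs_used: float,
                            configured_capacity: float) -> float:
    """(DFS Used + Non DFS Used) / Configured Capacity, as a percentage.

    Assumption: Configured Capacity is the denominator Ambari uses.
    """
    if configured_capacity <= 0:
        raise ValueError("configured_capacity must be positive")
    return 100.0 * (dfs_used + non_dfs_used) / configured_capacity


# Hypothetical values: 87 GB of blocks, 1 GB of non-DFS data, 100 GB configured.
print(hdfs_disk_usage_percent(87, 1, 100))
```

When Non DFS Used is close to zero, this figure is essentially the same as DFS Used%.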

Can you hover your mouse over the "HDFS Disk Usage" widget and check the values shown for "DFS Used", "Non DFS Used", and "Remaining"? You should see something like the following:

[Screenshot: 88468-hdfs-disk-usages.png]

@Jay, this is what we got (so about 88% used). Back to my question: how can it be 88% when the disks are only at ~50%?

[Screenshot: 87536-capture.png]

Michael-Bronson

Master Mentor

@Michael Bronson

What do you see when you run the following command?

# su - hdfs -c "hdfs dfsadmin -report | grep 'DFS Used%'"

(OR)

Please check the "DFS Used" value shown in the NameNode UI to verify whether Ambari is showing the same data or something different: http://$ACTIVE_NAMENODE:50070/dfshealth.html#tab-overview
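To compare the numbers programmatically rather than by eye, the NameNode also serves metrics as JSON from its /jmx endpoint on the same HTTP port. The sketch below assumes the bean Hadoop:service=NameNode,name=FSNamesystem with the attributes CapacityTotal, CapacityUsed, CapacityRemaining, and CapacityUsedNonDFS; please verify these names against your Hadoop version. The helper names are hypothetical, and the network fetch is kept separate so the parsing can be checked offline.

```python
import json
from urllib.request import urlopen

BEAN = "Hadoop:service=NameNode,name=FSNamesystem"


def capacity_summary(jmx_json: str) -> dict:
    """Extract capacity percentages from a NameNode /jmx JSON response."""
    beans = json.loads(jmx_json)["beans"]
    bean = next(b for b in beans if b.get("name") == BEAN)
    total = bean["CapacityTotal"]  # assumed: configured capacity in bytes
    return {
        "dfs_used_pct": 100.0 * bean["CapacityUsed"] / total,
        "non_dfs_used_pct": 100.0 * bean["CapacityUsedNonDFS"] / total,
        "remaining_pct": 100.0 * bean["CapacityRemaining"] / total,
    }


def fetch_jmx(namenode_host: str, port: int = 50070) -> str:
    """Fetch the FSNamesystem bean from the NameNode web UI (network call)."""
    url = f"http://{namenode_host}:{port}/jmx?qry={BEAN}"
    with urlopen(url, timeout=10) as resp:
        return resp.read().decode("utf-8")
```

On a host that can reach the active NameNode, `capacity_summary(fetch_jmx("<active-namenode-host>"))` would return the three percentages side by side (the hostname is a placeholder).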

@Jay, note that we have 4 DataNode machines (4 worker machines), and each worker has 4 disks of 20G each.

Michael-Bronson

@Jay, we got the following results:


su - hdfs -c "hdfs dfsadmin -report | grep 'DFS Used%'"
DFS Used%: 87.38%
DFS Used%: 88.48%
DFS Used%: 87.00%
DFS Used%: 84.70%
DFS Used%: 87.93%
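With 4 DataNodes, the grep returns five values. Here is a minimal Python sketch that separates them, assuming (as dfsadmin -report lays out its output) that the first DFS Used% line is the cluster-wide summary and the remaining lines belong to the individual live DataNodes. The function name is hypothetical.

```python
import re

# Output of: su - hdfs -c "hdfs dfsadmin -report | grep 'DFS Used%'"
REPORT = """\
DFS Used%: 87.38%
DFS Used%: 88.48%
DFS Used%: 87.00%
DFS Used%: 84.70%
DFS Used%: 87.93%
"""

PATTERN = re.compile(r"DFS Used%:\s*([\d.]+)%")


def split_report(text: str) -> tuple[float, list[float]]:
    """Return (cluster-wide value, per-DataNode values) from grep'd report lines.

    Assumption: the cluster summary is printed first, then one section per
    live DataNode, so the first match is the cluster total.
    """
    values = [float(m) for m in PATTERN.findall(text)]
    if not values:
        raise ValueError("no 'DFS Used%' lines found")
    return values[0], values[1:]


if __name__ == "__main__":
    cluster, nodes = split_report(REPORT)
    print(f"cluster: {cluster}%  per node: {nodes}")
```

Read this way, every DataNode sits between about 84.7% and 88.5%, so HDFS usage is uniformly high across nodes rather than concentrated on one.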
Michael-Bronson

@Jay, here is more information about disk usage on my DataNodes:

datanode1: disk1 47%, disk2 64%, disk3 48%, disk4 49%
datanode2: 44%, 53%, 61%, 44%
datanode3: 55%, 46%, 45%, 91%
datanode4: 63%, 45%, 49%, 46%
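To line these per-disk figures up against the per-node DFS Used% values, here is a minimal Python sketch that averages them per node and flags unusually full disks. Because every disk is 20G, a plain mean equals the capacity-weighted mean. The 85% threshold is an arbitrary choice for illustration, the function name is hypothetical, and disks on datanode2 to datanode4 are simply numbered in the order listed.

```python
# Per-disk usage (%) from the thread; all disks are 20G.
DISK_USAGE = {
    "datanode1": [47, 64, 48, 49],
    "datanode2": [44, 53, 61, 44],
    "datanode3": [55, 46, 45, 91],
    "datanode4": [63, 45, 49, 46],
}


def summarize(usage: dict[str, list[int]], threshold: int = 85):
    """Return per-node means, the overall mean, and disks at or above threshold."""
    node_means = {node: sum(v) / len(v) for node, v in usage.items()}
    all_values = [pct for v in usage.values() for pct in v]
    overall = sum(all_values) / len(all_values)
    # (node, 1-based disk position in the list, usage %) for disks over the threshold
    hot = [(node, i + 1, pct)
           for node, v in usage.items()
           for i, pct in enumerate(v) if pct >= threshold]
    return node_means, overall, hot


if __name__ == "__main__":
    means, overall, hot = summarize(DISK_USAGE)
    print(means)
    print(f"overall: {overall}%  hot disks: {hot}")
```

This gives an overall mean of about 53%, with one outlier disk on datanode3 at 91%.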

Michael-Bronson

@Jay, do you need any other information?

Michael-Bronson

Master Mentor

@Michael Bronson

As mentioned, "HDFS Disk Usage" shows the percentage of distributed file system (DFS) used, which is a combination of DFS and non-DFS used.

The NameNode command output and UI show that DFS Used is around 87.06% and Non DFS Used is 0%.

That is almost the same as what Ambari is showing (almost 88%, DFS + Non DFS usage), so I don't see any contradiction.

Please let us know what value you are expecting.