Created on 09-03-2018 05:05 PM - edited 08-18-2019 02:05 AM
hi all
we have an Ambari cluster version 2.6.1 with HDP version 2.6.4
from the dashboard we can see that HDFS Disk Usage is almost 90%
but each datanode's local disks are only around 50% used
so why does HDFS show 90% used, while the datanode disks are only around 50%?
/dev/sdc 20G 11G  8.7G 56% /data/sdc
/dev/sde 20G 11G  8.7G 56% /data/sde
/dev/sdd 20G 11G  9.0G 55% /data/sdd
/dev/sdb 20G 8.9G 11G  46% /data/sdb
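Totting up the local numbers confirms the disks really are only about half full on this node; a quick sketch (sizes copied from the df output above):

```python
# Sum the capacity and usage of the DataNode mounts from the df output above.
# Sizes are the ones posted in this thread (four 20G disks on one node).
df_lines = """
/dev/sdc 20G 11G 8.7G 56% /data/sdc
/dev/sde 20G 11G 8.7G 56% /data/sde
/dev/sdd 20G 11G 9.0G 55% /data/sdd
/dev/sdb 20G 8.9G 11G 46% /data/sdb
""".strip().splitlines()

def gib(field):
    # Convert a df-style size like "8.7G" to a float number of gigabytes.
    return float(field.rstrip("G"))

total = sum(gib(line.split()[1]) for line in df_lines)  # column 2: size
used = sum(gib(line.split()[2]) for line in df_lines)   # column 3: used
print(f"per-node capacity: {total:.0f}G, used: {used:.1f}G ({used / total:.0%})")
# -> per-node capacity: 80G, used: 41.9G (52%)
```

So locally the node is at roughly 52%, which is why the 90% figure on the dashboard looks wrong at first glance.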
is it a problem of fine-tuning? or something else?
we also performed a re-balance from the Ambari GUI, but this didn't help
Created on 09-05-2018 12:56 AM - edited 08-18-2019 02:05 AM
As the NameNode report and UI (including the Ambari UI) show that your DFS used is reaching almost 87% to 90%, it would be really good if you can increase the DFS capacity.
To understand the breakdown in detail: Non DFS Used = Configured Capacity - DFS Remaining - DFS Used
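The formula above can be checked with the numbers reported elsewhere in this thread (152 GB configured, 134 GB used, 18 GB remaining); a small sketch with those illustrative values, which you would replace with your own from `hdfs dfsadmin -report`:

```python
# Illustrative numbers (GB) taken from this thread -- substitute your own
# values from `hdfs dfsadmin -report`.
configured_capacity = 152
dfs_used = 134
dfs_remaining = 18

# Non DFS Used = Configured Capacity - DFS Remaining - DFS Used
non_dfs_used = configured_capacity - dfs_remaining - dfs_used
dfs_used_pct = dfs_used / configured_capacity
print(f"Non DFS Used: {non_dfs_used} GB, DFS Used: {dfs_used_pct:.0%}")
# -> Non DFS Used: 0 GB, DFS Used: 88%
```

With these numbers DFS Used comes out at about 88% of configured capacity, which matches the ~90% the dashboard is showing.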
You can refer to the following article, which explains the concepts of Configured Capacity, Present Capacity, DFS Used, DFS Remaining, and Non DFS Used in HDFS. The diagram in that article clearly explains these space parameters, treating HDFS as a single disk.
https://community.hortonworks.com/articles/98936/details-of-the-output-hdfs-dfsadmin-report.html
The above is one of the best articles for understanding the DFS and non-DFS calculations and their remedies.
You add capacity by giving dfs.datanode.data.dir more mount points or directories. In Ambari this property lives in the HDFS configs (it is in hdfs-site.xml); its exact location in the UI varies with the Ambari version, and it may be under the advanced section. The more new disks you provide through the comma-separated list, the more capacity you will have. Preferably every machine should have the same disk and mount-point structure.
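For reference, a sketch of what the comma-separated property looks like in hdfs-site.xml; the mount points here are the ones from this thread, so adjust them to your own layout:

```xml
<property>
  <name>dfs.datanode.data.dir</name>
  <value>/data/sdb,/data/sdc,/data/sdd,/data/sde</value>
</property>
```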
Created 09-05-2018 03:55 PM
yes, we restarted HDFS. It is an automated installation, and the whole lab is set up that way.
Created 09-05-2018 04:26 PM
OK. Share your NameNode log and one of the DataNode logs after a service restart.
Did you validate hdfs-site.xml on the local machine, outside of Ambari?
# grep dfs.datanode.data.dir -A1 /etc/hadoop/conf/hdfs-site.xml
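The same check can also be done programmatically instead of with grep; a sketch using Python's standard library XML parser (the sample configuration is inlined here for illustration; on a real DataNode you would parse /etc/hadoop/conf/hdfs-site.xml):

```python
import xml.etree.ElementTree as ET

# Sample hdfs-site.xml content inlined for illustration; on a DataNode,
# read /etc/hadoop/conf/hdfs-site.xml instead.
sample = """
<configuration>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>/data/sdb,/data/sdc,/data/sdd,/data/sde</value>
  </property>
</configuration>
"""

root = ET.fromstring(sample)
for prop in root.findall("property"):
    if prop.findtext("name") == "dfs.datanode.data.dir":
        dirs = prop.findtext("value").split(",")
        print(f"{len(dirs)} data dirs configured: {dirs}")
```

If fewer directories come back than you expect, that node is not contributing all of its disks to the configured capacity.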
Created 09-05-2018 09:07 AM
Once you log in to Ambari, under HDFS Summary you can find:
Disk Remaining:
This shows how much disk space is actually configured and how much remains.
Created 09-05-2018 10:46 AM
remaining is only 18 GB
Created 09-05-2018 11:25 AM
there you can see how much is actually configured and how much remains.
Created 09-05-2018 12:11 PM
yes, we know that. As you can see from the whole thread, the conclusion is to add disks to each datanode, so there is no other solution except this
Created 09-05-2018 09:51 AM
134 + 18 = 152 GB is your total configured capacity. It is not 320 GB. Please confirm that all volumes (/dev/sdb, /dev/sdc, /dev/sdd, /dev/sde) are added in "dfs.datanode.data.dir" (hdfs-site.xml) so that they sum up to 320 GB of configured capacity.
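The gap between expected and reported capacity can be written out explicitly; a sketch assuming four DataNodes with four 20 GB disks each (the node count is my assumption, inferred from the 320 GB figure, and is not stated explicitly in the thread):

```python
# Expected configured capacity if every disk on every node were registered.
# Assumption (inferred, not stated in the thread): 4 DataNodes x 4 disks x 20 GB.
nodes, disks_per_node, disk_gb = 4, 4, 20
expected_capacity = nodes * disks_per_node * disk_gb   # 320 GB

# What Ambari actually reports (DFS Used + DFS Remaining).
dfs_used, dfs_remaining = 134, 18
reported_capacity = dfs_used + dfs_remaining           # 152 GB

print(f"expected {expected_capacity} GB, reported {reported_capacity} GB, "
      f"missing {expected_capacity - reported_capacity} GB")
# -> expected 320 GB, reported 152 GB, missing 168 GB
```

A shortfall like this usually means some disks are not listed in dfs.datanode.data.dir or failed to register at startup, which is why the NameNode and DataNode logs are worth checking.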
Created 09-05-2018 12:25 PM
yes, dfs.datanode.data.dir is /data/sdb,/data/sdc,/data/sdd,/data/sde, and hdfs-site.xml also has all the right configuration
Created 09-05-2018 02:11 PM
Ambari clearly shows that the total configured capacity is 152 GB. But you need to double-check from the NameNode UI.
Share NameNode screenshots:
http://<active namenode host>:50070/dfshealth.html#tab-overview
http://<active namenode host>:50070/dfshealth.html#tab-datanode
http://<active namenode host>:50070/dfshealth.html#tab-datanode-volume-failures
Also, attach the active NameNode log and one of the DataNode logs after a service restart. We have to find which disks are getting registered during startup.
Can you get /etc/hadoop/conf/hdfs-site.xml?
Created on 09-05-2018 12:36 PM - edited 08-18-2019 02:05 AM
this is an example of what we have on the datanode machine. You can see that usage is around 47-61%, and each disk is 20 GB in size