Created on 09-03-2018 05:05 PM - edited 08-18-2019 02:05 AM
hi all
we have an Ambari cluster version 2.6.1 & HDP version 2.6.4
from the dashboard we can see that HDFS Disk Usage is almost 90%
but all the datanode disks are only around 50%
so why does HDFS show 90% while the datanode disks are only at 50%?
Filesystem      Size  Used  Avail Use%  Mounted on
/dev/sdc        20G   11G   8.7G  56%   /data/sdc
/dev/sde        20G   11G   8.7G  56%   /data/sde
/dev/sdd        20G   11G   9.0G  55%   /data/sdd
/dev/sdb        20G   8.9G  11G   46%   /data/sdb
is it a tuning problem, or something else?
we also performed a rebalance from the Ambari GUI, but this didn't help
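(Note: as far as I understand, the rebalance action only redistributes existing blocks so that the datanodes are evenly used; it does not free space or add capacity. From the command line, the equivalent of the Ambari action would be something like:

hdfs balancer -threshold 10

where -threshold 10 means the balancer moves blocks until each datanode's usage is within 10% of the cluster average.)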
Created 09-04-2018 12:17 PM
@Geoffrey So the only thing left for us to do is to add disks to the datanodes, or add datanode machines. What is your opinion?
Created 09-04-2018 03:13 PM
Hi,
One common reason HDFS shows as full is the replication factor: you may be comparing the "before replication" data size with the raw disk usage.
You have 4 datanodes. How much space does each of the 4 datanodes have?
Created 09-04-2018 05:26 PM
@Ganesh
we have 4 datanode machines, each datanode has 4 disks, and each disk has 20G. Let me know if this is the info you wanted.
Created 09-04-2018 07:35 PM
Total 320 GB. What is the replication setting, 2 or 3? And check how many disks were initially used for HDFS (i.e., how much HDFS space was created).
Created 09-04-2018 08:27 PM
the replication factor is 3, and we use all 4 disks on each datanode for HDFS (meaning 80G per machine). Regarding what you said, "how much HDFS space was created", please advise how to check it?
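As a rough sanity check: 16 disks x 20 GB = 320 GB of raw (configured) capacity, and with replication factor 3 every block is stored three times, so the effective space for data is roughly 320 / 3 ≈ 106 GB. To see the exact figures HDFS has registered, you can run the following from any node with an HDFS client; the report lists Configured Capacity, DFS Used, DFS Remaining, and Non DFS Used for the cluster and for each datanode:

hdfs dfsadmin -report
hdfs dfs -df -h /

(The second command is just a quick one-line summary of filesystem size, used, and available space.)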
Created on 09-05-2018 12:56 AM - edited 08-18-2019 02:05 AM
As the NameNode report and the UI (including the Ambari UI) show that your DFS Used is reaching almost 87% to 90%, it would be really good if you can increase the DFS capacity.
Non DFS Used is calculated as: Non DFS Used = Configured Capacity - DFS Remaining - DFS Used.
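As a purely illustrative example (the numbers below are made up; only the 320 GB configured capacity comes from this thread):

Configured Capacity = 320 GB   (16 disks x 20 GB)
DFS Used            = 278 GB   (illustrative, ~87%)
DFS Remaining       =  30 GB   (illustrative)
Non DFS Used        = 320 - 30 - 278 = 12 GB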
To understand these concepts in detail, you can refer to the following article, which explains Configured Capacity, Present Capacity, DFS Used, DFS Remaining, and Non DFS Used in HDFS. The diagram there clearly explains these space parameters, treating HDFS as a single disk.
https://community.hortonworks.com/articles/98936/details-of-the-output-hdfs-dfsadmin-report.html
The above is one of the best articles for understanding the DFS and non-DFS calculations and the remedy.
You add capacity by giving dfs.datanode.data.dir more mount points or directories. In Ambari the property lives in hdfs-site.xml; where exactly it appears on the Configs page depends on your Ambari version (look in the advanced section). The more disks you provide in the comma-separated list, the more capacity you will have. Preferably, every machine should have the same disk and mount-point structure.
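For illustration, the value might look like the line below. The /data/sdX mount points match the df output earlier in the thread, but the hadoop/hdfs/data subdirectory is an assumption based on the default HDP layout; check your actual hdfs-site.xml:

dfs.datanode.data.dir=/data/sdb/hadoop/hdfs/data,/data/sdc/hadoop/hdfs/data,/data/sdd/hadoop/hdfs/data,/data/sde/hadoop/hdfs/data

Appending a new mount to this comma-separated list (for example, a hypothetical /data/sdf/hadoop/hdfs/data) and restarting the DataNodes is how the extra capacity gets picked up.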
Created 09-05-2018 05:53 AM
@Jay , just to be sure, when you said "increase the DFS capacity", you actually mean to add disks / add capacity by giving dfs.datanode.data.dir more mount points, am I right?
Created 09-05-2018 12:28 PM
@Jay , can you respond to Karthik Palanisamy's question?
Created on 09-05-2018 03:02 PM - edited 08-18-2019 02:04 AM
we configured 4 disks! It is not the first time we have configured this, and it is the same on all our lab clusters.
please look at this, you can clearly see the 4 disks!
Created 09-05-2018 03:16 PM
Did you restart HDFS after adding the disks?
It's hard to tell the exact cause without analyzing the NameNode and DataNode logs.
If possible, attach the NameNode log and one of the DataNode logs, captured after a service restart, so we can validate the disk registration in HDFS.
Also attach /etc/hadoop/conf.
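If you want to pre-check the volume registration yourself, grepping a DataNode log after the restart should show one entry per configured data directory. Something like the following; the log path assumes the default HDP location, and the exact message text varies by version:

grep -i "added volume" /var/log/hadoop/hdfs/hadoop-hdfs-datanode-*.log
grep -A1 "dfs.datanode.data.dir" /etc/hadoop/conf/hdfs-site.xml

The second command prints the configured data directories from the client config for comparison.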