I am getting an alert on HDFS (see hdfs-storage-alert.jpg). It says:
Remaining Capacity:, Total Capacity:[82% Used, 46121660928]
The used capacity is reported as 82%, but the "df -h" command shows 68% used and "hdfs dfs -df -h" shows 5% used.
Why all these discrepancies?
[hdfs@hadoop1 ~]$ hdfs dfs -df -h
Filesystem                                    Size     Used  Available  Use%
hdfs://hadoop1.tolls.dot.state.fl.us:8020  214.8 G   11.0 G     93.0 G    5%

[root@hadoop1 ~]# df -h
Filesystem                      Size  Used Avail Use% Mounted on
/dev/mapper/vg_hadoop1-lv_root   50G   32G   16G  68% /
tmpfs                           5.9G     0  5.9G   0% /dev/shm
/dev/sda1                       477M   72M  381M  16% /boot
/dev/mapper/vg_hadoop1-lv_home  144G  361M  136G   1% /home
hdfs dfs -df reports on the storage of the entire cluster, not a single DataNode. df -h, on the other hand, reports only on the local file systems of the one node where it is run, not on the cluster. So the two will not match. I can't explain the discrepancy between the 68% and the 82%.
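As a rough sanity check, the figures from the "hdfs dfs -df -h" output above can be broken down by hand. The sketch below assumes that Size minus Used minus Available is space that HDFS can neither use nor allocate (roughly what HDFS reports as non-DFS used). The function names are made up for illustration, and the numbers are copied from the output in the question.

```shell
#!/bin/sh
# Figures copied from the `hdfs dfs -df -h` output in the question,
# in the units printed by -h.
SIZE=214.8
USED=11.0
AVAILABLE=93.0

# Space that is neither used by HDFS nor available to it:
# Size - Used - Available (assumed here to approximate non-DFS usage).
unaccounted_space() {
  awk -v s="$1" -v u="$2" -v a="$3" 'BEGIN { printf "%.1f", s - u - a }'
}

# DFS usage as a whole-number percentage of Size, as in the Use% column.
dfs_used_pct() {
  awk -v s="$1" -v u="$2" 'BEGIN { printf "%.0f", u / s * 100 }'
}

echo "Unaccounted (non-DFS) space: $(unaccounted_space "$SIZE" "$USED" "$AVAILABLE") G"
echo "DFS used: $(dfs_used_pct "$SIZE" "$USED")%"
```

With these inputs, about 110.8 G of the 214.8 G is neither HDFS data nor free space for HDFS, so non-HDFS files on the DataNode disks are a likely place to look. Running "hdfs dfsadmin -report" should show the same breakdown per DataNode.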
You may need to rebalance: https://docs.hortonworks.com/HDPDocuments/Ambari-184.108.40.206/bk_Ambari_Users_Guide/content/_how_to_rebal...
Can you please check whether some data is still present in "/user/hdfs/.Trash"?
You might also want to look into the "hdfs dfs -expunge" command and the "-skipTrash" option. The trash behavior is described as follows:
When a file is deleted by a user or an application, it is not immediately removed from HDFS. Instead, HDFS moves it to a trash directory (each user has their own trash directory under /user/<username>/.Trash). The file can be restored quickly as long as it remains in trash. The most recently deleted files are moved to the current trash directory (/user/<username>/.Trash/Current), and at a configurable interval HDFS creates checkpoints (under /user/<username>/.Trash/<date>) for the files in the current trash directory and deletes old checkpoints once they have expired.

After its lifetime in trash expires, the NameNode deletes the file from the HDFS namespace. The deletion of a file causes the blocks associated with the file to be freed. Note that there can be an appreciable delay between the time a file is deleted by a user and the time of the corresponding increase in free space in HDFS.

Currently, the trash feature is disabled by default (files are deleted without being stored in trash). Users can enable it by setting the parameter fs.trash.interval (in core-site.xml) to a value greater than zero. This value tells the NameNode how long to keep a checkpoint before it expires and is removed from HDFS. In addition, users can tell the NameNode how often to create checkpoints in trash through the parameter fs.trash.checkpoint.interval in core-site.xml; this value should be smaller than or equal to fs.trash.interval.
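To check whether trash is holding on to space, something along these lines may help. This is only a sketch: the helper function names are hypothetical, the default user "hdfs" is taken from the path mentioned above, and the commands need to run as a user who can read that directory.

```shell
#!/bin/sh
# Hypothetical helper names; the hdfs subcommands themselves are standard.

# Build the trash directory path for a user (defaults to the hdfs user).
trash_path() {
  printf '/user/%s/.Trash' "${1:-hdfs}"
}

# Summarize how much space a user's trash directory currently holds.
trash_usage() {
  hdfs dfs -du -s -h "$(trash_path "$1")"
}

# Print the trash settings the client configuration resolves to.
show_trash_config() {
  hdfs getconf -confKey fs.trash.interval
  hdfs getconf -confKey fs.trash.checkpoint.interval
}

# Delete trash checkpoints that are past the retention threshold
# (applies to the trash of the user running the command).
expunge_trash() {
  hdfs dfs -expunge
}

# Delete a path immediately, bypassing trash; this cannot be undone.
remove_now() {
  hdfs dfs -rm -r -skipTrash "$1"
}
```

For example, calling trash_usage with no argument reports on /user/hdfs/.Trash. If that shows a large amount of data, expunge_trash (or waiting for the checkpoints to expire) should eventually free it, keeping in mind the delay described above.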