Support Questions


HDFS size in Ambari does not show the right usage

As the hdfs user, I run the following:

$ hdfs dfs -du -h /
512.5 M  /app-logs
48.8 M   /apps
4.2 M    /ats
695.6 M  /hdp
0        /mapred
56.0 G   /datag
0        /history
5.3 M    /spark2-history
0        /system
0        /tmp
465.6 M  /user

So the total size seems to be around 60 GB,

but the Ambari dashboard shows "DFS used 188G".

capture.png

So how can it be that the Ambari dashboard shows 188 GB used?


Michael-Bronson
13 REPLIES

Explorer

@Michael Bronson Can you show me the output of the command "hdfs dfsadmin -report"? Also of "hdfs dfs -df -h /"?

Two things to consider:

First, the replication factor must be accounted for (the correct space usage should appear in the outputs above).

Second, to know the space consumed, the best approach is to calculate the number of blocks used * block size.

What can happen is: if you have a large number of files smaller than the configured block size, you can be wasting some space. Imagine having a 128 MB block size and writing a huge number of 20 MB files to the DFS. These files would be using only a fraction of the total block space. This is the problem with small files.
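As a quick sanity check on the first point, multiplying the `du` total by the replication factor already lands close to the dashboard figure. A minimal sketch, assuming the default replication factor of 3 (not confirmed in this thread); the 60 GB and 188 GB figures are from the question above:

```python
# Minimal sanity check: logical size * replication ≈ physical DFS Used.
du_total_gb = 60        # approximate sum of the `hdfs dfs -du -h /` output above
replication = 3         # assumed default dfs.replication (not confirmed here)
dfs_used_gb = 188       # what the Ambari dashboard reports

estimate_gb = du_total_gb * replication
print(estimate_gb)  # 180, already in the ballpark of the reported 188
```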

This is the report:


[hdfs@master02 root]$ hdfs dfsadmin -report
Configured Capacity: 205428162560 (191.32 GB)
Present Capacity: 204711643765 (190.65 GB)
DFS Remaining: 2135187357 (1.99 GB)
DFS Used: 202576456408 (188.66 GB)
DFS Used%: 98.96%
Under replicated blocks: 0
Blocks with corrupt replicas: 0
Missing blocks: 0
Missing blocks (with replication factor 1): 0

-------------------------------------------------
Live datanodes (5):

Name: 10.164.47.218:50010 (worker01.sys764.com)
Hostname: worker01.sys764.com
Decommission Status : Normal
Configured Capacity: 41085632512 (38.26 GB)
DFS Used: 40449814998 (37.67 GB)
Non DFS Used: 0 (0 B)
DFS Remaining: 476863314 (454.77 MB)
DFS Used%: 98.45%
DFS Remaining%: 1.16%
Configured Cache Capacity: 0 (0 B)
Cache Used: 0 (0 B)
Cache Remaining: 0 (0 B)
Cache Used%: 100.00%
Cache Remaining%: 0.00%
Xceivers: 4
Last contact: Tue Jun 26 05:32:13 UTC 2018

Name: 10.164.48.3:50010 (worker05.sys764.com)
Hostname: worker05.sys764.com
Decommission Status : Normal
Configured Capacity: 41085632512 (38.26 GB)
DFS Used: 40386752982 (37.61 GB)
Non DFS Used: 0 (0 B)
DFS Remaining: 551342871 (525.80 MB)
DFS Used%: 98.30%
DFS Remaining%: 1.34%
Configured Cache Capacity: 0 (0 B)
Cache Used: 0 (0 B)
Cache Remaining: 0 (0 B)
Cache Used%: 100.00%
Cache Remaining%: 0.00%
Xceivers: 4
Last contact: Tue Jun 26 05:32:13 UTC 2018

Name: 10.164.47.217:50010 (worker02.sys764.com)
Hostname: worker02.sys764.com
Decommission Status : Normal
Configured Capacity: 41085632512 (38.26 GB)
DFS Used: 40588859180 (37.80 GB)
Non DFS Used: 0 (0 B)
DFS Remaining: 347038972 (330.96 MB)
DFS Used%: 98.79%
DFS Remaining%: 0.84%
Configured Cache Capacity: 0 (0 B)
Cache Used: 0 (0 B)
Cache Remaining: 0 (0 B)
Cache Used%: 100.00%
Cache Remaining%: 0.00%
Xceivers: 2
Last contact: Tue Jun 26 05:32:13 UTC 2018

Name: 10.164.47.215:50010 (worker03.sys764.com)
Hostname: worker03.sys764.com
Decommission Status : Normal
Configured Capacity: 41085632512 (38.26 GB)
DFS Used: 40671485952 (37.88 GB)
Non DFS Used: 0 (0 B)
DFS Remaining: 346956888 (330.88 MB)
DFS Used%: 98.99%
DFS Remaining%: 0.84%
Configured Cache Capacity: 0 (0 B)
Cache Used: 0 (0 B)
Cache Remaining: 0 (0 B)
Cache Used%: 100.00%
Cache Remaining%: 0.00%
Xceivers: 2
Last contact: Tue Jun 26 05:32:15 UTC 2018

Name: 10.164.47.223:50010 (worker04.sys764.com)
Hostname: worker04.sys764.com
Decommission Status : Normal
Configured Capacity: 41085632512 (38.26 GB)
DFS Used: 40479543296 (37.70 GB)
Non DFS Used: 0 (0 B)
DFS Remaining: 412985312 (393.85 MB)
DFS Used%: 98.52%
DFS Remaining%: 1.01%
Configured Cache Capacity: 0 (0 B)
Cache Used: 0 (0 B)
Cache Remaining: 0 (0 B)
Cache Used%: 100.00%
Cache Remaining%: 0.00%
Xceivers: 4
Last contact: Tue Jun 26 05:32:13 UTC 2018
Michael-Bronson

Super Mentor

@Michael Bronson

Have you recently deleted large contents from your HDFS without using the "-skipTrash" option with the "hdfs dfs -rm" command?
Also, what do you see for the DFS / Non-DFS usage when you make the following JMX call in the browser?

http://$ACTIVE_NameNode_HOST:50070/jmx?qry=Hadoop:service=NameNode,name=NameNodeInfo
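A hypothetical helper (not part of the advice above) for pulling the relevant fields out of that JMX response once you have fetched it, assuming Python is available; the sample values below are illustrative:

```python
import json

# Extract the DFS usage fields from a NameNodeInfo JMX response body.
def dfs_usage(jmx_text):
    info = json.loads(jmx_text)["beans"][0]
    return {"Used": info["Used"],
            "NonDfsUsedSpace": info["NonDfsUsedSpace"],
            "PercentUsed": info["PercentUsed"]}

# Trimmed sample shaped like a real NameNodeInfo response (illustrative values):
sample = '{"beans": [{"Used": 202576838656, "NonDfsUsedSpace": 0, "PercentUsed": 98.61}]}'
print(dfs_usage(sample))
```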


Have you checked the "~/.Trash" directory of your HDFS users? For example:

# hdfs dfs -du -s -h /user/admin/.Trash

.

Have you tried doing an expunge? For example:

# su - hdfs
# hdfs dfs -rm /user/admin/.Trash/*
# hdfs dfs -expunge

.

Hi Jay,

I see that the .Trash folder does not exist. Is that part of the problem?

How do I create it?

dfs -rm /user/admin/.Trash/*
rm: `/user/admin/.Trash/*': No such file or directory
[hdfs@master02 root]$ hdfs dfs -ls /user
Found 7 items
drwxr-xr-x   - root      hdfs          0 2018-01-04 15:16 /user/admin
drwxr-xr-x   - airflow   hdfs          0 2017-09-07 02:12 /user/airflow
drwxrwx---   - ambari-qa hdfs          0 2017-08-14 09:19 /user/ambari-qa
drwxr-xr-x   - hcat      hdfs          0 2017-08-14 09:19 /user/hcat
drwxr-xr-x   - hdfs      hdfs          0 2018-02-15 13:11 /user/hdfs
drwxr-xr-x   - hive      hdfs          0 2017-11-03 00:01 /user/hive
drwxrwxr-x   - spark     hdfs          0 2017-08-14 09:20 /user/spark
[hdfs@master02 root]$ hdfs dfs -ls /user/admin    (the admin folder exists)
Michael-Bronson

The output from http://$ACTIVE_NameNode_HOST:50070/jmx?qry=Hadoop:service=NameNode,name=NameNodeInfo:

{
  "beans" : [ {
    "name" : "Hadoop:service=NameNode,name=NameNodeInfo",
    "modelerType" : "org.apache.hadoop.hdfs.server.namenode.FSNamesystem",
    "Threads" : 132,
    "Total" : 205428162560,
    "UpgradeFinalized" : true,
    "ClusterId" : "CID-bc106817-c4d9-4b79-a9cf-5b1e37fa38c6",
    "Version" : "2.7.3.2.6.0.3-8, rc6befa0f1e911140cc815e0bab744a6517abddae",
    "Used" : 202576838656,
    "Free" : 2134805109,
    "Safemode" : "",
    "NonDfsUsedSpace" : 0,
    "PercentUsed" : 98.61201,
    "BlockPoolUsedSpace" : 202576838656,
    "PercentBlockPoolUsed" : 98.61201,
    "PercentRemaining" : 1.0391978,
    "CacheCapacity" : 0,
    "CacheUsed" : 0,
    "TotalBlocks" : 756250,
    "TotalFiles" : 32767,
    "NumberOfMissingBlocks" : 0,
    "NumberOfMissingBlocksWithReplicationFactorOne" : 0,
    "LiveNodes" : "{\"worker01.sys764.com:50010\":{\"infoAddr\":\"43.56.2.218:50075\",\"infoSecureAddr\":\"43.56.2.218:0\",\"xferaddr\":\"43.56.2.218:50010\",\"lastContact\":1,\"usedSpace\":40449945600,\"adminState\":\"In Service\",\"nonDfsUsedSpace\":0,\"capacity\":41085632512,\"numBlocks\":417031,\"version\":\"2.7.3.2.6.0.3-8\",\"used\":40449945600,\"remaining\":476732712,\"blockScheduled\":0,\"blockPoolUsed\":40449945600,\"blockPoolUsedPercent\":98.452774,\"volfails\":0},\"worker04.sys764.com:50010\":{\"infoAddr\":\"43.56.2.223:50075\",\"infoSecureAddr\":\"43.56.2.223:0\",\"xferaddr\":\"43.56.2.223:50010\",\"lastContact\":2,\"usedSpace\":40479584256,\"adminState\":\"In Service\",\"nonDfsUsedSpace\":0,\"capacity\":41085632512,\"numBlocks\":467826,\"version\":\"2.7.3.2.6.0.3-8\",\"used\":40479584256,\"remaining\":412944352,\"blockScheduled\":0,\"blockPoolUsed\":40479584256,\"blockPoolUsedPercent\":98.52492,\"volfails\":0},\"worker02.sys764.com:50010\":{\"infoAddr\":\"43.56.2.217:50075\",\"infoSecureAddr\":\"43.56.2.217:0\",\"xferaddr\":\"43.56.2.217:50010\",\"lastContact\":2,\"usedSpace\":40588972032,\"adminState\":\"In Service\",\"nonDfsUsedSpace\":0,\"capacity\":41085632512,\"numBlocks\":457048,\"version\":\"2.7.3.2.6.0.3-8\",\"used\":40588972032,\"remaining\":346926120,\"blockScheduled\":0,\"blockPoolUsed\":40588972032,\"blockPoolUsedPercent\":98.79116,\"volfails\":0},\"worker03.sys764.com:50010\":{\"infoAddr\":\"43.56.2.215:50075\",\"infoSecureAddr\":\"43.56.2.215:0\",\"xferaddr\":\"43.56.2.215:50010\",\"lastContact\":0,\"usedSpace\":40671485952,\"adminState\":\"In 
Service\",\"nonDfsUsedSpace\":0,\"capacity\":41085632512,\"numBlocks\":473548,\"version\":\"2.7.3.2.6.0.3-8\",\"used\":40671485952,\"remaining\":346956888,\"blockScheduled\":0,\"blockPoolUsed\":40671485952,\"blockPoolUsedPercent\":98.992,\"volfails\":0},\"worker05.sys764.com:50010\":{\"infoAddr\":\"10.164.48.3:50075\",\"infoSecureAddr\":\"10.164.48.3:0\",\"xferaddr\":\"10.164.48.3:50010\",\"lastContact\":2,\"usedSpace\":40386850816,\"adminState\":\"In Service\",\"nonDfsUsedSpace\":0,\"capacity\":41085632512,\"numBlocks\":453297,\"version\":\"2.7.3.2.6.0.3-8\",\"used\":40386850816,\"remaining\":551245037,\"blockScheduled\":0,\"blockPoolUsed\":40386850816,\"blockPoolUsedPercent\":98.29921,\"volfails\":0}}",
    "DeadNodes" : "{}",
    "DecomNodes" : "{}",
    "BlockPoolId" : "BP-1686071471-43.56.2.214-1502702329154",
    "NameDirStatuses" : "{\"active\":{\"/data/var/hadoop/hdfs/namenode\":\"IMAGE_AND_EDITS\"},\"failed\":{}}",
    "NodeUsage" : "{\"nodeUsage\":{\"min\":\"98.30%\",\"median\":\"98.52%\",\"max\":\"98.99%\",\"stdDev\":\"0.25%\"}}",
    "NameJournalStatus" : "[{\"manager\":\"QJM to [43.56.2.214:8485, 10.164.52.237:8485, 43.56.2.216:8485]\",\"stream\":\"open for read\",\"disabled\":\"false\",\"required\":\"true\"}]",
    "JournalTransactionInfo" : "{\"MostRecentCheckpointTxId\":\"81912212\",\"LastAppliedOrWrittenTxId\":\"81943698\"}",
    "NNStarted" : "Tue Jun 12 12:27:18 UTC 2018",
    "CompileInfo" : "2017-04-01T21:32Z by jenkins from (HEAD detached at c6befa0)",
    "CorruptFiles" : "[]",
    "DistinctVersionCount" : 1,
    "DistinctVersions" : [ {
      "key" : "2.7.3.2.6.0.3-8",
      "value" : 5
    } ],
    "SoftwareVersion" : "2.7.3.2.6.0.3-8",
    "RollingUpgradeStatus" : null
  } ]
}
Michael-Bronson

Super Mentor

@Michael Bronson

I was going through your "capture.png" image and noticed that Non DFS Used = 0%, which is strange. It looks like the bug reported in AMBARI-22625: https://issues.apache.org/jira/browse/AMBARI-22625

Apart from the Non-DFS usage, the other data seems to match what the NameNode JMX and the screenshot return, e.g.:

Total : 205428162560 (205.42 GB)
Used :  202576838656 (202.57 GB)
"PercentUsed" : 98.61201 (98%)
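A side note on units: the dfsadmin report above prints the same "Total" as 191.32 GB while this comment converts it to 205.42 GB; both are correct, one in binary (GiB) and one in decimal (GB) units. A quick check, using the byte count from the JMX output above:

```python
total_bytes = 205428162560  # "Total" from the NameNode JMX output above
print(round(total_bytes / 1024**3, 2))  # 191.32 (binary GiB, what dfsadmin -report prints)
print(round(total_bytes / 1000**3, 2))  # 205.43 (decimal GB)
```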

@Jay, HDFS is now at 99%. What can we do to decrease it?

Michael-Bronson

Super Mentor

@Michael Bronson

If your HDFS is almost 99% full, then it is better to try this:

1. Find the top 10 HDFS directories using the script shown here, to see how much space each HDFS directory is consuming:

https://github.com/crazyadmins/useful-scripts/tree/master/hdfs

https://github.com/crazyadmins/useful-scripts/blob/master/hdfs/top_10_dir.sh

2. Then delete some unwanted directory contents from the output listed by the above commands. While removing the HDFS contents, please use "-skipTrash" this time to make sure the contents are deleted permanently rather than moved to the trash.

Example:

# hdfs dfs -rm -r -skipTrash /unwanted/files

.
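If the linked script is not available, a similar top-N ranking can be sketched by sorting plain `hdfs dfs -du /` output. A hypothetical helper, assuming the byte-count output format (i.e. without -h):

```python
# Rank `hdfs dfs -du /` output lines ("<bytes> <path>") by size, largest first.
def top_dirs(du_output, n=10):
    rows = []
    for line in du_output.strip().splitlines():
        size, path = line.split(None, 1)
        rows.append((int(size), path.strip()))
    return sorted(rows, reverse=True)[:n]

# Illustrative sample (byte counts roughly matching the du output in this thread):
sample = "537919488 /app-logs\n60129542144 /datag\n729411584 /hdp"
print(top_dirs(sample, 2))  # [(60129542144, '/datag'), (729411584, '/hdp')]
```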

We created the Trash folder as follows (as the hdfs user):

hdfs dfs -mkdir /user/admin/.Trash
Michael-Bronson

This is all we see from the script:


Please wait while we calculate size to determine top 10 directories on HDFS

| Dir_on_HDFS                                               | Size_in_MB | User | Group  | Last_modified Time |
| --------------------------------------------------------- | ---------- | ---- | ------ | ------------------ |
| /apps/hive/warehouse                                      | 0          | hive | hadoop | 2018-05-31 09:04   |
| /spark2-history/application_1527757840137_0005.inprogress | 0          | hive | hadoop | 2018-06-18 14:49   |
| /spark2-history/application_1527757840137_0006.inprogress | 0          | hive | hadoop | 2018-06-18 14:49   |
| /spark2-history/application_1527757840137_0003.inprogress | 0          | hive | hadoop | 2018-06-18 15:47   |
| /user/hdfs/.hiveJars                                      | 21         | hdfs | hdfs   | 2018-06-06 08:10   |
| /hdp/apps/2.6.4.0-91                                      | 710        | hdfs | hdfs   | 2018-05-31 09:04   |
Michael-Bronson

And I don't think we can delete /hdp/apps/2.6.4.0-91.

Michael-Bronson

Super Mentor

@Michael Bronson

We do not need to delete the "/hdp/apps/2.6.4.0-91" directory, since, based on the output shared, it is only consuming 738 MB, which is normal and OK.

If you see some directories growing unexpectedly and containing unwanted data, you can clear/delete them (not the useful ones). The other option is to increase the HDFS disk space.

@Jay, what is the procedure to increase the HDFS disk space?

Michael-Bronson