I am seeing a weird issue with the NameNode UI.
When I check the file and directory count from the command line, it shows the following:
hadoop fs -count /
dir_count file_count content_size
403103 4025766 253149817660 /
Total file and directory count: 403103 + 4025766 = 4428869
The following is what the NameNode UI (http://namenode:50070) shows:
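For anyone who wants to script this check, here is a minimal sketch that parses one line of `hadoop fs -count` output and adds the directory and file counts. It assumes the default column order (directory count, file count, content size, path) shown above. The function names `parse_count_line` and `total_objects` are made up for illustration.

```python
def parse_count_line(line):
    """Parse one line of `hadoop fs -count` output.

    Assumes the default column order:
    DIR_COUNT FILE_COUNT CONTENT_SIZE PATHNAME
    """
    dirs, files, size, path = line.split(None, 3)
    return {
        "dirs": int(dirs),
        "files": int(files),
        "bytes": int(size),
        "path": path.strip(),
    }


def total_objects(counts):
    """Directories plus files, the figure to compare against the UI."""
    return counts["dirs"] + counts["files"]


# The line reported in the question.
counts = parse_count_line("403103 4025766 253149817660 /")
print(total_objects(counts))  # 4428869
```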
Summary
Security is on.
Safemode is off.
30612012 files and directories, 28845146 blocks = 59457158 total filesystem object(s).
Heap Memory used 8.11 GB of 9.88 GB Heap Memory. Max Heap Memory is 9.88 GB.
Non Heap Memory used 74.74 MB of 146 MB Commited Non Heap Memory. Max Non Heap Memory is 304 MB.
Configured Capacity: 1.33 PB
DFS Used: 668.15 TB (49.17%)
Non DFS Used: 2.56 TB
DFS Remaining: 688.01 TB (50.64%)
Block Pool Used: 668.15 TB (49.17%)
DataNodes usages% (Min/Median/Max/stdDev): 43.25% / 48.96% / 53.42% / 2.11%
The count in the UI does not match the command-line output. Because of this issue, heap usage is very high.
Has anyone encountered the same issue?
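To put a number on the mismatch, the sketch below pulls the object count out of the UI summary line with a regular expression and compares it with the command-line total. The helper name `parse_ui_summary` is hypothetical, and the pattern only assumes the wording of the summary line quoted above.

```python
import re


def parse_ui_summary(text):
    """Return (files_and_dirs, blocks) from the NameNode UI summary line."""
    m = re.search(r"(\d+) files and directories, (\d+) blocks", text)
    if m is None:
        raise ValueError("summary line not recognized")
    return int(m.group(1)), int(m.group(2))


ui_line = ("30612012 files and directories, 28845146 blocks = "
           "59457158 total filesystem object(s).")
ui_objects, ui_blocks = parse_ui_summary(ui_line)

# Directory count plus file count from `hadoop fs -count /`.
cli_objects = 403103 + 4025766

print(ui_objects - cli_objects)            # 26183143
print(round(ui_objects / cli_objects, 2))  # 6.91
```

So the UI reports roughly seven times as many files and directories as the command line does.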
Seems like a pretty strange issue. Is this an HA environment?
Please note that the count will differ between the Active NN UI and the Standby NN UI.
Another possible cause is a recent NameNode failover. In that case the file count may lag for a while.
This is expected if you are looking at the "standby" NameNode in an HA NameNode environment.
Only the active NameNode will have an accurate count. If the standby becomes active, its count will be correct.
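One way to check which NameNode a given UI belongs to is the NameNode's JMX endpoint, which serves the same web port as the UI. The sketch below is only an outline under some assumptions: it reads the `FSNamesystem` bean and takes the `tag.HAState` and `FilesTotal` fields from it, and the host name `namenode` and port 50070 are the ones from the question. Because security is on in this cluster, the HTTP request would probably need SPNEGO/Kerberos authentication, which this sketch does not handle. The sample payload is illustrative, not real output from this cluster.

```python
import json
from urllib.request import urlopen


def fetch_fsnamesystem(host, port=50070):
    """Fetch the FSNamesystem JMX bean from a NameNode (no auth handling)."""
    url = ("http://%s:%d/jmx?qry=Hadoop:service=NameNode,name=FSNamesystem"
           % (host, port))
    with urlopen(url, timeout=10) as resp:
        return json.load(resp)


def summarize(jmx):
    """Return (HA state, files-and-directories total) from a JMX response."""
    bean = jmx["beans"][0]
    return bean.get("tag.HAState"), bean.get("FilesTotal")


# Illustrative payload only; a real response contains many more fields.
sample = {"beans": [{"name": "Hadoop:service=NameNode,name=FSNamesystem",
                     "tag.HAState": "standby",
                     "FilesTotal": 30612012}]}
state, files_total = summarize(sample)
print(state, files_total)

# Against a live cluster (hypothetical host name):
# state, files_total = summarize(fetch_fsnamesystem("namenode"))
```

If the state comes back as standby, compare against the active NameNode's numbers instead. On the command line, `hdfs haadmin -getServiceState <serviceId>` reports the same state.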
Hadoop >2.8 will report accurately. Read more here: https://issues.apache.org/jira/browse/HDFS-9396