We have a production HDP 2.6.4 cluster with 12 DataNode machines.
In Ambari we see about 25,000 under-replicated blocks.
When this happens, we fix the under-replicated blocks,
but after some time the problem returns.
What should we check to find the root cause of this behavior?
This can happen for several reasons. Quoting from https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-hdfs/HdfsDesign.html:
The necessity for re-replication may arise due to many reasons: a DataNode may become unavailable, a replica may become corrupted, a hard disk on a DataNode may fail, or the replication factor of a file may be increased
Do you see any of these events ("a DataNode may become unavailable, a replica may become corrupted, a hard disk on a DataNode may fail") shortly before the blocks become under-replicated?
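One way to check this is to save the output of `hdfs dfsadmin -report` at regular intervals and compare the counters over time. The following is a minimal sketch, not a definitive tool: it assumes the summary labels look the way Hadoop 2.x prints them ("Under replicated blocks:", "Live datanodes (N):", and so on), and the names `parse_dfsadmin_report` and `SAMPLE_REPORT` as well as the numbers in the sample are made up for illustration.

```python
import re

# Summary counters as printed by `hdfs dfsadmin -report`.
# Assumption: Hadoop 2.x label wording; adjust the patterns if your output differs.
PATTERNS = {
    "under_replicated": r"^Under replicated blocks:\s*(\d+)",
    "corrupt_replicas": r"^Blocks with corrupt replicas:\s*(\d+)",
    "missing_blocks": r"^Missing blocks:\s*(\d+)",
    "live_datanodes": r"^Live datanodes \((\d+)\)",
    "dead_datanodes": r"^Dead datanodes \((\d+)\)",
}


def parse_dfsadmin_report(text):
    """Return the counters found in the report text as a dict of ints.

    Counters whose label does not appear in the text are simply left out.
    """
    counters = {}
    for name, pattern in PATTERNS.items():
        match = re.search(pattern, text, re.MULTILINE)
        if match:
            counters[name] = int(match.group(1))
    return counters


# Hypothetical excerpt of a report; the values are invented for this example.
SAMPLE_REPORT = """\
Under replicated blocks: 25000
Blocks with corrupt replicas: 0
Missing blocks: 0

Live datanodes (11):

Dead datanodes (1):
"""

if __name__ == "__main__":
    print(parse_dfsadmin_report(SAMPLE_REPORT))
```

If the live DataNode count drops (or the dead count rises) just before the under-replicated count jumps, DataNode availability or a failed disk is the likely cause, and the DataNode logs on the affected host are the next place to look. If the node counts stay stable, a corrupt-replica count above zero or a recent change in the replication factor would be the other things to rule out.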