
any normal way to stop Cloudera HDFS from complaining 'under_replicated ....'




Recently I set up two Cloudera Hadoop clusters:


  1) a 3-node HDFS setup for production use

  2) a single-node HDFS file system as a backup for the 3-node HDFS


The single-node HDFS's global replication factor is set to 1, while the 3-node source HDFS keeps the default replication factor of 3.


Then I run a snapshot + distcp job to back up the source HDFS to the backup HDFS, using the commands below:


  hadoop distcp -pbugpca -update -delete -append hdfs://<src_hdfs>.../.snapshot/<#ss>/  hdfs://<backup_hdfs>/<bkup_dir>/

  hdfs dfs -setrep -R -w 1 /   ## run as the 'hdfs' user on the backup HDFS node


Both commands run without errors. The problem is that even though I don't preserve the replication factor when running distcp (there is no 'r' among the '-p' flags) and I explicitly set every file's replication factor to 1, the Cloudera HDFS health check still reports the same 'under_replicated' error.
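For anyone trying to reproduce this, the following is a minimal sketch of the checks I would run on the backup node to see which replication factor the copied files actually ended up with. The function name `check_replication` and the `SAMPLE_FILE` variable are my own placeholders, not part of any Hadoop tooling; `SAMPLE_FILE` is assumed to point at one file that distcp copied.

```shell
#!/bin/sh
# Sketch only: check_replication and SAMPLE_FILE are hypothetical names.
# Intended to be run as the 'hdfs' user on the backup HDFS node.

check_replication() {
    # Abort with a message if no sample file was provided.
    : "${SAMPLE_FILE:?set SAMPLE_FILE to a file that distcp copied}"

    # Default replication factor that the local client configuration resolves to.
    hdfs getconf -confKey dfs.replication

    # Replication factor recorded for the sample file (%r) and its name (%n).
    hdfs dfs -stat '%r %n' "$SAMPLE_FILE"

    # Filesystem check; the summary at the end reports under-replicated blocks.
    hdfs fsck /
}
```

After sourcing the snippet, it can be invoked with `SAMPLE_FILE` set to a copied file and then `check_replication`. If the `-stat` output shows a factor other than 1, or `getconf` returns something other than 1, that would at least narrow down where the mismatch comes from.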


The health test result for HDFS_UNDER_REPLICATED_BLOCKS  has become bad: 10,377 under replicated blocks in the cluster. 10,681 total blocks in the cluster. Percentage under replicated blocks: 97.15%. Critical threshold: 40.00%.
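As a sanity check, the percentage in that message is consistent with the two block counts it reports. The sketch below simply redoes the arithmetic; the variable names are mine, and treating "strictly above the threshold" as the failing condition is an assumption about how the health test compares the values.

```shell
#!/bin/sh
# Figures copied from the health test message above.
under_replicated=10377
total_blocks=10681
critical_threshold=40.00

# Percentage of under-replicated blocks, rounded to two decimals.
pct=$(awk -v u="$under_replicated" -v t="$total_blocks" \
    'BEGIN { printf "%.2f", 100 * u / t }')

# Assumption: the test goes bad when the percentage exceeds the threshold.
status=$(awk -v p="$pct" -v c="$critical_threshold" \
    'BEGIN { if (p + 0 > c + 0) print "bad"; else print "ok" }')

echo "under-replicated: ${pct}% (critical threshold: ${critical_threshold}%) -> ${status}"
```

So nearly every block on the backup cluster is still being counted as under-replicated, not just a handful of leftovers.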


If anyone is familiar with this issue, please shed some light on it. Thanks a lot.

