Created 01-17-2021 09:19 AM
We installed a small HDP cluster with one data-node machine.
The HDP version is `2.6.5` and the Ambari version is `2.6.1`.
So this is a new cluster that contains two name-nodes and only one data-node (worker machine).
The interesting behavior we see is that the number of `under replicated` blocks keeps increasing on the Ambari dashboard; at the moment it stands at `15000` under-replicated blocks.
As we know, the most common root cause of this problem is a network issue between the name-node and the data-node,
but that isn't the case in our Hadoop cluster.
We can also decrease the number of under-replicated blocks with the following procedure:
su - <$hdfs_user>
# collect the paths of all files that fsck reports as under-replicated
hdfs fsck / | grep 'Under replicated' | awk -F':' '{print $1}' >> /tmp/under_replicated_files
# reset the replication factor for each of those files
for hdfsfile in `cat /tmp/under_replicated_files`; do echo "Fixing $hdfsfile :" ; hadoop fs -setrep 3 $hdfsfile; done
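For reference, this is the minimal check we run to watch the counts (the summary labels below are what we assume fsck prints on our version; they may differ on other HDFS releases):
# print the fsck summary lines for total blocks, under-replicated blocks and data-node count
hdfs fsck / | grep -E 'Total blocks|Under-replicated blocks|Number of data-nodes'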
But we do not want to do this, because the under-replication problem should not be happening in the first place.
Maybe we need to tune some HDFS parameters, but we are not sure about this.
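For what it's worth, we assume the relevant setting is the default replication factor (dfs.replication), though this is just a guess on our part; this is how we check its current value:
# print the default replication factor currently configured
hdfs getconf -confKey dfs.replication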
Please let us know of any advice that can help us.
Created 01-26-2021 03:16 PM
It seems to me that this is a symptom of having the default replication factor set to 3. This is for redundancy and processing capability within HDFS. It is recommended to have a minimum of three data nodes in the cluster to accommodate three healthy replicas of each block (given the default replication of 3), and HDFS will not write replicas of the same block to the same data node. In your scenario there will therefore be under-replicated blocks, with one healthy replica placed on the only available data node.
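As a quick sanity check (a sketch, not specific to your environment), you can confirm how many data nodes the NameNode considers live; with only one live data node and a default replication of 3, every block will be reported as under-replicated:
# show the live data node count as reported by the NameNode
hdfs dfsadmin -report | grep -i 'live datanodes'
If you want new writes to stop producing under-replicated blocks, you can also lower the default replication (dfs.replication) to 1 in the HDFS configuration through Ambari.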
You may run setrep [1] to change the replication factor. If you provide a path to a directory, the command recursively changes the replication factor of all files under the directory tree rooted at that path.
hdfs dfs -setrep -w 1 /user/hadoop/dir1
[1] https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/FileSystemShell.html#setrep
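After the setrep command completes, a minimal follow-up check (using the same example path) is to re-run fsck and confirm that no under-replicated blocks remain:
# fsck should now report 0 under-replicated blocks for this directory
hdfs fsck /user/hadoop/dir1 | grep 'Under-replicated blocks'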