hdfs replication progress tracking

New Contributor

Is there a command to track progress on hdfs replication?

After changing the replication factor from 1 to 3, I need to track how long the replication is going to take. I need either to track progress or, ideally, to retrieve afterwards how long it took. This is to help optimize certain configurations, so the test process will have to be repeated several times and the performance metrics compared.

Thanks

2 REPLIES

Re: hdfs replication progress tracking

Super Mentor

@Anna Skobodzinski

In HDFS, the blocks of the files are distributed among the datanodes as per the replication factor.

With a replication factor of 1, each block is stored on only one DataNode.

When you add a new DataNode, it starts receiving and storing blocks only for *new files*.


So changing the replication factor alone does not rebalance the cluster, although new files will be replicated according to the new replication factor.
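For timing the change itself on files that already exist, one option is the -w flag of "hdfs dfs -setrep", which blocks until the target replication is reached. The sketch below wraps it with a timer; timed_setrep is a hypothetical helper name and the path in the usage line is only an illustration, neither comes from this thread.

```shell
#!/usr/bin/env bash
# Sketch: set the replication factor on an existing path and report how long
# HDFS took to reach it. timed_setrep is a hypothetical helper, not a Hadoop command.
timed_setrep() {
  local factor="$1" path="$2" start end
  start=$(date +%s)
  # -w makes setrep wait until replication completes. Its own output is sent
  # to stderr so that stdout carries only the timing line.
  hdfs dfs -setrep -w "$factor" "$path" >&2 || return 1
  end=$(date +%s)
  echo "setrep to $factor on $path took $(( end - start ))s"
}
```

Usage would look like "timed_setrep 3 /data/benchmark" run as a user with write access to that path. When the path is a directory, setrep applies to every file under it, so the wait can be long on large trees.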



What is HDFS Rebalancer?

https://docs.hortonworks.com/HDPDocuments/HDP3/HDP-3.1.0/data-storage/content/balancing_data_across_...

The HDFS Balancer is a tool for balancing the data across the storage devices of a HDFS cluster. The HDFS balancer moves blocks until the cluster is deemed to be balanced, which means that the utilization of every DataNode (ratio of used space on the node to total capacity of the node) differs from the utilization of the cluster (ratio of used space on the cluster to total capacity of the cluster) by no more than a given threshold percentage.


The HDFS rebalance operation can be triggered either from the Ambari UI or from the command line:

Ambari UI --> HDFS --> Actions (Drop down) --> Rebalance HDFS (Enter) 
then specify the "Balancer threshold (percentage of disk capacity)"



The command-line usage is described here:

https://community.hortonworks.com/articles/43849/hdfs-balancer-2-configurations-cli-options.html

https://docs.hortonworks.com/HDPDocuments/HDP3/HDP-3.1.0/data-storage/content/balancing_data_across_...


You can check whether the "balancer.id" file exists to see if the balancer is currently running:

# su - hdfs -c "hdfs dfs -cat /system/balancer.id"


You can also capture the "dfsadmin" report before and after the change and compare the two to follow the progress.

# hdfs dfsadmin -report > before_dfsadmin.log 
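To turn the report into a progress tracker, a minimal sketch is to poll it for the under-replicated block count and record the elapsed time once the count reaches zero. This assumes the report contains a line of the form "Under replicated blocks: N"; the function names and the 30-second default interval are illustrative choices, not part of the Hadoop CLI.

```shell
#!/usr/bin/env bash
# Read a dfsadmin report on stdin and print the first
# "Under replicated blocks" count found in it.
under_replicated_count() {
  awk -F': *' '/Under replicated blocks/ { gsub(/[^0-9]/, "", $2); print $2; exit }'
}

# Poll the report until no under-replicated blocks remain, logging progress.
# $1: poll interval in seconds (defaults to 30, an arbitrary choice).
wait_for_replication() {
  local interval="${1:-30}" start remaining
  start=$(date +%s)
  while :; do
    remaining=$(hdfs dfsadmin -report | under_replicated_count)
    if [ -z "$remaining" ]; then
      echo "could not find the under-replicated count in the report" >&2
      return 1
    fi
    echo "$(date '+%F %T') under-replicated blocks: $remaining"
    if [ "$remaining" -eq 0 ]; then
      break
    fi
    sleep "$interval"
  done
  echo "elapsed seconds: $(( $(date +%s) - start ))"
}
```

Running "wait_for_replication 60 | tee replication_progress.log" as the hdfs user right after the change keeps a timestamped log that can be compared between test runs. It is worth confirming that the count actually rises after the change before trusting a zero reading.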


Re: hdfs replication progress tracking

The above question and the entire reply thread below were originally posted in the Community Help track. On Mon Jul 1 01:31 UTC 2019, a member of the HCC moderation staff moved it to the Hadoop Core track. The Community Help track is intended for questions about using the HCC site itself, not technical questions about HDFS replication.

Bill Brooks, Community Manager