11-09-2020
09:06 AM
Hello @AlexP,

Ref: https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/FileSystemShell.html#setrep

Referring to the HDFS documentation, my answers to your questions are inline.

[Q1.] How to estimate how much time would this command take for a single directory (without -w)?
[A1.] It depends on the number of files in the directory. If you run setrep against a path that is a directory, the command recursively changes the replication factor of all files under the directory tree rooted at that path, so the time varies with the file count under the path.

[Q2.] Will it trigger a replication job even if I don't use the '-w' flag?
[A2.] Yes, replication is triggered even without the -w flag. However, it is good practice to use -w to ensure that all files have reached the required replication factor before the command exits. The -w flag requests that the command wait for the replication to complete. This can take a long time, but it guarantees that the replication factor has changed to the specified value by the time the command returns.

[Q3.] If yes, does it mean that the NameNode will actually start deleting 'over-replicated' blocks of all existing files under a particular directory?
[A3.] Yes, your understanding is correct. The one extra replica of each block makes the block over-replicated, and that extra replica will be deleted from the cluster. This happens for every file under the directory path, leaving only 2 replicas of each file block.

Hope this helps.
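As a rough illustration of the answers above, here is a minimal Python sketch that builds the setrep command line with or without -w and estimates how many replicas become over-replicated when the factor is lowered. The helper names, the path /data/example, the block count of 1000, and the previous replication factor of 3 are all assumptions made for the example; they are not taken from your cluster. The sketch only prints the command and does not run it.

```python
import shlex


def build_setrep_command(path, replication, wait=False):
    """Return the argv list for `hdfs dfs -setrep [-w] <numReplicas> <path>`."""
    if replication < 1:
        raise ValueError("replication factor must be at least 1")
    argv = ["hdfs", "dfs", "-setrep"]
    if wait:
        # -w asks the command to wait until replication reaches the target.
        argv.append("-w")
    argv += [str(replication), path]
    return argv


def excess_replicas(total_blocks, old_replication, new_replication):
    """Estimate replicas that become over-replicated when the factor is lowered.

    Returns 0 when the factor is raised or unchanged.
    """
    return total_blocks * max(old_replication - new_replication, 0)


if __name__ == "__main__":
    # Hypothetical path and numbers, for illustration only.
    cmd = build_setrep_command("/data/example", 2, wait=True)
    print(shlex.join(cmd))
    print(excess_replicas(total_blocks=1000, old_replication=3, new_replication=2))
```

With these assumed numbers, going from 3 replicas to 2 leaves one extra replica per block, so about 1000 replicas would be scheduled for deletion. The actual block count for a directory has to come from your own cluster, for example from an fsck report on the path.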