When I copy a file from one cluster to another using distcp, it preserves the replication factor and block size by default. For example:
When I copy a file from Cluster A (replication factor 2, block size 64MB) to Cluster B (default replication factor 3, default block size 128MB), the copied file keeps the Cluster A values, i.e., 2 and 64MB. I want the file to get the default values of Cluster B.
How can I update the replication-factor and block-size of the file as per the destination cluster?
(Both clusters are running HDP-2.6.x.)
Hey @Himanshu Rawat!
I ran a test here, and I think you can pass -D dfs.replication to your distcp command.
Here's the command I used:
[hdfs@node4 ~]$ hadoop distcp -D dfs.replication=1 /largefile.img webhdfs://<MyOtherHDP>:50070/
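The same approach might also cover the block size by adding the dfs.blocksize property. The sketch below is untested on HDP-2.6.x and only builds and prints the command so it can be reviewed before running. The values 3 and 128MB are the Cluster B defaults from the question, the variable names are just for illustration, and it assumes distcp is not run with a -p option that preserves replication (r) or block size (b), since that would carry over the source values instead.

```shell
# Illustrative values taken from the question: Cluster B defaults.
REPLICATION=3
BLOCKSIZE=$((128 * 1024 * 1024))   # dfs.blocksize is given in bytes (128MB)

# Source and destination as in the command above; the host is a placeholder.
SRC=/largefile.img
DEST='webhdfs://<MyOtherHDP>:50070/'

# Build the command as a string and print it for review (dry run only).
DISTCP_CMD="hadoop distcp -D dfs.replication=${REPLICATION} -D dfs.blocksize=${BLOCKSIZE} ${SRC} ${DEST}"
echo "${DISTCP_CMD}"
```

After the copy, running hdfs dfs -stat "%r %o" on the destination file should print its replication factor and block size, which is a quick way to check whether the properties took effect. If only the replication factor is off, hdfs dfs -setrep can change it on an existing file; the block size cannot be changed without rewriting the file.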
Hope this helps!