Hello All.
What problems can occur when copying data between two clusters that run different major versions if you use hdfs://... instead of webhdfs://... for the source? For example:
hadoop distcp hdfs://<namenode>:<port> hdfs://<namenode>
Example from the documentation:
Copying between major versions
Run the distcp command on the cluster that runs the higher version of Cloudera, which should be the destination cluster. Use the following syntax:
hadoop distcp webhdfs://<namenode>:<port> hdfs://<namenode>
Note the webhdfs prefix for the remote cluster, which should be your source cluster. You must use webhdfs when the clusters run different major versions. When clusters run the same version, you can use the hdfs protocol for better performance.
For example, the following command copies data from a Cloudera source cluster named example-source to another Cloudera version destination cluster named example-dest:
hadoop distcp webhdfs://example-source.cloudera.com:8020 hdfs://example-dest.cloudera.com
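To make the rule in the quoted documentation concrete, here is a minimal sketch that picks the source scheme by comparing major versions and prints the resulting distcp command without running it. The function names pick_scheme and build_distcp and the version strings are hypothetical, chosen only for illustration; they are not part of Hadoop or Cloudera tooling.

```shell
#!/bin/sh
# Hypothetical helper: choose the source URI scheme from two version strings.
# Same major version -> hdfs; different major versions -> webhdfs.
pick_scheme() {
    src_major=${1%%.*}   # text before the first dot, e.g. "5" from "5.16.2"
    dst_major=${2%%.*}
    if [ "$src_major" = "$dst_major" ]; then
        echo hdfs
    else
        echo webhdfs
    fi
}

# Hypothetical helper: print (do not execute) the distcp command line.
# Arguments: source version, destination version, source address, destination address.
build_distcp() {
    scheme=$(pick_scheme "$1" "$2")
    echo "hadoop distcp ${scheme}://$3 hdfs://$4"
}

# Illustrative versions only; substitute the versions of your own clusters.
build_distcp 5.16.2 6.3.4 "<namenode>:<port>" "<namenode>"
```

The sketch only prints the command, so it can be reviewed before being run on the destination cluster as the documentation describes.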