Moving data between datanodes

Hello Guys,

My question is about moving data between clusters when I need to migrate data from a lower version of HDP to a higher version.

If I upgrade the HDFS version on the old datanodes and add those datanodes to the new-version cluster, will HDFS be able to see the files that were stored under the old version? If that works, do I need to run a rebalance?

Or is the best option to use DistCp and move the data between the clusters on different versions?

Regards,

R.R.

1 ACCEPTED SOLUTION

Super Guru
@Rodrigo Rondena

So you want to DistCp data from one cluster to another, and the two clusters are on different versions. Is that right? Are they on the same major version, for example 2.x and 2.y? If so, simply use the hdfs protocol. To DistCp between two different major versions, use the webhdfs protocol instead. See the following link:

https://hadoop.apache.org/docs/current/hadoop-distcp/DistCp.html#Copying_Between_Versions_of_HDFS
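
As a rough sketch of that rule (not an official tool), the Python helper below picks the URI scheme from the two Hadoop version strings and builds the matching `hadoop distcp` command line. The function names, the host names, the `/data` path, and the port defaults (8020 for NameNode RPC, 50070 for NameNode HTTP) are assumptions for illustration only; check the values configured on your own clusters. It also assumes the job is launched from the destination cluster, so only the source side is addressed through webhdfs.

```python
def major(version: str) -> int:
    """Return the major component of a version string such as '2.7.3'."""
    return int(version.split(".")[0])


def distcp_command(src_version, dst_version, src_nn, dst_nn, path,
                   src_rpc_port=8020, src_http_port=50070, dst_rpc_port=8020):
    """Build a 'hadoop distcp' argument list (hypothetical helper).

    Same major version  -> read the source over hdfs://
    Different major     -> read the source over webhdfs://
    The port defaults are assumptions, not values taken from the thread.
    """
    if major(src_version) == major(dst_version):
        src = f"hdfs://{src_nn}:{src_rpc_port}{path}"
    else:
        src = f"webhdfs://{src_nn}:{src_http_port}{path}"
    # The destination is the local cluster in this sketch, so it stays on hdfs://
    dst = f"hdfs://{dst_nn}:{dst_rpc_port}{path}"
    return ["hadoop", "distcp", src, dst]


if __name__ == "__main__":
    # Placeholder host names; replace with your NameNode addresses.
    same = distcp_command("2.6.0", "2.7.3",
                          "old-nn.example.com", "new-nn.example.com", "/data")
    cross = distcp_command("1.2.1", "2.7.3",
                           "old-nn.example.com", "new-nn.example.com", "/data")
    print(" ".join(same))
    print(" ".join(cross))
```

The function only builds the command; you would still run the printed line yourself on a node that has the Hadoop client installed.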
