Migrating data off an old CDH cluster to a new HDP cluster

Contributor

I have an old, unsecured CDH cluster (5.4.3) and I am trying to migrate a directory of data from it to a new, secured HDP (2.4.x) cluster. WebHDFS has been disabled on the source (unsecured) cluster. So far I have tried distcp, but because of the mismatched versions and the secure/unsecure mix, hdfs:// to hdfs:// does not work, hftp:// to hdfs:// does not work, and webhdfs:// to webhdfs:// cannot work. (HDFS-7037, HDFS-6776)

I also tried using Sqoop to move the files from Hive (unsecured cluster) to HDFS (secured cluster), but version differences rule out that avenue as well. (HIVE-6050)

Short of writing a script that runs on the unsecured cluster and pushes the data through the WebHDFS REST API on the secured cluster, I can't think of a faster way. Before I do that, I figured I'd ask: does anyone have a better idea?
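For reference, the fallback script mentioned above could look roughly like the following. This is only a minimal sketch, not a tested migration tool. It assumes the files have already been copied to local disk on a node of the unsecured cluster, that the secured cluster's NameNode is reachable over plain HTTP, and that a delegation token for the secured cluster has been obtained separately. The host name, paths, and the helper names `webhdfs_create_url` and `upload_file` are hypothetical.

```python
import os
import urllib.error
import urllib.request
from urllib.parse import quote, urlencode


def webhdfs_create_url(namenode, hdfs_path, overwrite=False, delegation_token=None):
    """Build the NameNode URL for step 1 of a WebHDFS CREATE (hypothetical helper)."""
    params = {"op": "CREATE", "overwrite": "true" if overwrite else "false"}
    if delegation_token:
        # On a secured cluster the request must be authenticated; passing a
        # delegation token is one option (assumed to be obtained elsewhere).
        params["delegation"] = delegation_token
    return "http://{}/webhdfs/v1{}?{}".format(namenode, quote(hdfs_path), urlencode(params))


class _NoRedirect(urllib.request.HTTPRedirectHandler):
    """Do not follow redirects, so the DataNode URL can be read from the 307."""

    def redirect_request(self, req, fp, code, msg, headers, newurl):
        return None


def upload_file(namenode, local_path, hdfs_path, delegation_token=None):
    """Upload one local file using the two-step WebHDFS CREATE protocol."""
    opener = urllib.request.build_opener(_NoRedirect)

    # Step 1: ask the NameNode where to write; it answers with a 307 redirect.
    step1 = urllib.request.Request(
        webhdfs_create_url(namenode, hdfs_path, delegation_token=delegation_token),
        method="PUT",
    )
    try:
        opener.open(step1)
    except urllib.error.HTTPError as err:
        if err.code != 307:
            raise
        datanode_url = err.headers["Location"]
    else:
        raise RuntimeError("expected a 307 redirect from the NameNode")

    # Step 2: send the file contents to the DataNode URL from the redirect.
    with open(local_path, "rb") as fh:
        step2 = urllib.request.Request(datanode_url, data=fh, method="PUT")
        step2.add_header("Content-Length", str(os.path.getsize(local_path)))
        step2.add_header("Content-Type", "application/octet-stream")
        with opener.open(step2) as resp:
            return resp.status


if __name__ == "__main__":
    # Hypothetical host and path, shown only to illustrate the URL shape.
    print(webhdfs_create_url("secure-nn.example.com:50070", "/data/migrated/part-00000"))
```

A Kerberos-enabled cluster may instead require SPNEGO authentication, which this sketch does not cover.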

1 ACCEPTED SOLUTION

Super Guru

@Erin Fusaro

Have you considered using NiFi to transfer the data? The FetchHDFS processor would be configured to communicate with the unsecured cluster, and the PutHDFS processor would be configured to communicate with the secured cluster.


REPLIES

Super Collaborator

Could you please explain in more detail why distcp from hftp:// to hdfs:// didn't work? This is the recommended way to copy between different versions of Hadoop, and it should be run on the destination cluster.
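For readers unfamiliar with that approach, the command in question has roughly the shape below. This is a sketch under assumptions: the host names, ports, and paths are hypothetical, `distcp_command` is an illustrative helper, and the `ipc.client.fallback-to-simple-auth-allowed` property is included on the assumption that the job is launched from a secured cluster reading from an unsecured one. The original poster reports that this route did not work in their environment (see HDFS-7037), so treat it as an illustration of the technique rather than a confirmed fix.

```python
import subprocess


def distcp_command(source_namenode_http, dest_namenode_rpc, source_dir, dest_dir):
    """Build the argv for a distcp run that reads via hftp and writes via hdfs."""
    return [
        "hadoop", "distcp",
        # Let the secure client fall back to simple auth for the unsecured source.
        "-D", "ipc.client.fallback-to-simple-auth-allowed=true",
        "hftp://{}{}".format(source_namenode_http, source_dir),
        "hdfs://{}{}".format(dest_namenode_rpc, dest_dir),
    ]


if __name__ == "__main__":
    # Hypothetical hosts and paths; run this on a node of the destination cluster.
    cmd = distcp_command("old-nn.example.com:50070", "new-nn.example.com:8020",
                         "/data/source", "/data/target")
    print(" ".join(cmd))
    # subprocess.run(cmd, check=True)  # uncomment to actually launch the copy
```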

Contributor

If you check out the Jira tickets, the issue revolves around needing to run distcp from the secured cluster while reading from the unsecured cluster. https://issues.apache.org/jira/browse/HDFS-7037

Contributor

Great suggestion! I may try that first.