Created 12-22-2016 08:14 PM
I have an old, unsecured CDH cluster (5.4.3) and I am trying to migrate a directory of data from it to a new, secured HDP (2.4.x) cluster. WebHDFS has been disabled on the source (unsecured) cluster. So far I have tried distcp, but because of the mismatched versions and the secure/unsecure issues, hdfs:// to hdfs:// does not work, hftp:// to hdfs:// does not work, and webhdfs:// to webhdfs:// cannot work. (HDFS-7037, HDFS-6776)
I also tried Sqooping the files from Hive (unsecured cluster) to HDFS (secured cluster), but version differences prevent that avenue as well. (HIVE-6050)
Outside of writing a script to run on the unsecured cluster that pushes the data through the WebHDFS REST API on the secured cluster, I can't think of a faster way. Before I do that, I figured I'd ask whether anyone has a better idea.
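For reference, the script approach would look roughly like the sketch below. It follows the documented two-step WebHDFS CREATE flow: a PUT to the NameNode that returns a 307 redirect, then a PUT of the file bytes to the DataNode URL from the Location header. The function names, host, and port arguments are hypothetical, and authentication against the secured cluster is assumed to be handled by passing a delegation token obtained separately; Kerberos/SPNEGO negotiation is not shown.

```python
import http.client
from urllib.parse import quote, urlencode, urlsplit


def create_url_path(hdfs_path, overwrite=False, delegation_token=None):
    """Build the request path for a WebHDFS CREATE call (hypothetical helper)."""
    params = {"op": "CREATE", "overwrite": "true" if overwrite else "false"}
    if delegation_token:
        # Assumption: a delegation token was fetched beforehand on the secured cluster.
        params["delegation"] = delegation_token
    return "/webhdfs/v1" + quote(hdfs_path) + "?" + urlencode(params)


def upload(namenode_host, namenode_port, hdfs_path, data, delegation_token=None):
    """Write bytes to hdfs_path on the destination cluster through WebHDFS."""
    # Step 1: ask the NameNode where to write. No file data is sent here.
    conn = http.client.HTTPConnection(namenode_host, namenode_port)
    conn.request("PUT", create_url_path(hdfs_path, delegation_token=delegation_token))
    resp = conn.getresponse()
    resp.read()
    location = resp.getheader("Location")
    conn.close()
    if resp.status != 307 or not location:
        raise RuntimeError(f"expected a 307 redirect, got HTTP {resp.status}")

    # Step 2: send the file content to the DataNode named in the redirect.
    target = urlsplit(location)
    conn = http.client.HTTPConnection(target.hostname, target.port)
    conn.request("PUT", target.path + "?" + target.query, body=data)
    resp = conn.getresponse()
    resp.read()
    conn.close()
    if resp.status != 201:
        raise RuntimeError(f"upload failed with HTTP {resp.status}")
```

Each file costs two HTTP round trips, so this is likely to be slow for a large directory unless the uploads are run in parallel.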
Created 12-23-2016 01:12 AM
Could you please explain in more detail why distcp hftp -> hdfs didn't work? This is the recommended way to copy between different versions of Hadoop, and it should be run on the destination cluster.
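A minimal sketch of what that invocation looks like, wrapped in a small script. The host names and path are placeholders, and the ports are assumed to be the common defaults (50070 for the NameNode HTTP port that hftp reads from, 8020 for NameNode RPC); adjust them to the actual cluster configuration. Whether this succeeds from a Kerberized destination against an unsecured source depends on the issues tracked in HDFS-7037.

```python
import subprocess


def distcp_command(src_namenode, dst_namenode, path,
                   src_http_port=50070, dst_rpc_port=8020):
    """Build the argv for a cross-version copy: read over hftp, write over hdfs."""
    source = f"hftp://{src_namenode}:{src_http_port}{path}"
    target = f"hdfs://{dst_namenode}:{dst_rpc_port}{path}"
    return ["hadoop", "distcp", source, target]


def run_distcp(src_namenode, dst_namenode, path):
    """Run the copy; intended to be executed on a node of the destination cluster."""
    subprocess.run(distcp_command(src_namenode, dst_namenode, path), check=True)
```

For example, `distcp_command("old-nn.example.com", "new-nn.example.com", "/data/events")` yields the equivalent of `hadoop distcp hftp://old-nn.example.com:50070/data/events hdfs://new-nn.example.com:8020/data/events`.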
Created 12-23-2016 01:52 PM
If you check the Jira tickets, the issue revolves around needing to execute from the secure cluster and read from the insecure cluster. https://issues.apache.org/jira/browse/HDFS-7037
Created 12-23-2016 06:11 PM
Have you considered using NiFi to transfer the data? The FetchHDFS processor would be configured to communicate with the unsecured cluster, and the PutHDFS processor would be configured to communicate with the secure cluster.
Created 12-23-2016 11:39 PM
Great suggestion! I may try that first.