How do I copy data from one HDFS to another HDFS?

I have two HDFS setups and want to copy (not migrate or move) some tables from HDFS1 to HDFS2. How do I copy data from one HDFS to another, other than by using Sqoop or distcp?

4 REPLIES

@JAYA PARASU

Your only option outside of distcp and recreating the tables on the other cluster is to use Falcon.

https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.5.3/bk_data-movement-and-integration/content/ch...

Falcon still uses distcp in the background, but that is transparent to the user.

Please be advised that Falcon has been deprecated starting with HDP 2.6 and will be completely removed from the stack starting with HDP 3.

As always, if you find this post helpful, don't forget to "accept" the answer.

@JAYA PARASU

There's currently no substitute for Falcon or distcp within the platform. Expect a solution in the near future that will replace the deprecated Falcon.

Having said that, I would suggest you copy the data with distcp and recreate the Hive tables on the other cluster from their DDL, rather than investing effort in setting up Falcon.
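To make the distcp route suggested above a little more concrete, here is a minimal Python sketch that only builds the commands involved; it does not run anything. The NameNode addresses, the warehouse path, and the helper names (build_distcp_command, show_create_table_sql) are hypothetical placeholders, not values from this thread. The distcp -update flag and the Hive SHOW CREATE TABLE statement are standard, but check them against the versions on your clusters.

```python
import shlex


def build_distcp_command(src_nn, dst_nn, path, update=True):
    """Return the argv list for copying `path` between two clusters with distcp.

    src_nn and dst_nn are "host:port" NameNode addresses (hypothetical values).
    """
    cmd = ["hadoop", "distcp"]
    if update:
        # -update copies only files that are missing or differ on the target.
        cmd.append("-update")
    cmd += [f"hdfs://{src_nn}{path}", f"hdfs://{dst_nn}{path}"]
    return cmd


def show_create_table_sql(table):
    """Return the Hive statement that prints a table's DDL for re-creation."""
    return f"SHOW CREATE TABLE {table};"


if __name__ == "__main__":
    # Hypothetical hosts and table location; replace with your own.
    cmd = build_distcp_command(
        "nn1.example.com:8020",
        "nn2.example.com:8020",
        "/apps/hive/warehouse/sales.db/orders",
    )
    print(shlex.join(cmd))
    print(show_create_table_sql("sales.orders"))
```

The printed distcp command would be run from a node that can reach both clusters. The DDL returned by SHOW CREATE TABLE on the source cluster can then be executed on the target cluster once the files are in place.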


Thank you, Eyad. We are using HDP 2.6. Since Falcon has been deprecated as of HDP 2.6, do we have any option other than Falcon?

@JAYA PARASU

You could also try taking an HDFS snapshot:

https://hortonworks.com/blog/protecting-your-enterprise-data-with-hdfs-snapshots/

https://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-hdfs/HdfsSnapshots.html

You can set up a cron job that takes the snapshot and does the copy on a regular basis.
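As a rough illustration of the snapshot-then-copy job described above, the following Python sketch builds the two commands a scheduled script might issue. It assumes an administrator has already made the directory snapshottable (hdfs dfsadmin -allowSnapshot <path>) and that the snapshot command runs on the source cluster. The hosts, the path, the snapshot naming scheme, and the helper names are hypothetical. Note that the copy step here still goes through distcp, reading from the read-only .snapshot directory so that the source is consistent while it is being copied.

```python
from datetime import datetime, timezone


def snapshot_name(now=None):
    """Return a timestamped snapshot name (the naming scheme is an assumption)."""
    now = now or datetime.now(timezone.utc)
    return now.strftime("snap-%Y%m%d-%H%M%S")


def build_snapshot_copy_commands(src_nn, dst_nn, path, name):
    """Return [create_snapshot_argv, copy_argv] for one scheduled run."""
    # Take a snapshot of the directory on the source cluster.
    create = ["hdfs", "dfs", "-createSnapshot", path, name]
    # Copy the frozen view under .snapshot/<name> to the target cluster.
    copy = [
        "hadoop", "distcp", "-update",
        f"hdfs://{src_nn}{path}/.snapshot/{name}",
        f"hdfs://{dst_nn}{path}",
    ]
    return [create, copy]


if __name__ == "__main__":
    # Hypothetical hosts and path; replace with your own.
    name = snapshot_name()
    for argv in build_snapshot_copy_commands(
        "nn1.example.com:8020", "nn2.example.com:8020", "/data/tables", name
    ):
        print(" ".join(argv))
```

A cron entry could invoke a script like this on whatever schedule suits the data, passing each argv list to subprocess.run instead of printing it.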