How to copy data between two Hadoop clusters?

I have two Hadoop clusters and want to copy the data from cluster 1 to cluster 2. I searched articles and forums to find out which tool I should use. I found that I can use Falcon, but I do not understand how to use it. Can someone please help me with a guide, an article, or a practical example that explains how to set up this workflow with Falcon?

2 REPLIES

@venkata ramireddy

DistCp (distributed copy) is a tool used for large inter/intra-cluster copying. It uses MapReduce to effect its distribution, error handling and recovery, and reporting. It expands a list of files and directories into input to map tasks, each of which will copy a partition of the files specified in the source list.

Copying Data from Cluster1 to Cluster2

hadoop distcp hdfs://cluster1:8020/data/in/hdfs/ hdfs://cluster2:8020/new/path/in/hdfs/ 
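
For repeated copies it can help to assemble the command in a small script first. The following is only a sketch: the function name build_distcp_cmd and the default of 20 maps are made up for illustration, and the hosts and paths are the placeholders from the command above. It prints the command rather than running it, so it can be reviewed before anything is copied.

```shell
#!/usr/bin/env bash
# Sketch: assemble a distcp command with two commonly used options.
# SRC and DST are the placeholder URIs from the post, not real hosts.
SRC="hdfs://cluster1:8020/data/in/hdfs/"
DST="hdfs://cluster2:8020/new/path/in/hdfs/"

# build_distcp_cmd is a hypothetical helper name.
# It only prints the command; nothing is executed against a cluster.
build_distcp_cmd() {
  local src="$1" dst="$2" maps="${3:-20}"   # 20 is an arbitrary default
  # -update: copy only files that are missing or differ at the destination.
  #          Note that with -update the *contents* of the source directory
  #          are copied into the destination directory.
  # -m:      upper limit on the number of simultaneous map tasks.
  echo "hadoop distcp -update -m ${maps} ${src} ${dst}"
}

build_distcp_cmd "$SRC" "$DST"
```

Once the printed command looks right, it can be pasted and run on a node that has the Hadoop client configured.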

Copying between 2 HA clusters

To use distcp between two HA clusters, identify the currently active NameNode of each cluster and run distcp as you would between two clusters without HA:

hadoop distcp hdfs://active1:8020/path hdfs://active2:8020/path
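
One possible way to find the active NameNode is to ask each one for its HA state with hdfs haadmin -getServiceState. The sketch below assumes the NameNode service IDs are nn1 and nn2 (common defaults, but check dfs.ha.namenodes.<nameservice> in your hdfs-site.xml); find_active_nn is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Sketch: print the first NameNode service ID whose HA state is "active".
# find_active_nn is a made-up helper; the IDs passed to it (e.g. nn1 nn2)
# are assumptions and must match the IDs configured for your nameservice.
find_active_nn() {
  local nn state
  for nn in "$@"; do
    # "hdfs haadmin -getServiceState <id>" prints "active" or "standby".
    state="$(hdfs haadmin -getServiceState "$nn" 2>/dev/null)"
    if [ "$state" = "active" ]; then
      echo "$nn"
      return 0
    fi
  done
  return 1   # no active NameNode found among the given IDs
}

# Example call, to be run on a node of the cluster being queried:
#   find_active_nn nn1 nn2
```

The function returns a service ID, not a host name. The matching RPC address can be looked up with hdfs getconf -confKey dfs.namenode.rpc-address.<nameservice>.<id> and then used in the distcp URI. The check has to be done once per cluster.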

For more details, see the Apache Hadoop DistCp documentation.