What are the options other than distcp for copying data (Hive tables and HDFS data) from one Hadoop cluster to another?

Contributor

What options other than distcp are available for copying data, including Hive metadata, Hive tables, and HDFS data, between two HDP clusters?

3 REPLIES

Hello @tauqeer khan.
If you're using HDP, then you can give it a shot with Falcon:

https://community.hortonworks.com/articles/110398/mirroring-datasets-between-hadoop-clusters-with-ap...

You could also check the following links on Hive replication:
https://cwiki.apache.org/confluence/display/Hive/Replication
https://medium.com/@anishekagarwal/aapache-hive-introduction-to-replication-v2-2e12edcbeec
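As a rough illustration of the table-level approach, here is a minimal sketch that only builds the HiveQL statements for Hive's `EXPORT TABLE` / `IMPORT TABLE` commands, which copy a table's data together with its metadata. The helper names, the database and table names, and the staging URI are all hypothetical. It also assumes the staging path is readable from the target cluster; if it isn't, the exported directory still has to be moved across by some other means. Running the statements (via beeline or similar) is left out.

```python
import re

# Conservative pattern for database/table names (an assumption for this sketch;
# Hive itself accepts more than this when names are backtick-quoted).
_IDENT = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def _check_identifier(name):
    """Reject names that are not plain identifiers, to avoid broken HiveQL."""
    if not _IDENT.match(name):
        raise ValueError(f"unsafe Hive identifier: {name!r}")
    return name


def _quote_path(path):
    """Wrap a path in single quotes; refuse paths that contain a quote."""
    if "'" in path:
        raise ValueError(f"path must not contain a single quote: {path!r}")
    return f"'{path}'"


def export_statements(db, table, staging_uri):
    """Statements to run on the SOURCE cluster."""
    return [
        f"USE {_check_identifier(db)}",
        f"EXPORT TABLE {_check_identifier(table)} TO {_quote_path(staging_uri)}",
    ]


def import_statements(db, table, staging_uri):
    """Statements to run on the TARGET cluster."""
    return [
        f"USE {_check_identifier(db)}",
        f"IMPORT TABLE {_check_identifier(table)} FROM {_quote_path(staging_uri)}",
    ]


if __name__ == "__main__":
    # Hypothetical names and URI, for illustration only.
    uri = "hdfs://target-nn:8020/tmp/export/orders"
    for stmt in export_statements("sales", "orders", uri):
        print(stmt + ";")
    for stmt in import_statements("sales", "orders", uri):
        print(stmt + ";")
```

This works one table at a time, so for whole databases or incremental sync the replication docs linked above are the better starting point.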

And lastly, if you want to replicate Hive and don't want to use distcp or any of the solutions listed above, you can try ReAir, an open-source project from Airbnb (I've used it once, and it's pretty cool):

https://github.com/airbnb/reair

Hope this helps!

Contributor

Thanks for the answer, @Vinicius Higa Murakami.

Can we use NiFi?

Hmm, @tauqeer khan, good question.
I've never tested it myself.
But at first glance, I guess you can take a look at the following answer:
https://community.hortonworks.com/questions/182344/how-to-copy-data-from-a-hive-table-recurrently-us...
And try to build something similar 🙂
BTW, it's a good idea to test it; let us know if it works or if you run into any issues.
Hope this helps!