
Hive backup using Falcon

New Contributor

Hi,

I'm just wondering if someone could confirm that my thinking is right. We are looking to use a 4-node Hadoop cluster as a backup for HDFS and Hive. We plan to use Falcon to mirror HDFS between the clusters and also to mirror the Hive data between them.

Am I right in thinking that before we do this we will need to manually do a MySQL dump on the primary cluster and restore it to the backup cluster, so that when the Falcon Hive mirror runs everything is ported across successfully?

The mirror process doesn't copy across any new tables etc. Are there any recommendations for handling this automatically, apart from scripting a MySQL export/import to run periodically?

Thanks in Advance

David

1 ACCEPTED SOLUTION

Rising Star

Before setting up Falcon mirroring, the Hive databases and tables should be "seeded" to the backup cluster using Hive's EXPORT TABLE and IMPORT TABLE commands. Dumping and restoring the MySQL metastore repository is not a recommended method. When setting up the mirror in Falcon, it is best to mirror entire databases, not individual tables. That way any new tables created on the source cluster will automatically be mirrored onto the backup cluster. We recommend running the mirror on the backup (target) cluster so the mirroring workload has less impact on your production cluster.
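
As a rough illustration of the seeding step, here is a minimal Python sketch that only generates the HiveQL statements for each table in a database. The helper name `seed_statements`, the database and table names, and the staging path are all hypothetical; it assumes the exported directories are copied to the backup cluster (for example with `hadoop distcp`) before the import statements are run there.

```python
def seed_statements(database, tables, staging_dir):
    """Build HiveQL for seeding one database onto a backup cluster.

    Returns (exports, imports): statements to run on the source cluster
    and on the backup cluster respectively. Nothing is executed here.
    """
    staging_dir = staging_dir.rstrip("/")
    exports = [f"USE {database};"]
    imports = [f"CREATE DATABASE IF NOT EXISTS {database};", f"USE {database};"]
    for table in tables:
        # One staging directory per table (hypothetical layout).
        path = f"{staging_dir}/{database}/{table}"
        exports.append(f"EXPORT TABLE {table} TO '{path}';")
        imports.append(f"IMPORT TABLE {table} FROM '{path}';")
    return exports, imports


# Hypothetical database, tables, and staging path for illustration only.
exports, imports = seed_statements("sales", ["orders", "customers"], "/tmp/hive_seed/")
print("\n".join(exports))
print("\n".join(imports))
```

The export statements would be run on the source cluster and the import statements on the backup cluster once the staging directories are in place; after that, a database-level Falcon mirror takes over.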



New Contributor

Thanks for that, I'm glad I asked. I find the documentation on mirroring Hive a bit patchy, though no doubt I'm just looking at the wrong sites. Thanks again, David