I would like to know whether there is any solution for Hive metadata replication across clusters other than Apache Falcon Hive Mirroring.
1) How does Hive Mirroring with Falcon work internally (at a high level)?
2) Can the same be achieved by backing up and restoring the metastore DB on a different server?
3) How can HDFS storage dependencies for tables be managed in case of Metastore DB backup and restore?
Any help on the above questions is much appreciated.
Thanks & Regards,
Hi @Megh Vidani
Data Plane Service (DPS) and Data Lifecycle Manager (DLM) are new products announced by Hortonworks for Disaster Recovery and Backup. Replication will support HDFS data as well as Hive data and metadata.
This product will be available very soon.
For your information, DLM uses the new event-based replication feature of Hive, which you can read about here:
I hope this helps
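For reference, the event-based replication mentioned above is exposed through Hive's REPL commands (available from Hive 2.2 onwards). A minimal sketch is below; the database name, dump path, and event ID are placeholders, not values from your environment:

```sql
-- On the source cluster: bootstrap dump of the whole database.
-- The command returns a dump location on HDFS and a last-event ID.
REPL DUMP sales_db;

-- On the target cluster: load the dump
-- (the path comes from the REPL DUMP output).
REPL LOAD sales_db FROM '/apps/hive/repl/<dump_id>';

-- Subsequent incremental dumps replay only metastore events
-- that occurred after the given event ID:
REPL DUMP sales_db FROM <last_event_id>;
```

The advantage over a plain metastore DB backup is that REPL captures both the metadata events and the corresponding data changes, so the HDFS storage dependencies are handled for you.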
You can always use Hive's built-in mechanisms with your own script for replication. For instance, Hive EXPORT stores both data and metadata on HDFS; you can then use DistCp to copy that directory from one cluster to another and IMPORT it on the target. The new Hive replication features are also worth considering if you are on a recent Hive version.
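The export/DistCp/import cycle described above can be sketched as follows. This is a minimal example, not a production script; the database, table, paths, and NameNode addresses are all placeholders:

```shell
# 1) On the source cluster: export table data + metadata to HDFS.
hive -e "EXPORT TABLE sales_db.orders TO '/tmp/export/orders';"

# 2) Copy the export directory to the target cluster with DistCp.
hadoop distcp hdfs://source-nn:8020/tmp/export/orders \
              hdfs://target-nn:8020/tmp/export/orders

# 3) On the target cluster: import, which recreates the table,
#    its metadata, and its partitions from the export directory.
hive -e "IMPORT TABLE sales_db.orders FROM '/tmp/export/orders';"
```

For partitioned tables you can also export individual partitions (`EXPORT TABLE t PARTITION (dt='2017-01-01') ...`) to keep the copies incremental.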
Hive export and import works fine with normal tables. With partitioned and bucketed ORC tables, however, we are facing issues because those tables contain additional delta directories. Somehow Hive is not able to detect the data present in the table after import.