This article shows how to use Falcon to mirror Hive data and metadata from a source cluster to a destination cluster. It is based on HDP 2.5.

  • Configure Hive

Configure Hive on both the source and target clusters: click “Hive” in the Ambari Services menu, click “Configs”, scroll down to “Custom hive-site”, expand it, and click “Add Property”.

Add the following property names with their values:

hive.metastore.event.listeners = org.apache.hive.hcatalog.listener.DbNotificationListener

hive.metastore.dml.events = true

Click OK to save the changes, then restart all affected services.
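After the restart, it can be useful to confirm that the listener setting actually reached the node's configuration. The following is a minimal sketch, not part of the original procedure: the function name check_hive_property is hypothetical, and /etc/hive/conf/hive-site.xml is only an assumed default location that may differ on your install.

```shell
#!/usr/bin/env bash
# Report whether a hive-site.xml file sets a property to an expected value.
# Usage: check_hive_property <file> <property-name> <expected-value>
check_hive_property() {
  local file="$1" name="$2" expected="$3"
  local actual
  # Take the <name> line plus the line after it, then extract the first <value>.
  actual=$(grep -F -A1 "<name>${name}</name>" "$file" \
    | sed -n 's:.*<value>\(.*\)</value>.*:\1:p' \
    | head -n 1)
  if [ "$actual" = "$expected" ]; then
    echo "OK ${name}"
  else
    echo "MISSING ${name}"
    return 1
  fi
}

# Assumed default location on an HDP node; override HIVE_SITE if yours differs.
HIVE_SITE="${HIVE_SITE:-/etc/hive/conf/hive-site.xml}"
if [ -f "$HIVE_SITE" ]; then
  check_hive_property "$HIVE_SITE" hive.metastore.event.listeners \
    org.apache.hive.hcatalog.listener.DbNotificationListener || true
fi
```

Run it on both clusters, since the property must be set on the source and the target.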

  • Bootstrap Table and DB

Before creating a Hive DR mirroring job to replicate Hive data and metadata for a database or table, you must perform an initial bootstrap of that table or database from the source cluster to the target cluster.

Table Bootstrap

To bootstrap table replication, EXPORT the table on the source cluster, distcp the export directory to the target cluster, and IMPORT it on the target cluster. The EXPORT and IMPORT commands are described in the Hive Import/Export documentation.

For example, create the table global_sales and insert a record:

	hive > create table global_sales
	(customer_id string, item_id string, quantity float, price float, time timestamp)
	partitioned by (country string);

	hive > insert into table global_sales partition (country = 'us') values ('c1', 'i1', '1', '1', '2001-01-01 01:01:01');

Start Bootstrap

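	## On source cluster:
	hive > export table global_sales to '/user/ambari-qa/export_sql';

	## Copy the export to the target cluster; replace <target-namenode> with the target NameNode host
	$ hadoop distcp hdfs://machine-1-1.openstacklocal:8020/user/ambari-qa/export_sql hdfs://<target-namenode>:8020/user/ambari-qa/import_sql

	## On target cluster:
	hive > import table global_sales from '/user/ambari-qa/import_sql';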

These steps bring the target table in sync with the source table, so that later events that modify the table on the source cluster can be replicated to the target.
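The three bootstrap steps can be collected into a small helper that prints the commands for a given table, so they can be reviewed before anything runs. This is only a sketch: the function name bootstrap_commands is hypothetical, and target-namenode.example.com stands in for the target cluster's NameNode host.

```shell
#!/usr/bin/env bash
# Print the three bootstrap commands for one table (nothing is executed).
# Usage: bootstrap_commands <table> <source-nn-uri> <target-nn-uri> <export-dir> <import-dir>
bootstrap_commands() {
  local table="$1" src_nn="$2" dst_nn="$3" export_dir="$4" import_dir="$5"
  # 1. Run on the source cluster.
  echo "hive -e \"export table ${table} to '${export_dir}';\""
  # 2. Copy the export directory across clusters.
  echo "hadoop distcp ${src_nn}${export_dir} ${dst_nn}${import_dir}"
  # 3. Run on the target cluster.
  echo "hive -e \"import table ${table} from '${import_dir}';\""
}

# target-namenode.example.com is a placeholder, not a host from this setup.
bootstrap_commands global_sales \
  hdfs://machine-1-1.openstacklocal:8020 \
  hdfs://target-namenode.example.com:8020 \
  /user/ambari-qa/export_sql /user/ambari-qa/import_sql
```

Printing rather than executing keeps the sketch safe to try on any machine; each printed line still has to be run on the cluster named in its comment.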

Database Bootstrap

To bootstrap database replication, first create the target database. This step is required because replication definitions can only be set up on a database that already exists.

Second, export all tables in the source database and import them into the target database, as described in Table Bootstrap.
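For a database with many tables, the per-table EXPORT statements can be generated from a table list. The sketch below assumes hypothetical names throughout: the function export_statements, the database sales_db, the table global_returns, and the directory /user/ambari-qa/export_db are illustrations only.

```shell
#!/usr/bin/env bash
# Read table names on stdin (one per line) and print HiveQL that exports each one.
# Usage: <table list> | export_statements <db> <export-base-dir>
export_statements() {
  local db="$1" base="$2" table
  echo "use ${db};"
  while IFS= read -r table; do
    [ -n "$table" ] || continue   # skip blank lines
    echo "export table ${table} to '${base}/${table}';"
  done
}

# Example with a fixed list; on a real cluster the list could come from
#   hive -S -e "show tables in sales_db"
printf 'global_sales\nglobal_returns\n' \
  | export_statements sales_db /user/ambari-qa/export_db
```

Each table gets its own export directory, so the matching IMPORT statements on the target can be generated the same way after the directories are copied with distcp.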

  • Set up source and target cluster staging/working directory

Source cluster:

[root@machine-1-1 ~]# su - falcon

hadoop fs -mkdir -p /apps/falcon/primaryCluster/staging

hadoop fs -mkdir -p /apps/falcon/primaryCluster/working

hadoop fs -chmod 777 /apps/falcon/primaryCluster/staging

Target cluster:

[root@machine-2-1 ~]# su - falcon

hadoop fs -mkdir -p /apps/falcon/backupCluster/staging

hadoop fs -mkdir -p /apps/falcon/backupCluster/working

hadoop fs -chmod 777 /apps/falcon/backupCluster/staging
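Because the same commands are repeated for each cluster with only the directory name changing, they can be generated from the cluster name. This is a sketch with a hypothetical function name, falcon_dir_commands. The chmod 755 on the working directory is not among the commands above; it is included on the assumption that Falcon expects it, so check the documentation for your Falcon version.

```shell
#!/usr/bin/env bash
# Print the HDFS commands that prepare Falcon's staging and working directories.
# Usage: falcon_dir_commands <cluster-name>
falcon_dir_commands() {
  local base="/apps/falcon/$1"
  echo "hadoop fs -mkdir -p ${base}/staging"
  echo "hadoop fs -mkdir -p ${base}/working"
  echo "hadoop fs -chmod 777 ${base}/staging"
  # Assumption: not part of the article's commands; verify before relying on it.
  echo "hadoop fs -chmod 755 ${base}/working"
}

# To execute, run as the falcon user on the matching cluster and pipe to sh:
#   falcon_dir_commands primaryCluster | sh
falcon_dir_commands primaryCluster
falcon_dir_commands backupCluster
```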

  • Create cluster entity

Open the Falcon UI from the Ambari Services menu and create the source cluster entity by clicking “Create” -> “Cluster”.

[Screenshots: source cluster entity form in the Falcon UI]

Save the source cluster entity by clicking “Next” -> “Save”.

Create the target cluster entity in the Falcon UI the same way, by clicking “Create” -> “Cluster”.

[Screenshots: target cluster entity form in the Falcon UI]

Save the target cluster entity by clicking “Next” -> “Save”.

  • Insert records in source Hive server for replication.

Insert a record on the source Hive server so that there is new data to replicate to the target Hive server.

	hive > insert into table global_sales partition (country = 'uk') values ('c2', 'i2', '2', '2', '2001-01-01 01:01:02');

  • Prepare and submit Hive DR Mirroring

To create the Hive DR mirroring job, click “Create” -> “Mirror” -> “Hive” and fill in the required values.

[Screenshots: Hive mirror form in the Falcon UI]

Click “Next” -> “Save” to save the Hive DR mirror job.

  • Submit and Schedule HiveDR

[Screenshots: submitting and scheduling the Hive DR job in the Falcon UI]

  • Check output

Once the scheduled Hive DR process has completed (you can check its status in the Oozie UI), verify the output on the target Hive server.

Earlier, we inserted two records on the source Hive server; both records are now available on the target Hive server.

[Screenshot: query output on the target Hive server]
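Beyond inspecting the rows by eye, per-partition row counts from the two clusters can be compared mechanically. The sketch below assumes the counts have already been saved to two files of tab-separated lines; the function name compare_counts and the file handling are hypothetical.

```shell
#!/usr/bin/env bash
# Compare two files of "partition<TAB>count" lines, ignoring line order.
# Usage: compare_counts <source-counts-file> <target-counts-file>
compare_counts() {
  if [ "$(sort "$1")" = "$(sort "$2")" ]; then
    echo "in sync"
  else
    echo "out of sync"
    return 1
  fi
}

# Example with hand-written counts. On real clusters each file could be captured with
#   hive -S -e "select country, count(*) from global_sales group by country" > counts.tsv
src=$(mktemp); dst=$(mktemp)
printf 'us\t1\nuk\t1\n' > "$src"
printf 'uk\t1\nus\t1\n' > "$dst"
compare_counts "$src" "$dst"
rm -f "$src" "$dst"
```

Matching counts do not prove the rows are identical, but a mismatch is a quick signal that replication has not caught up.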



Falcon requires the Hive bootstrapping export method:

hive -e "EXPORT TABLE TABLE_NAME TO 'hdfs://BACKUP_CLUSTER:8020/hiveimport/' FOR replication('bootstrapping')"

Although HiveDR might work on HDP 2.5.0, HDP 2.5.3 or higher is recommended, due to updates and bug fixes. There are also some considerations when using HiveDR with Falcon. See Considerations for Using Falcon and Mirroring Data with HiveDR in a Secure Environment.