
Falcon: Replication of Hive partitioned tables is not working

Cloudera Employee

Replication of partitioned tables is not working; the behaviour we see is as follows.

Suppose we perform an operation such as a table load on the source cluster. When we run a cycle of Falcon replication, the load is applied to the target cluster. However, on every subsequent replication cycle the load appears to be re-applied to the target. This seems to happen for all operations on partitioned tables, including drop partition. In the drop partition case, re-applying the operation causes a failure, since the partition cannot be dropped a second time. This behaviour leads to poor performance and, at times, outright failures.
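To illustrate the drop partition failure (a sketch only; the partition spec is hypothetical, though the database and table names come from the entity below): if the same replication event is replayed, the second application of the DROP fails because the partition no longer exists on the target.

```sql
-- Operation performed once on the source cluster:
ALTER TABLE falcontest2.tweetsplaces DROP PARTITION (dt='2016-05-18');

-- Replication cycle 1: applied on the target, partition removed (OK).
-- Replication cycle 2: the same event is replayed on the target;
-- the partition is already gone, so the replayed DROP fails
-- (the replayed statement does not use IF EXISTS).
```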

Is this behaviour normal? Is it a configuration issue? Any guidance for a workaround is appreciated.

Using Falcon 0.6.1.


Cloudera Employee

The Falcon issue is here:

The fix will appear in Falcon version 0.10 (now part of HDP 2.5).



Rising Star
@Piotr Pruski

How are you setting up replication? Is it using Feed Replication or a Falcon recipe? Can you please run "falcon admin -version" and share the result? Can you also share the feed entity?

Cloudera Employee


Using the Falcon Web UI Mirroring task.

As far as I know, Falcon mirroring set up with the UI uses a Falcon recipe.

Falcon server build version: {"properties":[{"key":"Version","value":""},{"key":"Mode","value":"embedded"},{"key":"Hadoop","value":""},{"key":"authentication","value":"simple"}]}

Mirroring does not use a Feed entity; it uses a Process entity.

Here is an example of the unsuccessful Hive replication of a partitioned table:

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<process name="FalcontestMirror3" xmlns="uri:falcon:process:0.1">
    <clusters>
        <cluster name="sca61hive5">
            <validity start="2016-05-18T14:24Z" end="2019-01-31T15:24Z"/>
        </cluster>
    </clusters>
    <properties>
        <property name="" value="true"/>
        <property name="distcpMaxMaps" value="1"/>
        <property name="distcpMapBandwidth" value="100"/>
        <property name="targetCluster" value="sca60hive4"/>
        <property name="sourceCluster" value="sca61hive5"/>
        <property name="targetHiveServer2Uri" value="hive2://sca60.local:10000"/>
        <property name="sourceHiveServer2Uri" value="hive2://sca61.local:10000"/>
        <property name="sourceStagingPath" value="/user/falcontest/staging"/>
        <property name="targetStagingPath" value="/user/falcontest/staging"/>
        <property name="targetNN" value="hdfs://sca60.local:8020"/>
        <property name="sourceNN" value="hdfs://sca61.local:8020"/>
        <property name="sourceServicePrincipal" value="hive"/>
        <property name="targetServicePrincipal" value="hive"/>
        <property name="targetMetastoreUri" value="thrift://sca60.local:9083"/>
        <property name="sourceMetastoreUri" value="thrift://sca61.local:9083"/>
        <property name="sourceTable" value="tweetsplaces"/>
        <property name="sourceDatabase" value="falcontest2"/>
        <property name="maxEvents" value="10001"/>
        <property name="replicationMaxMaps" value="10"/>
        <property name="clusterForJobRun" value="sca61hive5"/>
        <property name="clusterForJobRunWriteEP" value="hdfs://sca61.local:8020"/>
        <property name="drJobName" value="FalcontestMirror3"/>
        <property name="drNotificationReceivers" value=""/>
    </properties>
    <workflow name="falcon-dr-hive-workflow" engine="oozie" path="/apps/data-mirroring/workflows/hive-disaster-recovery-workflow.xml" lib=""/>
    <retry policy="periodic" delay="minutes(30)" attempts="3"/>
    <ACL owner="falcontest" group="hdfs" permission="0755"/>
</process>
