Falcon: Replication of Hive partitioned tables is not working

Contributor

Replication of partitioned tables is not working; the behaviour we are seeing is as follows.

Suppose we perform an operation such as a table load on the source cluster. When we run a cycle of Falcon replication, the load is applied to the target cluster as expected. However, on every subsequent replication cycle the same load appears to be re-applied to the target. This happens for every operation on partitioned tables that we have tried, including drop partition. In the drop-partition case, the second application fails because the partition no longer exists on the target. The net effect is poor performance and, at times, outright job failures. A sketch of the sequence is below.
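To make this concrete, here is a minimal sketch of the kind of sequence that triggers the problem (the partition column "dt", the HDFS input path, and the beeline invocation are illustrative; the table and HiveServer2 URI are from our environment):

# On the source cluster: load data into a partition (the path and
# partition value are hypothetical).
beeline -u jdbc:hive2://sca61.local:10000 -e \
  "LOAD DATA INPATH '/tmp/tweets_batch1' INTO TABLE falcontest2.tweetsplaces PARTITION (dt='2016-05-18')"
# Replication cycle 1: the load is applied on the target, as expected.
# Replication cycle 2+: the same load is re-applied, duplicating data.

# On the source cluster: drop the partition.
beeline -u jdbc:hive2://sca61.local:10000 -e \
  "ALTER TABLE falcontest2.tweetsplaces DROP PARTITION (dt='2016-05-18')"
# Replication cycle 1: the partition is dropped on the target.
# Replication cycle 2: the drop is re-applied and fails, because the
# partition no longer exists on the target.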

Is this behaviour normal? Is it a configuration issue? Any guidance on a workaround would be appreciated.

Using Falcon 0.6.1.

1 ACCEPTED SOLUTION

Contributor

The Falcon issue is here: https://issues.apache.org/jira/browse/FALCON-2017

The fix will appear in Falcon version 0.10 (now part of HDP 2.5).
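Once on HDP 2.5, you can confirm that the running Falcon includes the fix by checking the version from the CLI:

falcon admin -version
# The "Version" property in the output should report 0.10 or later.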


3 REPLIES

Expert Contributor
@Piotr Pruski

How are you setting up replication? Is it using Feed Replication or a Falcon recipe? Can you please run "falcon admin -version" and share the result? Can you also share the feed entity?

Contributor

@Balu

Using the Falcon Web UI Mirroring task.

As far as I know, Falcon mirroring set up with the UI uses a Falcon recipe.

Falcon server build version: {"properties":[{"key":"Version","value":"0.6.1.2.4.0.0-169-rc644fdced4cb1dc348b9c9c59a9960114d5ed58e"},{"key":"Mode","value":"embedded"},{"key":"Hadoop","value":"2.7.1.2.4.0.0-169-r26104d8ac833884c8776473823007f176854f2eb"},{"key":"authentication","value":"simple"}]}

Mirroring does not use a Feed entity; it uses a Process entity.

Here is the Process entity for the unsuccessful Hive replication of a partitioned table:

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<process name="FalcontestMirror3" xmlns="uri:falcon:process:0.1">
   <tags>_falcon_mirroring_type=HIVE</tags>
   <clusters>
       <cluster name="sca61hive5">
           <validity start="2016-05-18T14:24Z" end="2019-01-31T15:24Z"/>
       </cluster>
   </clusters>
   <parallel>1</parallel>
   <order>LAST_ONLY</order>
   <frequency>minutes(5)</frequency>
   <timezone>GMT+00:00</timezone>
   <properties>
       <property name="oozie.wf.subworkflow.classpath.inheritance" value="true"/>
       <property name="distcpMaxMaps" value="1"/>
       <property name="distcpMapBandwidth" value="100"/>
       <property name="targetCluster" value="sca60hive4"/>
       <property name="sourceCluster" value="sca61hive5"/>
       <property name="targetHiveServer2Uri" value="hive2://sca60.local:10000"/>
       <property name="sourceHiveServer2Uri" value="hive2://sca61.local:10000"/>
       <property name="sourceStagingPath" value="/user/falcontest/staging"/>
       <property name="targetStagingPath" value="/user/falcontest/staging"/>
       <property name="targetNN" value="hdfs://sca60.local:8020"/>
       <property name="sourceNN" value="hdfs://sca61.local:8020"/>
       <property name="sourceServicePrincipal" value="hive"/>
       <property name="targetServicePrincipal" value="hive"/>
       <property name="targetMetastoreUri" value="thrift://sca60.local:9083"/>
       <property name="sourceMetastoreUri" value="thrift://sca61.local:9083"/>
       <property name="sourceTable" value="tweetsplaces"/>
       <property name="sourceDatabase" value="falcontest2"/>
       <property name="maxEvents" value="10001"/>
       <property name="replicationMaxMaps" value="10"/>
       <property name="clusterForJobRun" value="sca61hive5"/>
       <property name="clusterForJobRunWriteEP" value="hdfs://sca61.local:8020"/>
       <property name="drJobName" value="FalcontestMirror3"/>
       <property name="drNotificationReceivers" value="a@b.com"/>
   </properties>
   <workflow name="falcon-dr-hive-workflow" engine="oozie" path="/apps/data-mirroring/workflows/hive-disaster-recovery-workflow.xml" lib=""/>
   <retry policy="periodic" delay="minutes(30)" attempts="3"/>
   <ACL owner="falcontest" group="hdfs" permission="0755"/>
</process>
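For completeness, a process entity like this can also be submitted and scheduled from the Falcon CLI, and its 5-minute instances inspected there (a sketch; the XML file name is hypothetical):

# Submit and schedule the mirroring process.
falcon entity -type process -submitAndSchedule -file FalcontestMirror3.xml

# Check the status of the replication instances over a time window.
falcon instance -type process -name FalcontestMirror3 -status \
  -start 2016-05-18T14:24Z -end 2016-05-18T16:24Z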
