Support Questions
Find answers, ask questions, and share your expertise

Falcon: ​Replication of hive partitioned tables is not working

Solved

New Contributor

Replication of partitioned tables is not working; the behaviour we see is as follows.

Suppose we perform an operation such as a table load on the source cluster. When we run a cycle of Falcon replication, the load is applied to the target cluster; however, on every subsequent replication cycle the same load appears to be re-applied to the target. This happens for all operations on partitioned tables, including drop partition. In the drop partition case, the second application fails, because the partition cannot be dropped twice. This behaviour leads to poor performance and, at times, outright failures.

Is this behaviour normal? Is it a configuration issue? Any guidance for a workaround is appreciated.

Using Falcon 0.6.1.
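For what it's worth, the symptom reads as if the replication job never advances its last-applied-event checkpoint between cycles. A small Python sketch of that idea (hypothetical names, purely illustrative, not Falcon internals):

```python
# Conceptual sketch only -- hypothetical names, not Falcon's actual code.
# It illustrates why replaying already-applied Hive events on every
# replication cycle breaks DROP PARTITION the second time around.

def apply_event(partitions, kind, part):
    if kind == "ADD_PARTITION":
        partitions.add(part)
    elif kind == "DROP_PARTITION":
        if part not in partitions:
            raise RuntimeError("partition %s does not exist" % part)
        partitions.remove(part)

def run_cycle(partitions, events, last_event_id):
    # Correct behaviour: resume from the checkpoint, then advance it.
    for event in events[last_event_id:]:
        apply_event(partitions, *event)
    return len(events)

events = [("ADD_PARTITION", "dt=2016-05-18")]
target = set()
ckpt = run_cycle(target, events, 0)     # cycle 1: partition created on target

events.append(("DROP_PARTITION", "dt=2016-05-18"))
ckpt = run_cycle(target, events, ckpt)  # cycle 2: drop applied once, fine

# Buggy behaviour (checkpoint not advanced): the drop is replayed and fails,
# which matches the repeated-application failures described above.
try:
    run_cycle(target, events, ckpt - 1)
except RuntimeError as exc:
    print("replay failed:", exc)
```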

1 ACCEPTED SOLUTION

Accepted Solutions

Re: Falcon: ​Replication of hive partitioned tables is not working

New Contributor

The Falcon issue is here: https://issues.apache.org/jira/browse/FALCON-2017

The fix will appear in Falcon version 0.10 (now part of HDP 2.5).

3 REPLIES

Re: Falcon: ​Replication of hive partitioned tables is not working

Rising Star
@Piotr Pruski

How are you setting up replication: is it using Feed replication or a Falcon recipe? Can you please run "falcon admin -version" and share the result? Can you also share the feed entity?

Re: Falcon: ​Replication of hive partitioned tables is not working

New Contributor

@Balu

Using the Falcon Web UI Mirroring task.

As far as I know, Falcon mirroring set up with the UI uses a Falcon recipe.

Falcon server build version: {"properties":[{"key":"Version","value":"0.6.1.2.4.0.0-169-rc644fdced4cb1dc348b9c9c59a9960114d5ed58e"},{"key":"Mode","value":"embedded"},{"key":"Hadoop","value":"2.7.1.2.4.0.0-169-r26104d8ac833884c8776473823007f176854f2eb"},{"key":"authentication","value":"simple"}]}
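Since the build string above is JSON, the version field can be pulled out with a couple of lines of Python if that helps, for example:

```python
# Extract the "Version" entry from the "falcon admin -version" JSON output.
import json

build = ('{"properties":[{"key":"Version","value":'
         '"0.6.1.2.4.0.0-169-rc644fdced4cb1dc348b9c9c59a9960114d5ed58e"},'
         '{"key":"Mode","value":"embedded"}]}')

props = {p["key"]: p["value"] for p in json.loads(build)["properties"]}
version = props["Version"].split("-")[0]
print(version)  # -> 0.6.1.2.4.0.0
```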

Mirroring does not use a Feed entity; it uses a Process entity.

Here is an example of the unsuccessful Hive replication of a partitioned table:

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<process name="FalcontestMirror3" xmlns="uri:falcon:process:0.1">
   <tags>_falcon_mirroring_type=HIVE</tags>
   <clusters>
       <cluster name="sca61hive5">
           <validity start="2016-05-18T14:24Z" end="2019-01-31T15:24Z"/>
       </cluster>
   </clusters>
   <parallel>1</parallel>
   <order>LAST_ONLY</order>
   <frequency>minutes(5)</frequency>
   <timezone>GMT+00:00</timezone>
   <properties>
       <property name="oozie.wf.subworkflow.classpath.inheritance" value="true"/>
       <property name="distcpMaxMaps" value="1"/>
       <property name="distcpMapBandwidth" value="100"/>
       <property name="targetCluster" value="sca60hive4"/>
       <property name="sourceCluster" value="sca61hive5"/>
       <property name="targetHiveServer2Uri" value="hive2://sca60.local:10000"/>
       <property name="sourceHiveServer2Uri" value="hive2://sca61.local:10000"/>
       <property name="sourceStagingPath" value="/user/falcontest/staging"/>
       <property name="targetStagingPath" value="/user/falcontest/staging"/>
       <property name="targetNN" value="hdfs://sca60.local:8020"/>
       <property name="sourceNN" value="hdfs://sca61.local:8020"/>
       <property name="sourceServicePrincipal" value="hive"/>
       <property name="targetServicePrincipal" value="hive"/>
       <property name="targetMetastoreUri" value="thrift://sca60.local:9083"/>
       <property name="sourceMetastoreUri" value="thrift://sca61.local:9083"/>
       <property name="sourceTable" value="tweetsplaces"/>
       <property name="sourceDatabase" value="falcontest2"/>
       <property name="maxEvents" value="10001"/>
       <property name="replicationMaxMaps" value="10"/>
       <property name="clusterForJobRun" value="sca61hive5"/>
       <property name="clusterForJobRunWriteEP" value="hdfs://sca61.local:8020"/>
       <property name="drJobName" value="FalcontestMirror3"/>
       <property name="drNotificationReceivers" value="a@b.com"/>
   </properties>
   <workflow name="falcon-dr-hive-workflow" engine="oozie" path="/apps/data-mirroring/workflows/hive-disaster-recovery-workflow.xml" lib=""/>
   <retry policy="periodic" delay="minutes(30)" attempts="3"/>
   <ACL owner="falcontest" group="hdfs" permission="0755"/>
</process>
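In case it is useful for verifying the above: the entity type and the mirroring tag can also be checked programmatically. A minimal Python sketch, with just the relevant parts of the entity inlined for illustration:

```python
# Sketch: confirm a Falcon entity definition is a Process entity carrying
# the HIVE mirroring tag (the inlined XML is a trimmed-down example).
import xml.etree.ElementTree as ET

NS = "uri:falcon:process:0.1"

def describe_entity(xml_text):
    root = ET.fromstring(xml_text)
    kind = root.tag.split("}")[-1]  # strip the XML namespace prefix
    tags = root.findtext("{%s}tags" % NS, default="")
    freq = root.findtext("{%s}frequency" % NS, default="")
    return kind, tags, freq

entity = """<?xml version="1.0" encoding="UTF-8"?>
<process name="FalcontestMirror3" xmlns="uri:falcon:process:0.1">
   <tags>_falcon_mirroring_type=HIVE</tags>
   <frequency>minutes(5)</frequency>
</process>"""

print(describe_entity(entity))
# -> ('process', '_falcon_mirroring_type=HIVE', 'minutes(5)')
```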