Falcon: Replication of Hive partitioned tables is not working

Contributor

Replication of partitioned tables is not working; the behaviour we see is as follows.

Suppose we perform an operation such as a table load on the source cluster. The first cycle of Falcon replication applies the load to the target cluster as expected, but every subsequent replication cycle re-applies the same load to the target. This seems to happen for all operations on partitioned tables, including drop partition. In the drop partition case, the second application fails, since the partition cannot be dropped again. The result is poor performance and, at times, outright failures.
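
For illustration, the pattern looks roughly like this; the table name and data path below are made-up examples, not our real ones:

-- Run against the source cluster (falcontest2.tweets_demo and the
-- input path are hypothetical).
CREATE TABLE falcontest2.tweets_demo (id INT, msg STRING)
    PARTITIONED BY (dt STRING);

LOAD DATA INPATH '/tmp/tweets_demo.csv'
    INTO TABLE falcontest2.tweets_demo PARTITION (dt='2016-05-18');
-- Replication cycle 1: the partition shows up on the target, as expected.
-- Every later cycle: the same load is re-applied on the target.

ALTER TABLE falcontest2.tweets_demo DROP PARTITION (dt='2016-05-18');
-- Replication cycle 1: the partition is dropped on the target.
-- Next cycle: the drop is re-applied and fails, because the partition
-- no longer exists on the target.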

Is this behaviour normal? Is it a configuration issue? Any guidance or workaround would be appreciated.

Using Falcon 0.6.1.

1 ACCEPTED SOLUTION

Contributor

The Falcon issue is here: https://issues.apache.org/jira/browse/FALCON-2017

The fix will appear in Falcon version 0.10 (now part of HDP 2.5).


3 REPLIES

Expert Contributor
@Piotr Pruski

How are you setting up replication: is it feed replication or a Falcon recipe? Can you please run "falcon admin -version" and share the result? Can you also share the feed entity?

Contributor

@Balu

We set it up using the Falcon Web UI mirroring task.

As far as I know, Falcon mirroring set up through the UI uses a Falcon recipe.

Falcon server build version:

{"properties":[
    {"key":"Version","value":"0.6.1.2.4.0.0-169-rc644fdced4cb1dc348b9c9c59a9960114d5ed58e"},
    {"key":"Mode","value":"embedded"},
    {"key":"Hadoop","value":"2.7.1.2.4.0.0-169-r26104d8ac833884c8776473823007f176854f2eb"},
    {"key":"authentication","value":"simple"}
]}

Mirroring does not use a Feed entity; it uses a Process entity.

Here is the process entity for the unsuccessful Hive replication of a partitioned table:

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<process name="FalcontestMirror3" xmlns="uri:falcon:process:0.1">
   <tags>_falcon_mirroring_type=HIVE</tags>
   <clusters>
       <cluster name="sca61hive5">
           <validity start="2016-05-18T14:24Z" end="2019-01-31T15:24Z"/>
       </cluster>
   </clusters>
   <parallel>1</parallel>
   <order>LAST_ONLY</order>
   <frequency>minutes(5)</frequency>
   <timezone>GMT+00:00</timezone>
   <properties>
       <property name="oozie.wf.subworkflow.classpath.inheritance" value="true"/>
       <property name="distcpMaxMaps" value="1"/>
       <property name="distcpMapBandwidth" value="100"/>
       <property name="targetCluster" value="sca60hive4"/>
       <property name="sourceCluster" value="sca61hive5"/>
       <property name="targetHiveServer2Uri" value="hive2://sca60.local:10000"/>
       <property name="sourceHiveServer2Uri" value="hive2://sca61.local:10000"/>
       <property name="sourceStagingPath" value="/user/falcontest/staging"/>
       <property name="targetStagingPath" value="/user/falcontest/staging"/>
       <property name="targetNN" value="hdfs://sca60.local:8020"/>
       <property name="sourceNN" value="hdfs://sca61.local:8020"/>
       <property name="sourceServicePrincipal" value="hive"/>
       <property name="targetServicePrincipal" value="hive"/>
       <property name="targetMetastoreUri" value="thrift://sca60.local:9083"/>
       <property name="sourceMetastoreUri" value="thrift://sca61.local:9083"/>
       <property name="sourceTable" value="tweetsplaces"/>
       <property name="sourceDatabase" value="falcontest2"/>
       <property name="maxEvents" value="10001"/>
       <property name="replicationMaxMaps" value="10"/>
       <property name="clusterForJobRun" value="sca61hive5"/>
       <property name="clusterForJobRunWriteEP" value="hdfs://sca61.local:8020"/>
       <property name="drJobName" value="FalcontestMirror3"/>
       <property name="drNotificationReceivers" value="a@b.com"/>
   </properties>
   <workflow name="falcon-dr-hive-workflow" engine="oozie" path="/apps/data-mirroring/workflows/hive-disaster-recovery-workflow.xml" lib=""/>
   <retry policy="periodic" delay="minutes(30)" attempts="3"/>
   <ACL owner="falcontest" group="hdfs" permission="0755"/>
</process>
