Member since 10-01-2015
11 Posts
15 Kudos Received
1 Solution
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 1818 | 09-08-2016 06:01 PM |
09-14-2017 04:03 AM
@Eugene Koifman is there any other workaround that could cut down the time needed to replicate an ACID table to a secondary cluster? What is the recommendation for DR on ACID tables?
09-08-2016 06:01 PM
1 Kudo
The Falcon issue is here: https://issues.apache.org/jira/browse/FALCON-2017 The fix will appear in Falcon version 0.10 (now part of HDP 2.5).
06-09-2016 02:57 AM
@Balu
Using the Falcon Web UI Mirroring task.
As far as I know, Falcon mirroring set up through the UI uses a Falcon recipe. Falcon server build version:
{"properties":[{"key":"Version","value":"0.6.1.2.4.0.0-169-rc644fdced4cb1dc348b9c9c59a9960114d5ed58e"},{"key":"Mode","value":"embedded"},{"key":"Hadoop","value":"2.7.1.2.4.0.0-169-r26104d8ac833884c8776473823007f176854f2eb"},{"key":"authentication","value":"simple"}]}
Mirroring does not use a Feed entity; it uses a Process entity. Here is an example of the unsuccessful Hive replication of a partitioned table:
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<process name="FalcontestMirror3" xmlns="uri:falcon:process:0.1">
    <tags>_falcon_mirroring_type=HIVE</tags>
    <clusters>
        <cluster name="sca61hive5">
            <validity start="2016-05-18T14:24Z" end="2019-01-31T15:24Z"/>
        </cluster>
    </clusters>
    <parallel>1</parallel>
    <order>LAST_ONLY</order>
    <frequency>minutes(5)</frequency>
    <timezone>GMT+00:00</timezone>
    <properties>
        <property name="oozie.wf.subworkflow.classpath.inheritance" value="true"/>
        <property name="distcpMaxMaps" value="1"/>
        <property name="distcpMapBandwidth" value="100"/>
        <property name="targetCluster" value="sca60hive4"/>
        <property name="sourceCluster" value="sca61hive5"/>
        <property name="targetHiveServer2Uri" value="hive2://sca60.local:10000"/>
        <property name="sourceHiveServer2Uri" value="hive2://sca61.local:10000"/>
        <property name="sourceStagingPath" value="/user/falcontest/staging"/>
        <property name="targetStagingPath" value="/user/falcontest/staging"/>
        <property name="targetNN" value="hdfs://sca60.local:8020"/>
        <property name="sourceNN" value="hdfs://sca61.local:8020"/>
        <property name="sourceServicePrincipal" value="hive"/>
        <property name="targetServicePrincipal" value="hive"/>
        <property name="targetMetastoreUri" value="thrift://sca60.local:9083"/>
        <property name="sourceMetastoreUri" value="thrift://sca61.local:9083"/>
        <property name="sourceTable" value="tweetsplaces"/>
        <property name="sourceDatabase" value="falcontest2"/>
        <property name="maxEvents" value="10001"/>
        <property name="replicationMaxMaps" value="10"/>
        <property name="clusterForJobRun" value="sca61hive5"/>
        <property name="clusterForJobRunWriteEP" value="hdfs://sca61.local:8020"/>
        <property name="drJobName" value="FalcontestMirror3"/>
        <property name="drNotificationReceivers" value="a@b.com"/>
    </properties>
    <workflow name="falcon-dr-hive-workflow" engine="oozie" path="/apps/data-mirroring/workflows/hive-disaster-recovery-workflow.xml" lib=""/>
    <retry policy="periodic" delay="minutes(30)" attempts="3"/>
    <ACL owner="falcontest" group="hdfs" permission="0755"/>
</process>
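When comparing mirroring definitions like the one above, it can help to pull the properties out programmatically. The following is a minimal sketch using only the Python standard library; the function name `mirror_properties` and the trimmed sample are illustrative assumptions, not part of Falcon. Note that the entity declares a default namespace (`uri:falcon:process:0.1`), so element lookups have to be namespace-qualified.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the process entity above; only a few properties are kept.
PROCESS_XML = """<process name="FalcontestMirror3" xmlns="uri:falcon:process:0.1">
    <tags>_falcon_mirroring_type=HIVE</tags>
    <properties>
        <property name="targetCluster" value="sca60hive4"/>
        <property name="sourceCluster" value="sca61hive5"/>
        <property name="sourceTable" value="tweetsplaces"/>
        <property name="sourceDatabase" value="falcontest2"/>
    </properties>
</process>"""

# The prefix "f" is arbitrary; the URI is the namespace declared on <process>.
NS = {"f": "uri:falcon:process:0.1"}


def mirror_properties(xml_text):
    """Return the <property> name/value pairs of a process entity as a dict.

    Hypothetical helper for illustration only.
    """
    root = ET.fromstring(xml_text)
    return {
        prop.get("name"): prop.get("value")
        for prop in root.findall("f:properties/f:property", NS)
    }


props = mirror_properties(PROCESS_XML)
print(props["sourceDatabase"], props["sourceTable"], "->", props["targetCluster"])
```

Without the namespace mapping, `root.findall("properties/property")` returns an empty list, which is an easy mistake to make with these entities.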
06-08-2016 01:41 PM
2 Kudos
Replication of partitioned tables is not working. The behaviour seen is as follows. Suppose we perform an operation such as a load table on the source cluster. When we run a cycle of Falcon replication, the load is applied to the target cluster. When we run another cycle, the load seems to be re-applied to the target cluster, and this repeats for every subsequent replication cycle. This behaviour seems to happen for all operations on partitioned tables, including drop partition. In the case of drop partition, applying the operation a second time fails because the partition cannot be dropped twice. This behaviour leads to poor performance at times and also to failures. Is this behaviour normal? Is it a configuration issue? Any guidance on a workaround is appreciated. Using Falcon 0.6.1.
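To make the symptom concrete, here is a toy Python model of event-based replication. It is not Falcon's implementation; the `Target` class, the `replicate_cycle` function, and the event format are all hypothetical. It only illustrates why replaying the same source events on every cycle wastes work and eventually fails on a drop partition, and how remembering the last applied event id avoids both.

```python
class Target:
    """Toy stand-in for a table on the target cluster (a set of partitions)."""

    def __init__(self, partitions=()):
        self.partitions = set(partitions)
        self.last_event_id = 0  # highest source event id applied so far

    def apply(self, event):
        if event["op"] == "ADD_PARTITION":
            self.partitions.add(event["partition"])
        elif event["op"] == "DROP_PARTITION":
            # Dropping a partition that is already gone is an error,
            # mirroring the failure described in the post.
            if event["partition"] not in self.partitions:
                raise ValueError("cannot drop missing partition " + event["partition"])
            self.partitions.remove(event["partition"])
        else:
            raise ValueError("unknown operation " + event["op"])


def replicate_cycle(events, target, use_watermark=True):
    """Apply source events in id order; return how many were applied."""
    applied = 0
    for event in sorted(events, key=lambda e: e["id"]):
        if use_watermark and event["id"] <= target.last_event_id:
            continue  # already replicated in an earlier cycle
        target.apply(event)
        target.last_event_id = event["id"]
        applied += 1
    return applied


source_events = [{"id": 1, "op": "DROP_PARTITION", "partition": "dt=2016-06-01"}]

# With a watermark, the second cycle has nothing to do.
tracked = Target(partitions=["dt=2016-06-01"])
print(replicate_cycle(source_events, tracked))
print(replicate_cycle(source_events, tracked))

# Without one, the second cycle replays the drop and fails.
naive = Target(partitions=["dt=2016-06-01"])
replicate_cycle(source_events, naive, use_watermark=False)
try:
    replicate_cycle(source_events, naive, use_watermark=False)
except ValueError as err:
    print("second cycle failed:", err)
```

Whether the behaviour reported here comes from lost replication state or from something else in Falcon 0.6.1 is not established by this sketch.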
05-04-2016 03:35 PM
6 Kudos
Looking for best practices around DR replication options with Falcon (or Oozie + distcp). Whether using feed-based replication or the mirror recipe in Falcon (both of which leverage distcp, to my understanding), how does it handle the situation where clients are still writing, moving, or deleting in the source cluster? The distcp documentation states that if another client is still writing to a source file, the copy will likely fail. Does Falcon provide any data validation mechanism to confirm that the distcp transfer was successful? What additional benefit would snapshotting have here (and does Falcon do this)?
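For context on what "validation" could mean here, the sketch below shows a generic post-copy check in Python: compare file presence, size, and a content digest between a source tree and a target tree. It runs against local directories as a stand-in for two clusters, and it is not how Falcon or distcp validate a transfer; `file_digest` and `compare_trees` are hypothetical helper names.

```python
import hashlib
import tempfile
from pathlib import Path


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def compare_trees(source_root, target_root):
    """List (relative path, problem) pairs for files that did not copy cleanly."""
    source_root, target_root = Path(source_root), Path(target_root)
    problems = []
    for src in sorted(p for p in source_root.rglob("*") if p.is_file()):
        rel = src.relative_to(source_root)
        dst = target_root / rel
        if not dst.is_file():
            problems.append((rel.as_posix(), "missing on target"))
        elif src.stat().st_size != dst.stat().st_size:
            problems.append((rel.as_posix(), "size differs"))
        elif file_digest(src) != file_digest(dst):
            problems.append((rel.as_posix(), "checksum differs"))
    return problems


# Small demonstration with temporary local directories.
with tempfile.TemporaryDirectory() as tmp:
    src_dir, dst_dir = Path(tmp) / "source", Path(tmp) / "target"
    src_dir.mkdir()
    dst_dir.mkdir()
    (src_dir / "part-0").write_bytes(b"abc")
    (dst_dir / "part-0").write_bytes(b"abc")   # copied correctly
    (src_dir / "part-1").write_bytes(b"abcd")
    (dst_dir / "part-1").write_bytes(b"abcX")  # same size, different content
    (src_dir / "part-2").write_bytes(b"x")     # never copied
    problems = compare_trees(src_dir, dst_dir)

print(problems)
```

A check like this is only meaningful if the source does not change between the copy and the comparison, which is the same concern the question raises about clients writing during the transfer.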
Labels: Apache Falcon, Apache Hadoop, Apache Oozie
03-31-2016 05:29 PM
1 Kudo
Understanding that 2.4 doesn't appear in the dropdown, are you still able to use an arbitrary number and register the correct repo(s) from here: http://docs.hortonworks.com/HDPDocuments/Ambari-2.2.1.1/bk_Installing_HDP_AMB/content/_hdp_24_repositories.html?
12-02-2015 03:42 PM
1 Kudo
Trying multiple cluster configurations always results in the following error when creating a new cluster: << Failed to create cluster: Unsupported operation: create, on old azure clusters the only supported operation is termination >>. Any idea what is causing the error?
Labels: Hortonworks Cloudbreak