Member since: 10-01-2015
Posts: 11
Kudos Received: 15
Solutions: 1

My Accepted Solutions
Title | Views | Posted |
---|---|---|
 | 1773 | 09-08-2016 06:01 PM |
09-14-2017
04:03 AM
@Eugene Koifman is there any other workaround that could cut down the time needed to go through the procedure of replicating an ACID table to a secondary cluster? What is the recommendation for DR on ACID tables?
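For illustration, a minimal sketch of one possible staging-table workaround, not an official recommendation: copy the ACID data into a plain (non-transactional) table, replicate that with the usual Falcon/distcp flow, and reload it into an ACID table on the target. The table names (acid_source, acid_staging, acid_target) are hypothetical and the HiveServer2 URIs are only illustrative.

# Hedged sketch; all table names are hypothetical and the URIs illustrative.
# 1. On the source, stage the ACID table's current contents into a plain ORC table.
beeline -u jdbc:hive2://sca61.local:10000 -e \
  "CREATE TABLE falcontest2.acid_staging STORED AS ORC AS SELECT * FROM falcontest2.acid_source;"
# 2. Replicate falcontest2.acid_staging with the normal Falcon/distcp flow.
# 3. On the target, load the replicated data back into an ACID table.
beeline -u jdbc:hive2://sca60.local:10000 -e \
  "INSERT INTO TABLE falcontest2.acid_target SELECT * FROM falcontest2.acid_staging;"

This avoids touching the ACID delta files directly, at the cost of a full copy of the staged data on each cycle.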
09-08-2016
06:01 PM
1 Kudo
The Falcon issue is here: https://issues.apache.org/jira/browse/FALCON-2017 The fix will appear in Falcon version 0.10 (now part of HDP 2.5).
06-09-2016
02:57 AM
@Balu
Using the Falcon Web UI mirroring task. As far as I know, Falcon mirroring set up with the UI uses a Falcon recipe. Falcon server build version:
{"properties":[
  {"key":"Version","value":"0.6.1.2.4.0.0-169-rc644fdced4cb1dc348b9c9c59a9960114d5ed58e"},
  {"key":"Mode","value":"embedded"},
  {"key":"Hadoop","value":"2.7.1.2.4.0.0-169-r26104d8ac833884c8776473823007f176854f2eb"},
  {"key":"authentication","value":"simple"}]}
Mirroring does not use a Feed entity; it uses a Process entity. Here is an example of the unsuccessful Hive replication of a partitioned table:

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<process name="FalcontestMirror3" xmlns="uri:falcon:process:0.1">
    <tags>_falcon_mirroring_type=HIVE</tags>
    <clusters>
        <cluster name="sca61hive5">
            <validity start="2016-05-18T14:24Z" end="2019-01-31T15:24Z"/>
        </cluster>
    </clusters>
    <parallel>1</parallel>
    <order>LAST_ONLY</order>
    <frequency>minutes(5)</frequency>
    <timezone>GMT+00:00</timezone>
    <properties>
        <property name="oozie.wf.subworkflow.classpath.inheritance" value="true"/>
        <!-- distcp tuning -->
        <property name="distcpMaxMaps" value="1"/>
        <property name="distcpMapBandwidth" value="100"/>
        <!-- source and target cluster endpoints -->
        <property name="targetCluster" value="sca60hive4"/>
        <property name="sourceCluster" value="sca61hive5"/>
        <property name="targetHiveServer2Uri" value="hive2://sca60.local:10000"/>
        <property name="sourceHiveServer2Uri" value="hive2://sca61.local:10000"/>
        <property name="sourceStagingPath" value="/user/falcontest/staging"/>
        <property name="targetStagingPath" value="/user/falcontest/staging"/>
        <property name="targetNN" value="hdfs://sca60.local:8020"/>
        <property name="sourceNN" value="hdfs://sca61.local:8020"/>
        <property name="sourceServicePrincipal" value="hive"/>
        <property name="targetServicePrincipal" value="hive"/>
        <property name="targetMetastoreUri" value="thrift://sca60.local:9083"/>
        <property name="sourceMetastoreUri" value="thrift://sca61.local:9083"/>
        <!-- table being replicated -->
        <property name="sourceTable" value="tweetsplaces"/>
        <property name="sourceDatabase" value="falcontest2"/>
        <property name="maxEvents" value="10001"/>
        <property name="replicationMaxMaps" value="10"/>
        <property name="clusterForJobRun" value="sca61hive5"/>
        <property name="clusterForJobRunWriteEP" value="hdfs://sca61.local:8020"/>
        <property name="drJobName" value="FalcontestMirror3"/>
        <property name="drNotificationReceivers" value="a@b.com"/>
    </properties>
    <workflow name="falcon-dr-hive-workflow" engine="oozie" path="/apps/data-mirroring/workflows/hive-disaster-recovery-workflow.xml" lib=""/>
    <retry policy="periodic" delay="minutes(30)" attempts="3"/>
    <ACL owner="falcontest" group="hdfs" permission="0755"/>
</process>
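For reference, if one were submitting this entity by hand instead of through the UI, the Falcon CLI flow would look roughly as follows; the XML file name is illustrative.

# Submit and schedule the mirroring process via the Falcon CLI (file name illustrative).
falcon entity -type process -submit -file FalcontestMirror3.xml
falcon entity -type process -name FalcontestMirror3 -schedule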
06-08-2016
01:41 PM
2 Kudos
Replication of partitioned tables is not working; the behaviour is as follows. Suppose we perform an operation such as a table load on the source cluster. When we run a cycle of Falcon replication, the load is applied to the target cluster, but on every subsequent replication cycle the same load appears to be re-applied to the target. This seems to happen for all operations on partitioned tables, including drop partition; in that case the second application fails, since the partition cannot be dropped twice. This behaviour leads to poor performance and, at times, failures. Is this behaviour normal? Is it a configuration issue? Any guidance on a workaround is appreciated. Using Falcon 0.6.1.
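For concreteness, a hypothetical repro of the behaviour described above, using the table from the mirroring job earlier on this page; the partition column, partition value, and input path are illustrative.

# Hypothetical repro; partition spec and input path are illustrative.
# Run on the source cluster:
beeline -u jdbc:hive2://sca61.local:10000 -e \
  "LOAD DATA INPATH '/tmp/tweets' INTO TABLE falcontest2.tweetsplaces PARTITION (created='2016-06-07');"
# Replication cycle 1 applies the LOAD on the target, as expected.
# Cycles 2, 3, ... re-apply the same LOAD, duplicating the partition's data.
beeline -u jdbc:hive2://sca61.local:10000 -e \
  "ALTER TABLE falcontest2.tweetsplaces DROP PARTITION (created='2016-06-07');"
# The DROP replicates once, then fails when a later cycle re-applies it,
# because the partition no longer exists on the target.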
05-04-2016
03:35 PM
6 Kudos
Looking for best practices around the DR replication options with Falcon (or Oozie + distcp). Using either feed-based replication or the mirror recipe in Falcon (both of which leverage distcp, to my understanding), how does it handle the situation where clients are still writing, moving, or deleting data in the source cluster? The distcp documentation states that if another client is still writing to a source file, the copy will likely fail. Does Falcon provide any mechanism to validate that the distcp transfer was successful? What additional benefit would snapshotting have here, and does Falcon do this?
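To make the snapshot option concrete, here is a minimal sketch of snapshot-isolated copying, assuming HDFS snapshots are enabled on the source directory; the directory paths are illustrative, and the NameNode URIs reuse the hosts from the mirroring job earlier on this page.

# Minimal sketch; paths are illustrative, and enabling snapshots requires HDFS admin rights.
hdfs dfsadmin -allowSnapshot /data/source
hdfs dfs -createSnapshot /data/source s1
# Copying from the read-only .snapshot path means in-flight writers cannot change
# files mid-copy, avoiding the distcp failure mode described above.
hadoop distcp hdfs://sca61.local:8020/data/source/.snapshot/s1 \
    hdfs://sca60.local:8020/data/target
# Later cycles can ship only the delta between two snapshots. This assumes the
# target also holds snapshot s1 and has not been modified since the previous copy.
hdfs dfs -createSnapshot /data/source s2
hadoop distcp -update -diff s1 s2 \
    hdfs://sca61.local:8020/data/source hdfs://sca60.local:8020/data/target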
Labels:
- Apache Falcon
- Apache Hadoop
- Apache Oozie
03-31-2016
05:29 PM
1 Kudo
Understanding that 2.4 doesn't appear in the dropdown, are you still able to use an arbitrary version number and register the correct repo(s) from here: http://docs.hortonworks.com/HDPDocuments/Ambari-2.2.1.1/bk_Installing_HDP_AMB/content/_hdp_24_repositories.html ?
12-02-2015
03:42 PM
1 Kudo
Trying multiple cluster configurations always results in the following error when creating a new cluster: << Failed to create cluster: Unsupported operation: create, on old azure clusters the only supported operation is termination >>. Any idea what is causing the error?
Labels:
- Hortonworks Cloudbreak