Created 09-01-2016 06:19 PM
Hello,
I'm working with Falcon using the built-in HDFS mirroring capabilities and would like to enable two DistCp options in the workflow XML: -atomic and -strategy. Below is my Oozie workflow with these two options commented out, because passing them this way was unsuccessful. Is there a way to pass them in using a -D option, or would the FeedReplicator class need to be modified to support them?
<workflow-app xmlns='uri:oozie:workflow:0.3' name='falcon-dr-fs-workflow'>
    <start to='dr-replication'/>
    <!-- Replication action -->
    <action name="dr-replication">
        <java>
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <configuration>
                <property> <!-- hadoop 2 parameter -->
                    <name>oozie.launcher.mapreduce.job.user.classpath.first</name>
                    <value>true</value>
                </property>
                <property>
                    <name>mapred.job.queue.name</name>
                    <value>${queueName}</value>
                </property>
                <property>
                    <name>oozie.launcher.mapred.job.priority</name>
                    <value>${jobPriority}</value>
                </property>
                <property>
                    <name>oozie.use.system.libpath</name>
                    <value>true</value>
                </property>
                <property>
                    <name>oozie.action.sharelib.for.java</name>
                    <value>distcp</value>
                </property>
                <property>
                    <name>oozie.launcher.oozie.libpath</name>
                    <value>${wf:conf("falcon.libpath")}</value>
                </property>
                <property>
                    <name>oozie.launcher.mapreduce.job.hdfs-servers</name>
                    <value>${drSourceClusterFS},${drTargetClusterFS}</value>
                </property>
            </configuration>
            <main-class>org.apache.falcon.replication.FeedReplicator</main-class>
            <arg>-Dmapred.job.queue.name=${queueName}</arg>
            <arg>-Dmapred.job.priority=${jobPriority}</arg>
            <!--arg>-atomic</arg>
            <arg>-strategy</arg>
            <arg>dynamic</arg-->
            <arg>-maxMaps</arg>
            <arg>${distcpMaxMaps}</arg>
            <arg>-mapBandwidth</arg>
            <arg>${distcpMapBandwidth}</arg>
            <arg>-sourcePaths</arg>
            <arg>${drSourceDir}</arg>
            <arg>-targetPath</arg>
            <arg>${drTargetClusterFS}${drTargetDir}</arg>
            <arg>-falconFeedStorageType</arg>
            <arg>FILESYSTEM</arg>
            <arg>-availabilityFlag</arg>
            <arg>${availabilityFlag == 'NA' ? "NA" : availabilityFlag}</arg>
            <arg>-counterLogDir</arg>
            <arg>${logDir}/job-${nominalTime}/${srcClusterName == 'NA' ? '' : srcClusterName}</arg>
        </java>
        <ok to="end"/>
        <error to="fail"/>
    </action>
    <kill name="fail">
        <message>
            Workflow action failed, error message[${wf:errorMessage(wf:lastErrorNode())}]
        </message>
    </kill>
    <end name="end"/>
</workflow-app>
Created 09-01-2016 06:55 PM
@Kyle Dunn: Falcon doesn't support those DistCp options, and yes, supporting them would require a code change.
Created 09-01-2016 06:55 PM
Would you be able to provide an example of what this code change might look like, based on the existing FeedReplicator code?
Created 09-01-2016 11:16 PM
Alternatively, what are the limitations of out-of-stack version support for Falcon? The snapshot-based replication in Falcon 0.10 provides the functionality I'm ultimately looking for, but I am currently running on HDP 2.3 / 2.4.
Created 09-02-2016 05:17 AM
What limitations are we talking about here? Sorry, I don't understand your question.
If you are asking about the DistCp options supported in HDFS mirroring, the options below are currently supported.
The additional options below can be supported using the following workaround:
Modify the workflow hdfs-replication-workflow.xml as follows. After the distcpMapBandwidth argument, add the content below:
<arg>-overwrite</arg>
<arg>${overwrite}</arg>
<arg>-ignoreErrors</arg>
<arg>${ignoreErrors}</arg>
<arg>-skipChecksum</arg>
<arg>${skipChecksum}</arg>
<arg>-removeDeletedFiles</arg>
<arg>${removeDeletedFiles}</arg>
<arg>-preserveBlockSize</arg>
<arg>${preserveBlockSize}</arg>
<arg>-preserveReplicationNumber</arg>
<arg>${preserveReplicationNumber}</arg>
<arg>-preservePermission</arg>
<arg>${preservePermission}</arg>
Pass the options below in hdfs-replication.properties:
overwrite=false
ignoreErrors=false
skipChecksum=false
removeDeletedFiles=true
preserveBlockSize=true
preserveReplicationNumber=true
preservePermission=true
These will work out of the box because FeedReplicator already supports them, so no code change is required. Thanks!
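As a rough sketch of the workaround above (not part of Falcon), the flag/placeholder pairs for the workflow and the matching properties can be generated from one list of option names, so the two files stay in sync. The helper names `render_args` and `render_properties` are hypothetical; the option names and values are the ones listed above.

```python
# Option names and values exactly as listed in the workaround above.
DEFAULTS = {
    "overwrite": "false",
    "ignoreErrors": "false",
    "skipChecksum": "false",
    "removeDeletedFiles": "true",
    "preserveBlockSize": "true",
    "preserveReplicationNumber": "true",
    "preservePermission": "true",
}


def render_args(option_names, indent="            "):
    """Return one <arg>-name</arg> / <arg>${name}</arg> pair per option."""
    lines = []
    for name in option_names:
        lines.append(f"{indent}<arg>-{name}</arg>")
        # Doubled braces emit a literal ${name} placeholder for Oozie to resolve.
        lines.append(f"{indent}<arg>${{{name}}}</arg>")
    return "\n".join(lines)


def render_properties(defaults):
    """Return key=value lines, one per option."""
    return "\n".join(f"{key}={value}" for key, value in defaults.items())


if __name__ == "__main__":
    print(render_args(DEFAULTS))
    print()
    print(render_properties(DEFAULTS))
```

The output still has to be pasted into the workflow and properties files by hand; the script only avoids mismatched names and stray whitespace inside the `<arg>` elements.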
Created 11-14-2017 07:57 AM
Hi!
After changing the preserveBlockSize and skipChecksum parameters on the target site, I do not see any change in the XML file for the task (even after recreating the task):
[hdfs@target ~]$ hdfs dfs -ls /apps/falcon/extensions/hdfs-mirroring/retargets/runtime/hdfs-mirroring-workflow.xml
-rwxr-xr-x   2 hdfs users       4943 2017-11-13 22:39 /apps/falcon/extensions/hdfs-mirroring/retargets/runtime/hdfs-mirroring-workflow.xml        <<<  changed this file (on the target side)
[hdfs@target ~]$
[hdfs@target ~]$ grep -i preserveBlockSize hdfs-mirroring-workflow.xml
            <arg>-preserveBlockSize</arg>
            <arg>${preserveBlockSize}</arg>
            <arg>-preserveBlockSize</arg> <arg>true</arg>
[hdfs@target ~]$
[hdfs@target ~]$
[hdfs@target ~]$ grep -i skipChecksum hdfs-mirroring-workflow.xml
            <arg>-skipChecksum</arg>
            <arg>${skipChecksum}</arg>
            <arg>-skipChecksum</arg> <arg>true</arg>
[hdfs@target ~]$
Please help me.
Where can I find the file hdfs-replication.properties?
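The grep output above shows each flag twice in the same workflow file, once with a `${...}` placeholder and once with a hard-coded `true`. The sketch below lists every flag that appears more than once, together with the values that follow it, which makes this kind of duplication easy to spot. The function names and the sample workflow are illustrative assumptions, and which occurrence FeedReplicator actually honors is not established in this thread.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict


def arg_values_by_flag(workflow_xml):
    """Map each '-flag' <arg> to the list of values that follow it."""
    root = ET.fromstring(workflow_xml)
    # Strip the workflow namespace so <arg> matches for any schema version.
    args = [(el.text or "").strip()
            for el in root.iter()
            if el.tag.rsplit("}", 1)[-1] == "arg"]
    values = defaultdict(list)
    for index, current in enumerate(args):
        if not current.startswith("-") or current.startswith("-D"):
            continue  # a value or a -Dkey=value property, not an option flag
        following = args[index + 1] if index + 1 < len(args) else None
        has_value = following is not None and not following.startswith("-")
        values[current].append(following if has_value else None)
    return dict(values)


def duplicated_flags(workflow_xml):
    """Return only the flags that occur more than once."""
    return {flag: vals
            for flag, vals in arg_values_by_flag(workflow_xml).items()
            if len(vals) > 1}


# Hypothetical, trimmed-down workflow reproducing the duplication seen above.
SAMPLE = """<workflow-app xmlns='uri:oozie:workflow:0.3' name='sample'>
  <start to='dr-replication'/>
  <action name='dr-replication'>
    <java>
      <main-class>org.apache.falcon.replication.FeedReplicator</main-class>
      <arg>-Dmapred.job.queue.name=${queueName}</arg>
      <arg>-maxMaps</arg>
      <arg>${distcpMaxMaps}</arg>
      <arg>-preserveBlockSize</arg>
      <arg>${preserveBlockSize}</arg>
      <arg>-preserveBlockSize</arg>
      <arg>true</arg>
    </java>
    <ok to='end'/>
    <error to='end'/>
  </action>
  <end name='end'/>
</workflow-app>"""

if __name__ == "__main__":
    for flag, vals in duplicated_flags(SAMPLE).items():
        print(flag, vals)
```

To check a real file, copy it locally first (for example with `hdfs dfs -get`) and pass its contents to `duplicated_flags`.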
