Support Questions
Find answers, ask questions, and share your expertise

HDFS backup using Oozie from a secure cluster to a normal cluster?

New Contributor

What properties need to be set in oozie-site.xml for an Oozie-driven backup between a secured cluster and a normal (non-secure) cluster? And is there any specific argument that needs to be passed to the oozie command for such a backup?

4 REPLIES

Super Guru

@Mathes krishna - Do you want to schedule a DistCp job in Oozie that copies data from the non-secure cluster to the secure cluster?

Super Guru
@Mathes krishna

You can use the Oozie DistCp action below:

<workflow-app name="[WF-DEF-NAME]" xmlns="uri:oozie:workflow:0.4">
    ...
    <action name="[NODE-NAME]">
        <distcp xmlns="uri:oozie:distcp-action:0.2">
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode1}</name-node>
            <arg>${nameNode1}/path/to/input.txt</arg>
            <arg>${nameNode2}/path/to/output.txt</arg>
        </distcp>
        <ok to="[NODE-NAME]"/>
        <error to="[NODE-NAME]"/>
    </action>
    ...
</workflow-app>

The first <arg> indicates the input and the second <arg> indicates the output. In the example above, the input is on nameNode1 and the output is on nameNode2.

IMPORTANT: If using the DistCp action between two secure clusters, the following property must be added to the configuration of the action:

<property>
    <name>oozie.launcher.mapreduce.job.hdfs-servers</name>
    <value>${nameNode1},${nameNode2}</value>
</property>

Please check the link below; you need to add the mentioned property to your workflow.xml.

https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.3.2/bk_Sys_Admin_Guides/content/ref-c8ffaa14-ea...

Hope this helps!
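To answer the command-line part of the question: no DistCp-specific arguments are needed on the oozie command itself; everything is driven by workflow.xml and job.properties. A typical submission looks like the sketch below (the Oozie server host, realm, and file names are placeholders you would adjust for your cluster):

```shell
# Hypothetical Oozie server URL -- replace with your own host/port
export OOZIE_URL=http://oozie-host.example.com:11000/oozie

# On the Kerberized (secure) cluster, obtain a ticket before submitting
kinit your_user@EXAMPLE.COM

# Submit and start the workflow; all DistCp parameters come from
# job.properties and the workflow.xml referenced therein
oozie job -oozie "$OOZIE_URL" -config job.properties -run
```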

Sorry, didn't see that you already replied.

Since the remote cluster is non-secure, I think you might only need the "oozie.launcher.mapreduce.job.hdfs-servers" property. The rest of the workflow is like the DistCp Oozie example below. Just define the parameterized values in job.properties. The input and output paths in the example look cryptic; you can replace them with actual paths, or include other parameters that work for you. nameNode1 and nameNode2 look like "hdfs://<NN-FQDN>:8020". BTW, in HDP the Oozie examples are located at /usr/hdp/current/oozie-client/doc/oozie-examples.tar.gz

<workflow-app xmlns="uri:oozie:workflow:0.3" name="distcp-wf">
    <start to="distcp-node"/>
    <action name="distcp-node">
        <distcp xmlns="uri:oozie:distcp-action:0.1">
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode1}</name-node>
            <prepare>
                <delete path="${nameNode2}/user/${wf:user()}/${examplesRoot}/output-data/${outputDir}"/>
            </prepare>
            <configuration>
                <property>
                    <name>mapred.job.queue.name</name>
                    <value>${queueName}</value>
                </property>
                <property>
                    <name>oozie.launcher.mapreduce.job.hdfs-servers</name>
                    <value>${nameNode1},${nameNode2}</value>
                </property>
            </configuration>
            <arg>${nameNode1}/user/${wf:user()}/${examplesRoot}/input-data/text/data.txt</arg>
            <arg>${nameNode2}/user/${wf:user()}/${examplesRoot}/output-data/${outputDir}/data.txt</arg>
        </distcp>
        <ok to="end"/>
        <error to="fail"/>
    </action>
    <kill name="fail">
        <message>DistCP failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
    </kill>
    <end name="end"/>
</workflow-app>
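The parameterized values used above would be defined in job.properties. A minimal sketch, assuming hypothetical host names and an HDP-style layout (adjust every value for your environment):

```properties
# Secure (source) and non-secure (target) NameNodes -- placeholder FQDNs
nameNode1=hdfs://nn1.secure.example.com:8020
nameNode2=hdfs://nn2.normal.example.com:8020

# ResourceManager address (the <job-tracker> element in the workflow)
jobTracker=rm.secure.example.com:8050

queueName=default
examplesRoot=examples
outputDir=distcp-output

# HDFS path on the source cluster where workflow.xml is deployed
oozie.wf.application.path=${nameNode1}/user/${user.name}/${examplesRoot}/apps/distcp
```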