Apache Oozie Distcp Action with Kerberos enabled clusters

Master Mentor

My environment requires that I pass

-D ipc.client.fallback-to-simple-auth-allowed=true

to the distcp command. The distcp 0.2 action specification for Oozie 4.2 has a java-opts option, but I can't get the workflow to run when I pass the property that way. The only other approach I can think of is to put the property in core-site.xml, which is not feasible on production clusters. My workflow, for reference:

<action name="distcp_1">
    <distcp xmlns="uri:oozie:distcp-action:0.2">
        <job-tracker>${resourceManager}</job-tracker>
        <name-node>${nameNode}</name-node>
        <arg>hdfs://aervits-hdp70:8020/tmp/hellounsecure</arg>
        <arg>hdfs://hacluster:8020/user/centos/</arg>
        <java-opts>-Dipc.client.fallback-to-simple-auth-allowed=true</java-opts>
    </distcp>
    <ok to="end"/>
    <error to="kill"/>
</action>

1 ACCEPTED SOLUTION

Master Guru

That property is used when one cluster is kerberized and the other isn't. Is that your case? If both clusters are kerberized, you need to set the "oozie.launcher.mapreduce.job.hdfs-servers" property. As for your property, can you try passing it as an arg:

<arg>-Dipc.client.fallback-to-simple-auth-allowed=true</arg>

I don't think the space after "-D" matters, but you can try either way.
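
As a rough sketch of how that suggestion might fit into the action from the question: the host names and paths are copied from the original post, and placing the generic option ahead of the source and target paths is an assumption based on how Hadoop generic options are usually parsed. This has not been verified on a real cluster.

```xml
<action name="distcp_1">
    <distcp xmlns="uri:oozie:distcp-action:0.2">
        <job-tracker>${resourceManager}</job-tracker>
        <name-node>${nameNode}</name-node>
        <!-- Generic Hadoop option passed as a plain argument, not via java-opts -->
        <arg>-Dipc.client.fallback-to-simple-auth-allowed=true</arg>
        <!-- Source and target paths from the question -->
        <arg>hdfs://aervits-hdp70:8020/tmp/hellounsecure</arg>
        <arg>hdfs://hacluster:8020/user/centos/</arg>
    </distcp>
    <ok to="end"/>
    <error to="kill"/>
</action>
```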


13 REPLIES

Expert Contributor

Can you add a space between -D and ipc.client...? This is not a Java system property; it should be passed as the -D option to the Tool runner.

Master Mentor

@Venkat Ranganathan can you be more specific? I tried with a space character and I'm getting

Error: E0701 : E0701: XML schema error, cvc-complex-type.2.4.a: Invalid content was found starting with element 'java-opts'. One of '{"uri:oozie:distcp-action:0.2":arg}' is expected.

Is this passed as an <arg> or as <java-opts>?

<java-opts>-D ipc.client.fallback-to-simple-auth-allowed=true</java-opts>

Expert Contributor

You should have the java-opts element right after the configuration element and before the arg elements.

Master Mentor

@Venkat Ranganathan That didn't work. I'm getting

Error: Could not find or load main class ipc.client.fallback-to-simple-auth-allowed=true

in the stderr log. My workflow currently looks like this:

<action name="distcp_1">
    <distcp xmlns="uri:oozie:distcp-action:0.2">
        <job-tracker>${resourceManager}</job-tracker>
        <name-node>${nameNode}</name-node>
        <configuration>
            <property>
                <name>mapred.job.queue.name</name>
                <value>${queueName}</value>
            </property>
        </configuration>
        <java-opts>-D ipc.client.fallback-to-simple-auth-allowed=true</java-opts>
        <arg>hdfs://aervits-hdp70:8020/tmp/hellounsecure</arg>
        <arg>hdfs://hacluster:8020/user/centos/</arg>
    </distcp>
    <ok to="end"/>
    <error to="kill"/>
</action>

Expert Contributor

Let me check. The Java opts might be getting added after the class.

Expert Contributor

You can set up your args like this and remove the property from java-opts:

<arg>-D</arg>
<arg>ipc.client.fallback-to-simple-auth-allowed=true</arg>
<arg>hdfs://aervits-hdp70:8020/tmp/hellounsecure</arg>
<arg>hdfs://hacluster:8020/user/centos/</arg>

Thanks

Expert Contributor

BTW, this has to come before the distcp arguments (such as -update).
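
To illustrate the ordering, a hypothetical argument list might look like the following. It assumes -update is the DistCp option in use and reuses the paths from the question; treat it as a sketch, not a tested configuration.

```xml
<!-- Generic option and its value come first -->
<arg>-D</arg>
<arg>ipc.client.fallback-to-simple-auth-allowed=true</arg>
<!-- DistCp-specific options follow -->
<arg>-update</arg>
<!-- Source and target paths come last -->
<arg>hdfs://aervits-hdp70:8020/tmp/hellounsecure</arg>
<arg>hdfs://hacluster:8020/user/centos/</arg>
```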

Master Mentor

@Venkat Ranganathan I tried many different options, both as part of the configuration block and as part of job submission, i.e.

oozie job -D ipc.client.fallback-to-simple-auth-allowed=true -run

No luck. I read in the Oozie docs that I need the property oozie.launcher.mapreduce.job.hdfs-servers, but once I added it my jobs stopped being submitted to YARN, so I commented it out. I also added hadoop.proxyuser.oozie.hosts and hadoop.proxyuser.oozie.groups to the second cluster as per the docs, still no luck.

<action name="distcp_1">
    <distcp xmlns="uri:oozie:distcp-action:0.2">
        <job-tracker>${resourceManager}</job-tracker>
        <name-node>${nameNode}</name-node>
        <!--
        <configuration>
            <property>
                <name>oozie.launcher.mapreduce.job.hdfs-servers</name>
                <value>${nameNode},${nameNode2}</value>
            </property>
        </configuration>
        -->
        <arg>-D</arg>
        <arg>ipc.client.fallback-to-simple-auth-allowed=true</arg>
        <arg>-overwrite</arg>
        <arg>${nameNode}/user/centos/primary</arg>
        <arg>${nameNode2}/tmp/</arg>
    </distcp>
    <ok to="end"/>
    <error to="kill"/>
</action>

Finally, I stumbled on this note in the docs, and I'm afraid I've hit exactly this case:

IMPORTANT: The DistCp action may not work properly with all configurations (secure, insecure) in all versions of Hadoop.

I take it that the case where one cluster is secured and the second is not is not supported in the DistCp action spec 0.2.

https://oozie.apache.org/docs/4.2.0/DG_DistCpActionExtension.html
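
For reference, the proxy-user settings mentioned above normally live in core-site.xml on the cluster that has to trust the Oozie server. A minimal sketch, assuming Oozie runs as the user oozie; the host name is a hypothetical placeholder and the wildcard group is only for illustration:

```xml
<!-- core-site.xml fragment on the second cluster (values are assumptions) -->
<property>
    <name>hadoop.proxyuser.oozie.hosts</name>
    <!-- Hypothetical host name of the Oozie server -->
    <value>oozie-server.example.com</value>
</property>
<property>
    <name>hadoop.proxyuser.oozie.groups</name>
    <!-- Wildcard allows impersonating users from any group; narrow this in production -->
    <value>*</value>
</property>
```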
