<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>question Re: DistCp over Oozie .vs. from shell in Support Questions</title>
    <link>https://community.cloudera.com/t5/Support-Questions/DistCp-over-Oozie-vs-from-shell/m-p/89105#M35149</link>
    <description>&lt;P&gt;Harsh J:&amp;nbsp; Thanks for the help on the previous issue.&amp;nbsp; We finally resolved it.&amp;nbsp; It was due to an undocumented port required for DistCp between our two CDH 6.2 clusters.&amp;nbsp; Now we are migrating the task to Oozie and are having some trouble.&amp;nbsp; Could you elaborate a bit more, or give us some links or pointers?&amp;nbsp; Thanks.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;We could not find "mapreduce.job.hdfs-servers".&amp;nbsp; Where is that configured?&lt;/P&gt;</description>
    <pubDate>Sat, 13 Apr 2019 08:40:43 GMT</pubDate>
    <dc:creator>HenryPark</dc:creator>
    <dc:date>2019-04-13T08:40:43Z</dc:date>
    <item>
      <title>DistCp over Oozie .vs. from shell</title>
      <link>https://community.cloudera.com/t5/Support-Questions/DistCp-over-Oozie-vs-from-shell/m-p/88513#M35144</link>
      <description>&lt;P&gt;Hi.&amp;nbsp; We have a client who has two clusters.&amp;nbsp; On the security cluster, they have sensitive data that they redact and copy to the analysis cluster.&amp;nbsp; For security reasons, they would like to minimize the number of open ports on the security cluster.&amp;nbsp; We have successfully tested running distcp from the shell to copy the data with only port 8020 open.&amp;nbsp; They would now like to automate the process through Oozie.&amp;nbsp; In testing, we have run into an error saying that port 8042 (the NodeManager external port) is not open.&amp;nbsp;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;We do not understand why distcp works fine without port 8042 available when run through the shell but fails when called through Oozie.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Any help would be appreciated.&amp;nbsp; Thanks.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Henry&lt;/P&gt;</description>
      <pubDate>Fri, 16 Sep 2022 14:16:38 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/DistCp-over-Oozie-vs-from-shell/m-p/88513#M35144</guid>
      <dc:creator>HenryPark</dc:creator>
      <dc:date>2022-09-16T14:16:38Z</dc:date>
    </item>
    <item>
      <title>Re: DistCp over Oozie .vs. from shell</title>
      <link>https://community.cloudera.com/t5/Support-Questions/DistCp-over-Oozie-vs-from-shell/m-p/88599#M35145</link>
      <description>Could you share the full log from this failure, both from the Oozie server for the action ID and from the action launcher job's map task logs?&lt;BR /&gt;&lt;BR /&gt;Port 8042 is the NodeManager HTTP port, used to serve logs of live containers and other status details over REST. DistCp does not use it directly, but MapReduce and Oozie diagnostics might be invoking it in response to a failure, so the 8042 error is a secondary symptom.&lt;BR /&gt;&lt;BR /&gt;Note, though, that running DistCp via Oozie requires you to provide appropriate configs that ensure delegation tokens are acquired for both kerberized clusters. Use "mapreduce.job.hdfs-servers" with a value such as "hdfs://namenode-cluster-1,hdfs://namenode-cluster-2" to influence this during the Oozie server's delegation token acquisition phase. This is only relevant if you use Kerberos on both clusters.&lt;BR /&gt;</description>
      <pubDate>Tue, 02 Apr 2019 01:55:51 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/DistCp-over-Oozie-vs-from-shell/m-p/88599#M35145</guid>
      <dc:creator>Harsh J</dc:creator>
      <dc:date>2019-04-02T01:55:51Z</dc:date>
    </item>
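    <!--
    A minimal sketch of where "mapreduce.job.hdfs-servers" can be set, assuming an Oozie workflow that uses the DistCp action extension (schema uri:oozie:distcp-action:0.2) and Kerberos on both clusters. The workflow name, NameNode host names (security-nn, analysis-nn), and paths are hypothetical placeholders. ${jobTracker} and ${nameNode} are assumed to be defined in job.properties. Oozie's DistCp action documentation also shows an "oozie.launcher."-prefixed form of this property, so check which form your Oozie version expects.

    ```xml
    <workflow-app xmlns="uri:oozie:workflow:0.5" name="redacted-copy-wf">
        <start to="distcp-copy"/>
        <action name="distcp-copy">
            <distcp xmlns="uri:oozie:distcp-action:0.2">
                <job-tracker>${jobTracker}</job-tracker>
                <name-node>${nameNode}</name-node>
                <configuration>
                    <property>
                        <name>mapreduce.job.hdfs-servers</name>
                        <value>hdfs://security-nn:8020,hdfs://analysis-nn:8020</value>
                    </property>
                </configuration>
                <arg>hdfs://security-nn:8020/data/redacted</arg>
                <arg>hdfs://analysis-nn:8020/data/incoming</arg>
            </distcp>
            <ok to="end"/>
            <error to="fail"/>
        </action>
        <kill name="fail">
            <message>DistCp failed: [${wf:errorMessage(wf:lastErrorNode())}]</message>
        </kill>
        <end name="end"/>
    </workflow-app>
    ```

    The property goes in the action's configuration block. Per Harsh J's reply, listing both NameNodes there influences the Oozie server's delegation token acquisition, so the action should carry tokens for both clusters. The two arg elements are passed to DistCp as source and target. ${jobTracker} and ${nameNode} select the cluster whose YARN runs the copy.
    -->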
    <item>
      <title>Re: DistCp over Oozie .vs. from shell</title>
      <link>https://community.cloudera.com/t5/Support-Questions/DistCp-over-Oozie-vs-from-shell/m-p/88620#M35146</link>
      <description>&lt;P&gt;Thanks for your reply.&amp;nbsp; Could we ask a related question?&amp;nbsp; Our client is very reluctant to open ports on these two clusters.&amp;nbsp; Could you tell us which ports need to be open for distcp to function properly?&amp;nbsp; After many failures, our client briefly allowed all ports to be open.&amp;nbsp; With that change, distcp works properly.&amp;nbsp; We have already looked at the ports specified in&amp;nbsp;&lt;A href="https://www.cloudera.com/documentation/enterprise/latest/topics/install_ports_distcp.html#topic_9_1" target="_blank"&gt;https://www.cloudera.com/documentation/enterprise/latest/topics/install_ports_distcp.html#topic_9_1&lt;/A&gt;.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Are there any hidden or secondary ports, beyond those in the documentation above, that could be causing the problem?&lt;/P&gt;</description>
      <pubDate>Tue, 02 Apr 2019 13:24:02 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/DistCp-over-Oozie-vs-from-shell/m-p/88620#M35146</guid>
      <dc:creator>HenryPark</dc:creator>
      <dc:date>2019-04-02T13:24:02Z</dc:date>
    </item>
    <item>
      <title>Re: DistCp over Oozie .vs. from shell</title>
      <link>https://community.cloudera.com/t5/Support-Questions/DistCp-over-Oozie-vs-from-shell/m-p/88678#M35147</link>
      <description>Is the job submitted to the source cluster or the destination? DistCp jobs should only need to contact the NodeManagers of the cluster they run on, but if the cluster the job is submitted to is remote, those ports may need to be opened.&lt;BR /&gt;&lt;BR /&gt;The HDFS transfer part does not involve YARN service communication at all, so it is not expected to contact a NodeManager.&lt;BR /&gt;&lt;BR /&gt;It would be helpful if you could share some more logs leading up to the observed failure.&lt;BR /&gt;</description>
      <pubDate>Thu, 04 Apr 2019 01:53:51 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/DistCp-over-Oozie-vs-from-shell/m-p/88678#M35147</guid>
      <dc:creator>Harsh J</dc:creator>
      <dc:date>2019-04-04T01:53:51Z</dc:date>
    </item>
    <item>
      <title>Re: DistCp over Oozie .vs. from shell</title>
      <link>https://community.cloudera.com/t5/Support-Questions/DistCp-over-Oozie-vs-from-shell/m-p/88696#M35148</link>
      <description>Thank you.&amp;nbsp; Because of the client’s security measures, we are unable to share the generated log files.&amp;nbsp; This, of course, makes everything much more difficult.&lt;BR /&gt;</description>
      <pubDate>Thu, 04 Apr 2019 11:58:51 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/DistCp-over-Oozie-vs-from-shell/m-p/88696#M35148</guid>
      <dc:creator>HenryPark</dc:creator>
      <dc:date>2019-04-04T11:58:51Z</dc:date>
    </item>
    <item>
      <title>Re: DistCp over Oozie .vs. from shell</title>
      <link>https://community.cloudera.com/t5/Support-Questions/DistCp-over-Oozie-vs-from-shell/m-p/89105#M35149</link>
      <description>&lt;P&gt;Harsh J:&amp;nbsp; Thanks for the help on the previous issue.&amp;nbsp; We finally resolved it.&amp;nbsp; It was due to an undocumented port required for DistCp between our two CDH 6.2 clusters.&amp;nbsp; Now we are migrating the task to Oozie and are having some trouble.&amp;nbsp; Could you elaborate a bit more, or give us some links or pointers?&amp;nbsp; Thanks.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;We could not find "mapreduce.job.hdfs-servers".&amp;nbsp; Where is that configured?&lt;/P&gt;</description>
      <pubDate>Sat, 13 Apr 2019 08:40:43 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/DistCp-over-Oozie-vs-from-shell/m-p/89105#M35149</guid>
      <dc:creator>HenryPark</dc:creator>
      <dc:date>2019-04-13T08:40:43Z</dc:date>
    </item>
  </channel>
</rss>

