<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>question Re: Distcp is failing in HA in Archives of Support Questions (Read Only)</title>
    <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Distcp-is-failing-in-HA/m-p/162886#M21276</link>
    <description>&lt;A rel="user" href="https://community.cloudera.com/users/2273/saurabhmcakiet.html" nodeid="2273"&gt;@Saurabh Kumar&lt;/A&gt;&lt;P&gt; See this &lt;A href="https://hortonworks.jira.com/browse/BUG-22998" target="_blank"&gt;https://hortonworks.jira.com/browse/BUG-22998&lt;/A&gt;&lt;/P&gt;&lt;P&gt;and &lt;A href="https://issues.apache.org/jira/browse/HDFS-6376" target="_blank"&gt;https://issues.apache.org/jira/browse/HDFS-6376&lt;/A&gt;&lt;/P&gt;</description>
    <pubDate>Mon, 29 Feb 2016 20:01:59 GMT</pubDate>
    <dc:creator>nsabharwal</dc:creator>
    <dc:date>2016-02-29T20:01:59Z</dc:date>
    <item>
      <title>Distcp is failing in HA</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Distcp-is-failing-in-HA/m-p/162884#M21274</link>
      <description>&lt;P&gt;When I run distcp on a High Availability cluster, it fails with the error below.&lt;/P&gt;&lt;PRE&gt;[s0998@test ~]$ hadoop distcp  hdfs://HDPINFHA/user/s0998/sampleTest.txt hdfs://HDPTSTHA/user/root/
16/02/29 06:32:38 ERROR tools.DistCp: Invalid arguments: 
java.lang.IllegalArgumentException: java.net.UnknownHostException: HDPTSTHA
	at org.apache.hadoop.security.SecurityUtil.buildTokenService(SecurityUtil.java:406)
	at org.apache.hadoop.hdfs.NameNodeProxies.createNonHAProxy(NameNodeProxies.java:311)
	at org.apache.hadoop.hdfs.NameNodeProxies.createProxy(NameNodeProxies.java:176)
	at org.apache.hadoop.hdfs.DFSClient.&amp;lt;init&amp;gt;(DFSClient.java:678)
	at org.apache.hadoop.hdfs.DFSClient.&amp;lt;init&amp;gt;(DFSClient.java:619)
	at org.apache.hadoop.hdfs.DistributedFileSystem.initialize(DistributedFileSystem.java:149)
	at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2653)
	at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:92)
	at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2687)
	at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2669)
	at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:371)
	at org.apache.hadoop.fs.Path.getFileSystem(Path.java:295)
	at org.apache.hadoop.tools.DistCp.setTargetPathExists(DistCp.java:216)
	at org.apache.hadoop.tools.DistCp.run(DistCp.java:116)
	at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)
	at org.apache.hadoop.tools.DistCp.main(DistCp.java:430)
Caused by: java.net.UnknownHostException: HDPTSTHA&lt;/PRE&gt;&lt;P&gt;I get this error even though I configured the clusters as described at the URL below.&lt;/P&gt;&lt;P&gt;&lt;A href="http://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.3.0-Win/bk_HDP_RelNotes_Win/content/behav-changes-230_Win.html"&gt;http://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.3.0-Win/bk_HDP_RelNotes_Win/content/behav-changes-230_Win.html&lt;/A&gt;&lt;/P&gt;</description>
      <pubDate>Mon, 29 Feb 2016 19:45:44 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Distcp-is-failing-in-HA/m-p/162884#M21274</guid>
      <dc:creator>SK1</dc:creator>
      <dc:date>2016-02-29T19:45:44Z</dc:date>
    </item>
    <item>
      <title>Re: Distcp is failing in HA</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Distcp-is-failing-in-HA/m-p/162885#M21275</link>
      <description>&lt;P&gt;Please see this blog post and double-check your values: &lt;A href="http://henning.kropponline.de/2015/03/15/distcp-two-ha-cluster/" target="_blank"&gt;http://henning.kropponline.de/2015/03/15/distcp-two-ha-cluster/&lt;/A&gt;&lt;/P&gt;</description>
      <pubDate>Mon, 29 Feb 2016 19:48:23 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Distcp-is-failing-in-HA/m-p/162885#M21275</guid>
      <dc:creator>aervits</dc:creator>
      <dc:date>2016-02-29T19:48:23Z</dc:date>
    </item>
    <item>
      <title>Re: Distcp is failing in HA</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Distcp-is-failing-in-HA/m-p/162886#M21276</link>
      <description>&lt;A rel="user" href="https://community.cloudera.com/users/2273/saurabhmcakiet.html" nodeid="2273"&gt;@Saurabh Kumar&lt;/A&gt;&lt;P&gt; See this &lt;A href="https://hortonworks.jira.com/browse/BUG-22998" target="_blank"&gt;https://hortonworks.jira.com/browse/BUG-22998&lt;/A&gt;&lt;/P&gt;&lt;P&gt;and &lt;A href="https://issues.apache.org/jira/browse/HDFS-6376" target="_blank"&gt;https://issues.apache.org/jira/browse/HDFS-6376&lt;/A&gt;&lt;/P&gt;</description>
      <pubDate>Mon, 29 Feb 2016 20:01:59 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Distcp-is-failing-in-HA/m-p/162886#M21276</guid>
      <dc:creator>nsabharwal</dc:creator>
      <dc:date>2016-02-29T20:01:59Z</dc:date>
    </item>
    <item>
      <title>Re: Distcp is failing in HA</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Distcp-is-failing-in-HA/m-p/162887#M21277</link>
      <description>&lt;A rel="user" href="https://community.cloudera.com/users/2273/saurabhmcakiet.html" nodeid="2273"&gt;@Saurabh Kumar&lt;/A&gt;&lt;TABLE&gt;&lt;TBODY&gt;&lt;TR&gt;&lt;TD&gt;&lt;PRE&gt;To run distcp between two HDFS HA clusters (for example, A and B), modify the following in hdfs-site.xml on both clusters:

In this example, the nameservices for clusters A and B are HAA and HAB respectively.

- List both nameservices in dfs.nameservices on both clusters: dfs.nameservices = HAA,HAB

- Add property dfs.internal.nameservices
In cluster A:
dfs.internal.nameservices = HAA
In cluster B:
dfs.internal.nameservices = HAB

- Add dfs.ha.namenodes.&amp;lt;nameservice&amp;gt; 
In cluster A
dfs.ha.namenodes.HAB = nn1,nn2
In cluster B
dfs.ha.namenodes.HAA = nn1,nn2

- Add property dfs.namenode.rpc-address.&amp;lt;nameservice&amp;gt;.&amp;lt;nn&amp;gt;
In cluster A (addresses of cluster B's NameNodes)
dfs.namenode.rpc-address.HAB.nn1 = &amp;lt;NN1_fqdn&amp;gt;:8020
dfs.namenode.rpc-address.HAB.nn2 = &amp;lt;NN2_fqdn&amp;gt;:8020
In cluster B (addresses of cluster A's NameNodes)
dfs.namenode.rpc-address.HAA.nn1 = &amp;lt;NN1_fqdn&amp;gt;:8020
dfs.namenode.rpc-address.HAA.nn2 = &amp;lt;NN2_fqdn&amp;gt;:8020

- Add property dfs.client.failover.proxy.provider.&amp;lt;nameservice&amp;gt; (i.e. HAA or HAB)
In cluster A
dfs.client.failover.proxy.provider.HAB = org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider
In cluster B
dfs.client.failover.proxy.provider.HAA = org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider

- Restart HDFS service.
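
As a rough sketch only, the cluster A additions above might look like this in hdfs-site.xml.
HAA and HAB are the example nameservices from this post; the host names are hypothetical
placeholders, and the existing HAA entries are assumed to be in place already.

&amp;lt;!-- Assumption: hdfs-site.xml on cluster A; host names below are placeholders --&amp;gt;
&amp;lt;property&amp;gt;
  &amp;lt;name&amp;gt;dfs.nameservices&amp;lt;/name&amp;gt;
  &amp;lt;value&amp;gt;HAA,HAB&amp;lt;/value&amp;gt;
&amp;lt;/property&amp;gt;
&amp;lt;property&amp;gt;
  &amp;lt;name&amp;gt;dfs.internal.nameservices&amp;lt;/name&amp;gt;
  &amp;lt;value&amp;gt;HAA&amp;lt;/value&amp;gt;
&amp;lt;/property&amp;gt;
&amp;lt;property&amp;gt;
  &amp;lt;name&amp;gt;dfs.ha.namenodes.HAB&amp;lt;/name&amp;gt;
  &amp;lt;value&amp;gt;nn1,nn2&amp;lt;/value&amp;gt;
&amp;lt;/property&amp;gt;
&amp;lt;property&amp;gt;
  &amp;lt;name&amp;gt;dfs.namenode.rpc-address.HAB.nn1&amp;lt;/name&amp;gt;
  &amp;lt;value&amp;gt;nn1.clusterb.example.com:8020&amp;lt;/value&amp;gt;
&amp;lt;/property&amp;gt;
&amp;lt;property&amp;gt;
  &amp;lt;name&amp;gt;dfs.namenode.rpc-address.HAB.nn2&amp;lt;/name&amp;gt;
  &amp;lt;value&amp;gt;nn2.clusterb.example.com:8020&amp;lt;/value&amp;gt;
&amp;lt;/property&amp;gt;
&amp;lt;property&amp;gt;
  &amp;lt;name&amp;gt;dfs.client.failover.proxy.provider.HAB&amp;lt;/name&amp;gt;
  &amp;lt;value&amp;gt;org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider&amp;lt;/value&amp;gt;
&amp;lt;/property&amp;gt;

Cluster B would mirror this, with HAA and HAB swapped.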

Once complete, you will be able to run distcp using the nameservices, for example:
hadoop distcp hdfs://HDPINFHA/tmp/testDistcp hdfs://HDPTSTHA/tmp/&lt;/PRE&gt;&lt;/TD&gt;&lt;/TR&gt;&lt;/TBODY&gt;&lt;/TABLE&gt;</description>
      <pubDate>Mon, 29 Feb 2016 20:12:54 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Distcp-is-failing-in-HA/m-p/162887#M21277</guid>
      <dc:creator>nsabharwal</dc:creator>
      <dc:date>2016-02-29T20:12:54Z</dc:date>
    </item>
    <item>
      <title>Re: Distcp is failing in HA</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Distcp-is-failing-in-HA/m-p/162888#M21278</link>
      <description>&lt;P&gt;&lt;A rel="user" href="https://community.cloudera.com/users/140/nsabharwal.html" nodeid="140"&gt;@Neeraj Sabharwal&lt;/A&gt;:&lt;/P&gt;&lt;P&gt;I followed the same steps but am still getting the same error.&lt;/P&gt;</description>
      <pubDate>Mon, 29 Feb 2016 20:34:21 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Distcp-is-failing-in-HA/m-p/162888#M21278</guid>
      <dc:creator>SK1</dc:creator>
      <dc:date>2016-02-29T20:34:21Z</dc:date>
    </item>
    <item>
      <title>Re: Distcp is failing in HA</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Distcp-is-failing-in-HA/m-p/162889#M21279</link>
      <description>&lt;P&gt;&lt;A rel="user" href="https://community.cloudera.com/users/393/aervits.html" nodeid="393"&gt;@Artem Ervits&lt;/A&gt;: When I set dfs.nameservices to include both clusters, I am not able to restart the HDFS services.&lt;/P&gt;&lt;PRE&gt;resource_management.core.exceptions.Fail: Execution of 'curl -sS -L -w '%{http_code}' -X GET 'http://m1.hdp22:50070/webhdfs/v1/tmp?op=GETFILESTATUS&amp;amp;user.name=hdfs'' returned status_code=403. 
{
  "RemoteException": {
    "exception": "StandbyException", 
    "javaClassName": "org.apache.hadoop.ipc.StandbyException", 
    "message": "Operation category READ is not supported in state standby"
  }
}&lt;/PRE&gt;</description>
      <pubDate>Mon, 29 Feb 2016 20:58:39 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Distcp-is-failing-in-HA/m-p/162889#M21279</guid>
      <dc:creator>SK1</dc:creator>
      <dc:date>2016-02-29T20:58:39Z</dc:date>
    </item>
    <item>
      <title>Re: Distcp is failing in HA</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Distcp-is-failing-in-HA/m-p/162890#M21280</link>
      <description>&lt;A rel="user" href="https://community.cloudera.com/users/2273/saurabhmcakiet.html" nodeid="2273"&gt;@Saurabh Kumar&lt;/A&gt;&lt;P&gt;Only use the link I provided to double-check your values; for the values themselves, refer to our docs as you did. Did you read this paragraph from the blog carefully?&lt;/P&gt;&lt;P&gt;&lt;EM&gt;"The other alternative is to configure the client with both service ids and make it aware of the way to identify the active NameNode of both clusters. For this you would need to define a custom configuration you are only going to use for distcp. The hdfs client can be configured to point to that config like this"&lt;/EM&gt;&lt;/P&gt;&lt;P&gt;Create a custom XML file and pass it to the hadoop distcp command every time you want to run distcp. Don't use that config as your global config for HDFS. Revert the configuration in Ambari to its previous state, create a custom hdfs-site.xml in your user directory, pass it to hadoop distcp, and report the results back.&lt;/P&gt;</description>
      <pubDate>Mon, 29 Feb 2016 21:23:54 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Distcp-is-failing-in-HA/m-p/162890#M21280</guid>
      <dc:creator>aervits</dc:creator>
      <dc:date>2016-02-29T21:23:54Z</dc:date>
    </item>
    <item>
      <title>Re: Distcp is failing in HA</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Distcp-is-failing-in-HA/m-p/162891#M21281</link>
      <description>&lt;P&gt;&lt;A rel="user" href="https://community.cloudera.com/users/2273/saurabhmcakiet.html" nodeid="2273"&gt;@Saurabh Kumar&lt;/A&gt; &lt;/P&gt;&lt;P&gt;You may want to open a support ticket.&lt;/P&gt;&lt;P&gt;Could you recheck all the steps as mentioned in the reply?&lt;/P&gt;</description>
      <pubDate>Tue, 01 Mar 2016 19:12:09 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Distcp-is-failing-in-HA/m-p/162891#M21281</guid>
      <dc:creator>nsabharwal</dc:creator>
      <dc:date>2016-03-01T19:12:09Z</dc:date>
    </item>
    <item>
      <title>Re: Distcp is failing in HA</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Distcp-is-failing-in-HA/m-p/162892#M21282</link>
      <description>&lt;P&gt;&lt;A rel="user" href="https://community.cloudera.com/users/393/aervits.html" nodeid="393"&gt;@Artem Ervits&lt;/A&gt;: I tried with an external config directory as well, but I am getting the error below.&lt;/P&gt;&lt;PRE&gt;[s0998dnz@lxhdpmastinf001 ~]$ hadoop --config conf/ distcp hdfs://HDPINFHA/user/s0998dnz/sampleTest.txt hdfs://HDPTSTHA/user/root/
16/03/01 07:40:35 ERROR tools.DistCp: Invalid arguments: 
java.lang.IllegalArgumentException: java.net.UnknownHostException: HDPTSTHA
	at org.apache.hadoop.security.SecurityUtil.buildTokenService(SecurityUtil.java:406)
	at org.apache.hadoop.hdfs.NameNodeProxies.createNonHAProxy(NameNodeProxies.java:311)
	at org.apache.hadoop.hdfs.NameNodeProxies.createProxy(NameNodeProxies.java:176&lt;/PRE&gt;</description>
      <pubDate>Tue, 01 Mar 2016 20:39:41 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Distcp-is-failing-in-HA/m-p/162892#M21282</guid>
      <dc:creator>SK1</dc:creator>
      <dc:date>2016-03-01T20:39:41Z</dc:date>
    </item>
    <item>
      <title>Re: Distcp is failing in HA</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Distcp-is-failing-in-HA/m-p/162893#M21283</link>
      <description>&lt;P&gt;&lt;A rel="user" href="https://community.cloudera.com/users/2273/saurabhmcakiet.html" nodeid="2273"&gt;@Saurabh Kumar&lt;/A&gt; I don't have an HA cluster to test with, but testing on the Sandbox worked for me with the hadoop command, not hdfs. Please double-check your properties. The safest route is to determine the active namenode at the time of copy, though I agree it is not the optimal solution. Tagging experts &lt;A rel="user" href="https://community.cloudera.com/users/264/stevel.html" nodeid="264"&gt;@stevel&lt;/A&gt; &lt;A rel="user" href="https://community.cloudera.com/users/381/cnauroth.html" nodeid="381"&gt;@Chris Nauroth&lt;/A&gt;&lt;/P&gt;&lt;PRE&gt;cp /etc/hadoop/conf/hdfs-site.xml distcp.xml
mkdir confdir &amp;amp;&amp;amp; mv distcp.xml confdir
hadoop --config confdir distcp hdfs://sandbox.hortonworks.com:8020/user/root/sample.json hdfs://sandbox.hortonworks.com:8020/user/root/sample.json5
&lt;/PRE&gt;</description>
      <pubDate>Wed, 02 Mar 2016 08:25:03 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Distcp-is-failing-in-HA/m-p/162893#M21283</guid>
      <dc:creator>aervits</dc:creator>
      <dc:date>2016-03-02T08:25:03Z</dc:date>
    </item>
    <item>
      <title>Re: Distcp is failing in HA</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Distcp-is-failing-in-HA/m-p/162894#M21284</link>
      <description>&lt;P&gt;&lt;EM&gt;"The safest route is to determine the active namenode at the time of copy,"&lt;/EM&gt;&lt;/P&gt;&lt;P&gt;This would have an unfortunate side effect.  Referencing the active NameNode's address directly means that the DistCp job wouldn't be able to survive an HA failover.  If there was a failover in the middle of a long-running DistCp job, then you'd likely need to restart it from the beginning.&lt;/P&gt;&lt;P&gt;The &lt;A href="https://issues.apache.org/jira/browse/HDFS-6376"&gt;HDFS-6376&lt;/A&gt; patch mentioned throughout this question should be sufficient to enable a DistCp across HA clusters, assuming you are running an HDP version that has the patch.  The original question includes a link to HDP 2.3 docs.  If that is the version you are running, then that's fine, because HDFS-6376 is included in all HDP 2.3 releases.  This is tested regularly and confirmed to be working.&lt;/P&gt;&lt;P&gt;If all else fails, then this sounds like a reason to file a support case for additional hands-on troubleshooting with your particular cluster.  That might be more effective than trying to resolve it through HCC.&lt;/P&gt;</description>
      <pubDate>Wed, 02 Mar 2016 08:38:45 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Distcp-is-failing-in-HA/m-p/162894#M21284</guid>
      <dc:creator>cnauroth</dc:creator>
      <dc:date>2016-03-02T08:38:45Z</dc:date>
    </item>
    <item>
      <title>Re: Distcp is failing in HA</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Distcp-is-failing-in-HA/m-p/162895#M21285</link>
      <description>&lt;P&gt;&lt;A rel="user" href="https://community.cloudera.com/users/140/nsabharwal.html" nodeid="140"&gt;@Neeraj Sabharwal&lt;/A&gt;:&lt;/P&gt;&lt;P&gt;Thanks for your support. I found the issue: there was a misconfiguration in the hdfs-site.xml file.&lt;/P&gt;&lt;P&gt;I had not added the target cluster's HA properties to the client hdfs-site.xml, and because of that it was failing.&lt;/P&gt;&lt;P&gt;It is working fine now.&lt;/P&gt;</description>
      <pubDate>Wed, 02 Mar 2016 22:22:13 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Distcp-is-failing-in-HA/m-p/162895#M21285</guid>
      <dc:creator>SK1</dc:creator>
      <dc:date>2016-03-02T22:22:13Z</dc:date>
    </item>
  </channel>
</rss>

