Created 02-29-2016 11:45 AM
When I try to run distcp between High Availability clusters, it fails with the error below.
[s0998@test ~]$ hadoop distcp hdfs://HDPINFHA/user/s0998/sampleTest.txt hdfs://HDPTSTHA/user/root/
16/02/29 06:32:38 ERROR tools.DistCp: Invalid arguments: java.lang.IllegalArgumentException: java.net.UnknownHostException: HDPTSTHA
    at org.apache.hadoop.security.SecurityUtil.buildTokenService(SecurityUtil.java:406)
    at org.apache.hadoop.hdfs.NameNodeProxies.createNonHAProxy(NameNodeProxies.java:311)
    at org.apache.hadoop.hdfs.NameNodeProxies.createProxy(NameNodeProxies.java:176)
    at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:678)
    at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:619)
    at org.apache.hadoop.hdfs.DistributedFileSystem.initialize(DistributedFileSystem.java:149)
    at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2653)
    at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:92)
    at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2687)
    at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2669)
    at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:371)
    at org.apache.hadoop.fs.Path.getFileSystem(Path.java:295)
    at org.apache.hadoop.tools.DistCp.setTargetPathExists(DistCp.java:216)
    at org.apache.hadoop.tools.DistCp.run(DistCp.java:116)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)
    at org.apache.hadoop.tools.DistCp.main(DistCp.java:430)
Caused by: java.net.UnknownHostException: HDPTSTHA
I have already configured it as described in the URLs below.
Created 02-29-2016 11:48 AM
Please see this blog and double-check your values: http://henning.kropponline.de/2015/03/15/distcp-two-ha-cluster/
Created 02-29-2016 12:58 PM
@Artem Ervits: When I changed dfs.nameservices to include both clusters, I was no longer able to restart the HDFS services.
resource_management.core.exceptions.Fail: Execution of 'curl -sS -L -w '%{http_code}' -X GET 'http://m1.hdp22:50070/webhdfs/v1/tmp?op=GETFILESTATUS&user.name=hdfs'' returned status_code=403.
{
  "RemoteException": {
    "exception": "StandbyException",
    "javaClassName": "org.apache.hadoop.ipc.StandbyException",
    "message": "Operation category READ is not supported in state standby"
  }
}
Created 02-29-2016 01:23 PM
Use the link I provided only to double-check your values; for all of the values themselves, refer to our docs as you did. Did you read this paragraph from the blog carefully?
"The other alternative is to configure the client with both service ids and make it aware of the way to identify the active NameNode of both clusters. For this you would need to define a custom configuration you are only going to use for distcp. The hdfs client can be configured to point to that config like this"
Create a custom XML file and pass it to the hadoop distcp command every time you want to run distcp. Don't use that config as your global HDFS config. Revert the configuration to its previous state in Ambari, create a custom hdfs-site.xml in your user directory, pass it to hadoop distcp, and report the results back.
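For illustration, a minimal sketch of what such a custom hdfs-site.xml could contain, assuming the nameservice IDs from this thread (HDPINFHA as source, HDPTSTHA as target) and placeholder NameNode hostnames that you would replace with your own:

<configuration>
  <!-- Make the distcp client aware of both nameservices -->
  <property>
    <name>dfs.nameservices</name>
    <value>HDPINFHA,HDPTSTHA</value>
  </property>
  <!-- Remote (target) cluster; repeat the same pattern for HDPINFHA -->
  <property>
    <name>dfs.ha.namenodes.HDPTSTHA</name>
    <value>nn1,nn2</value>
  </property>
  <property>
    <name>dfs.namenode.rpc-address.HDPTSTHA.nn1</name>
    <value>tst-nn1.example.com:8020</value> <!-- placeholder hostname -->
  </property>
  <property>
    <name>dfs.namenode.rpc-address.HDPTSTHA.nn2</name>
    <value>tst-nn2.example.com:8020</value> <!-- placeholder hostname -->
  </property>
  <property>
    <name>dfs.client.failover.proxy.provider.HDPTSTHA</name>
    <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
  </property>
</configuration>

You would then place this file in its own directory and point distcp at it with hadoop --config <that directory> distcp ..., as shown later in this thread.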
Created 03-01-2016 12:39 PM
@Artem Ervits: I tried with an external config dir as well, but I am getting the error below.
[s0998dnz@lxhdpmastinf001 ~]$ hadoop --config conf/ distcp hdfs://HDPINFHA/user/s0998dnz/sampleTest.txt hdfs://HDPTSTHA/user/root/
16/03/01 07:40:35 ERROR tools.DistCp: Invalid arguments: java.lang.IllegalArgumentException: java.net.UnknownHostException: HDPTSTHA
    at org.apache.hadoop.security.SecurityUtil.buildTokenService(SecurityUtil.java:406)
    at org.apache.hadoop.hdfs.NameNodeProxies.createNonHAProxy(NameNodeProxies.java:311)
    at org.apache.hadoop.hdfs.NameNodeProxies.createProxy(NameNodeProxies.java:176
Created 03-02-2016 12:25 AM
@Saurabh Kumar I don't have an HA cluster to test with, but testing on Sandbox worked for me with the hadoop command, not hdfs. Please double-check your properties. The safest route is to determine the active NameNode at the time of copy; I agree it's not the most optimal solution. Tagging experts @stevel @Chris Nauroth
cp /etc/hadoop/conf/hdfs-site.xml distcp.xml
mkdir confdir && mv distcp.xml confdir
hadoop --config confdir distcp hdfs://sandbox.hortonworks.com:8020/user/root/sample.json hdfs://sandbox.hortonworks.com:8020/user/root/sample.json5
Created 03-02-2016 12:38 AM
"The safest route is to determine the active namenode at the time of copy,"
This would have an unfortunate side effect. Referencing the active NameNode's address directly means that the DistCp job wouldn't be able to survive an HA failover. If there was a failover in the middle of a long-running DistCp job, then you'd likely need to restart it from the beginning.
The HDFS-6376 patch mentioned throughout this question should be sufficient to enable a DistCp across HA clusters, assuming you are running an HDP version that has the patch. The original question includes a link to HDP 2.3 docs. If that is the version you are running, then that's fine, because HDFS-6376 is included in all HDP 2.3 releases. This is tested regularly and confirmed to be working.
If all else fails, then this sounds like a reason to file a support case for additional hands-on troubleshooting with your particular cluster. That might be more effective than trying to resolve it through HCC.
Created 02-29-2016 12:12 PM
In order to distcp between two HDFS HA clusters (for example A and B), modify the following in the hdfs-site.xml for both clusters. For example, the nameservices for clusters A and B are HAA and HAB respectively.
- Add the nameservices of both clusters:
  dfs.nameservices = HAA,HAB
- Add the property dfs.internal.nameservices:
  In cluster A: dfs.internal.nameservices = HAA
  In cluster B: dfs.internal.nameservices = HAB
- Add dfs.ha.namenodes.<nameservice>:
  In cluster A: dfs.ha.namenodes.HAB = nn1,nn2
  In cluster B: dfs.ha.namenodes.HAA = nn1,nn2
- Add the property dfs.namenode.rpc-address.<nameservice>.<nn>:
  In cluster A:
    dfs.namenode.rpc-address.HAB.nn1 = <NN1_fqdn>:8020
    dfs.namenode.rpc-address.HAB.nn2 = <NN2_fqdn>:8020
  In cluster B:
    dfs.namenode.rpc-address.HAA.nn1 = <NN1_fqdn>:8020
    dfs.namenode.rpc-address.HAA.nn2 = <NN2_fqdn>:8020
- Add the property dfs.client.failover.proxy.provider.<nameservice> (i.e. HAA or HAB):
  In cluster A: dfs.client.failover.proxy.provider.HAB = org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider
  In cluster B: dfs.client.failover.proxy.provider.HAA = org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider
- Restart the HDFS service.
Once complete, you will be able to run the distcp command using the nameservices, similar to:
hadoop distcp hdfs://HDPINFHA/tmp/testDistcp hdfs://HDPTSTHA/tmp/
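As a hedged illustration, the additions to cluster A's hdfs-site.xml could look like the following (hostnames are placeholders; replace them with cluster B's actual NameNode FQDNs). Cluster B would mirror this with the HAA entries:

<property>
  <name>dfs.nameservices</name>
  <value>HAA,HAB</value>
</property>
<!-- Keep this cluster's own daemons bound to the local nameservice only -->
<property>
  <name>dfs.internal.nameservices</name>
  <value>HAA</value>
</property>
<property>
  <name>dfs.ha.namenodes.HAB</name>
  <value>nn1,nn2</value>
</property>
<property>
  <name>dfs.namenode.rpc-address.HAB.nn1</name>
  <value>hab-nn1.example.com:8020</value> <!-- placeholder hostname -->
</property>
<property>
  <name>dfs.namenode.rpc-address.HAB.nn2</name>
  <value>hab-nn2.example.com:8020</value> <!-- placeholder hostname -->
</property>
<property>
  <name>dfs.client.failover.proxy.provider.HAB</name>
  <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
</property>

The dfs.internal.nameservices property keeps each cluster's NameNodes and DataNodes registered only against their own nameservice, which is likely why changing dfs.nameservices alone caused the restart failure reported earlier in this thread.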
Created 02-29-2016 12:34 PM
I followed the same steps but am still getting the same error.