Created 01-06-2016 07:37 PM
Within a cluster we have no trouble executing commands against an HA NameNode using the NameService ID. But it doesn't work when doing distcp from one cluster to another, because the clusters are unaware of each other's mapping of NameNodes to NameService IDs. How does one do this?
Created 01-06-2016 07:45 PM
I recommend taking a look at Apache JIRA HDFS-6376. This issue addressed the problem of DistCp across 2 different HA clusters. The solution introduces a new configuration property, dfs.internal.nameservices. This allows you to set up configuration to differentiate between "all known nameservices" and "nameservices that this cluster's DataNodes need to report to."
<property>
  <name>dfs.internal.nameservices</name>
  <value></value>
  <description>
    Comma-separated list of nameservices that belong to this cluster.
    Datanode will report to all the nameservices in this list. By default
    this is set to the value of dfs.nameservices.
  </description>
</property>
HDFS-6376 is included in all versions of both HDP 2.2 and HDP 2.3. It is not included in any release of the HDP 2.1 line.
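As a sketch of how the two properties interact (the nameservice IDs serviceId1 and serviceId2 are placeholders, not values from any real cluster): on the local cluster, hdfs-site.xml might list both nameservices for clients while restricting DataNode registration to the local one:

```xml
<!-- Illustrative sketch only: serviceId1 is assumed to be this cluster's
     own nameservice, serviceId2 the remote cluster's. -->
<property>
  <name>dfs.nameservices</name>
  <!-- all nameservices that clients on this cluster may address -->
  <value>serviceId1,serviceId2</value>
</property>
<property>
  <name>dfs.internal.nameservices</name>
  <!-- only this cluster's DataNodes register with this nameservice -->
  <value>serviceId1</value>
</property>
```

With that in place, a client on the local cluster can address either cluster by nameservice ID, e.g. `hadoop distcp hdfs://serviceId1/path hdfs://serviceId2/path`.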
Created 01-06-2016 07:43 PM
One way to do that is to identify the current active NameNode on each cluster and address them directly. Here is an example distcp command:

hadoop distcp hdfs://active1:8020/path hdfs://active2:8020/path

Alternatively, configure the client with both NameService IDs so it can resolve either cluster. Our own @hkropp has blogged about this here.
Created 04-02-2016 09:32 PM
Actually this does not quite answer the question, though it gives a good hint with dfs.internal.nameservices. That parameter is needed to distinguish the local nameservice from the other configured nameservices, but by itself it does not enable distcp between two HA clusters. dfs.internal.nameservices is relevant, for example, for DataNodes so they don't register with the other cluster. To support distcp between multiple HA clusters you simply have to define multiple nameservices, like this for example:
<configuration>
  <!-- services -->
  <property>
    <name>dfs.nameservices</name>
    <value>serviceId1,serviceId2</value>
  </property>

  <!-- serviceId2 properties -->
  <property>
    <name>dfs.client.failover.proxy.provider.serviceId2</name>
    <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
  </property>
  <property>
    <name>dfs.ha.namenodes.serviceId2</name>
    <value>nn201,nn202</value>
  </property>
  <property>
    <name>dfs.namenode.rpc-address.serviceId2.nn201</name>
    <value>nn201.pro.net:8020</value>
  </property>
  <property>
    <name>dfs.namenode.servicerpc-address.serviceId2.nn201</name>
    <value>nn201.pro.net:54321</value>
  </property>
  <property>
    <name>dfs.namenode.http-address.serviceId2.nn201</name>
    <value>nn201.pro.net:50070</value>
  </property>
  <property>
    <name>dfs.namenode.https-address.serviceId2.nn201</name>
    <value>nn201.pro.net:50470</value>
  </property>
  <property>
    <name>dfs.namenode.rpc-address.serviceId2.nn202</name>
    <value>nn202.pro.net:8020</value>
  </property>
  <property>
    <name>dfs.namenode.servicerpc-address.serviceId2.nn202</name>
    <value>nn202.pro.net:54321</value>
  </property>
  <property>
    <name>dfs.namenode.http-address.serviceId2.nn202</name>
    <value>nn202.pro.net:50070</value>
  </property>
  <property>
    <name>dfs.namenode.https-address.serviceId2.nn202</name>
    <value>nn202.pro.net:50470</value>
  </property>

  <!-- serviceId1 properties -->
  <property>
    <name>dfs.client.failover.proxy.provider.serviceId1</name>
    <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
  </property>
  <property>
    <name>dfs.ha.namenodes.serviceId1</name>
    <value>nn101,nn102</value>
  </property>
  <property>
    <name>dfs.namenode.rpc-address.serviceId1.nn101</name>
    <value>nn101.poc.net:8020</value>
  </property>
  <property>
    <name>dfs.namenode.servicerpc-address.serviceId1.nn101</name>
    <value>nn101.poc.net:54321</value>
  </property>
  <property>
    <name>dfs.namenode.http-address.serviceId1.nn101</name>
    <value>nn101.poc.net:50070</value>
  </property>
  <property>
    <name>dfs.namenode.https-address.serviceId1.nn101</name>
    <value>nn101.poc.net:50470</value>
  </property>
  <property>
    <name>dfs.namenode.rpc-address.serviceId1.nn102</name>
    <value>nn102.poc.net:8020</value>
  </property>
  <property>
    <name>dfs.namenode.servicerpc-address.serviceId1.nn102</name>
    <value>nn102.poc.net:54321</value>
  </property>
  <property>
    <name>dfs.namenode.http-address.serviceId1.nn102</name>
    <value>nn102.poc.net:50070</value>
  </property>
  <property>
    <name>dfs.namenode.https-address.serviceId1.nn102</name>
    <value>nn102.poc.net:50470</value>
  </property>
</configuration>
Adding this to the hdfs-site.xml config makes both nameservices, serviceId1 and serviceId2, available.
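As a quick sanity check that the extra nameservice entries are well formed, the RPC addresses for a nameservice can be pulled back out of hdfs-site.xml with standard tools. This is only an illustrative sketch; the file path and the serviceId2 entries below are sample data mirroring the config above, not values from a real cluster:

```shell
# Write a small sample hdfs-site.xml (stand-in for the real config file).
cat > /tmp/hdfs-site-sample.xml <<'EOF'
<configuration>
  <property>
    <name>dfs.namenode.rpc-address.serviceId2.nn201</name>
    <value>nn201.pro.net:8020</value>
  </property>
  <property>
    <name>dfs.namenode.rpc-address.serviceId2.nn202</name>
    <value>nn202.pro.net:8020</value>
  </property>
</configuration>
EOF

# Match the <name> lines for the remote nameservice, take the <value> line
# that follows each, and strip the XML tags to leave host:port pairs.
grep -A1 'dfs.namenode.rpc-address.serviceId2' /tmp/hdfs-site-sample.xml \
  | grep '<value>' \
  | sed -e 's/.*<value>//' -e 's/<\/value>.*//'
```

Seeing both NameNode host:port pairs listed confirms the client-side mapping for the remote nameservice is in place.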
Created 01-06-2016 10:01 PM
There is also a command-line option, "-ns", in distcp that deals with this.
Created 06-03-2016 09:44 AM
Like distcp, I have the same issue with Falcon when submitting a cluster entity. My Falcon server needs to know the local cluster ID and the remote cluster ID, and the falcon command has no option for passing an hdfs-site.xml config file.
Created 06-28-2017 12:29 PM
I configured the hdfs-site.xml file on the HA nodes of both clusters. When I run start-all.sh, it also starts the second cluster's services, and after that the NameNode goes down.