How to use NameService ID between two Clusters
Labels: Apache Hadoop
Created 01-06-2016 07:37 PM
Within a cluster we have no trouble executing commands against an HA NameNode using the NameService ID. But DistCp from one cluster to another doesn't work, because each cluster is unaware of the other's mapping of NameNodes to NameService IDs. How does one do this?
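For illustration (clusterA and clusterB standing in for the two NameService IDs): the first command works within cluster A, but the cross-cluster copy fails because cluster A's client configuration has no mapping for clusterB:

hdfs dfs -ls hdfs://clusterA/data                          # works: clusterA is defined in the local hdfs-site.xml
hadoop distcp hdfs://clusterA/data hdfs://clusterB/data    # typically fails with java.net.UnknownHostException: clusterB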
Created 01-06-2016 07:43 PM
One way to do this is to identify the current Active NameNode on each cluster and use those addresses directly. Here is an example for the DistCp case:
hadoop distcp hdfs://active1:8020/path hdfs://active2:8020/path
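To identify which NameNode is currently active, you can query the HA state with hdfs haadmin (nn1 and nn2 here stand for whatever NameNode IDs your cluster defines):

hdfs haadmin -getServiceState nn1    # prints "active" or "standby"
hdfs haadmin -getServiceState nn2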
Alternatively, configure the client with both NameService IDs so it can resolve either cluster. Our own @hkropp has blogged about this here.
Created 01-06-2016 07:45 PM
I recommend taking a look at Apache JIRA HDFS-6376. This issue addressed the problem of DistCp across two different HA clusters. The solution introduces a new configuration property, dfs.internal.nameservices, which lets the configuration differentiate between "all known nameservices" and "the nameservices that this cluster's DataNodes need to report to."

<property>
  <name>dfs.internal.nameservices</name>
  <value></value>
  <description>
    Comma-separated list of nameservices that belong to this cluster.
    Datanode will report to all the nameservices in this list. By default
    this is set to the value of dfs.nameservices.
  </description>
</property>
HDFS-6376 is included in all versions of both HDP 2.2 and HDP 2.3. It is not included in any release of the HDP 2.1 line.
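As a minimal sketch of how the two properties work together (serviceId1 and serviceId2 are placeholder NameService IDs): the local cluster lists both nameservices as known, but marks only its own as internal, so DataNodes report only to the local NameNodes while clients can still resolve the remote nameservice for DistCp:

<property>
  <name>dfs.nameservices</name>
  <value>serviceId1,serviceId2</value>
</property>
<property>
  <!-- serviceId1 is local; serviceId2 is known only for client access -->
  <name>dfs.internal.nameservices</name>
  <value>serviceId1</value>
</property>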
Created 04-02-2016 09:32 PM
Actually, this does not quite answer the question, but it gives a good hint with dfs.internal.nameservices. That parameter is needed to distinguish the local nameservice from the other configured nameservices, but it does not by itself enable DistCp between two HA clusters. dfs.internal.nameservices is relevant, for example, for DataNodes, so that they don't register with the other cluster. To support DistCp between multiple HA clusters, you simply have to define multiple nameservices, for example like this:
<configuration>
  <!-- nameservices -->
  <property>
    <name>dfs.nameservices</name>
    <value>serviceId1,serviceId2</value>
  </property>

  <!-- serviceId2 properties -->
  <property>
    <name>dfs.client.failover.proxy.provider.serviceId2</name>
    <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
  </property>
  <property>
    <name>dfs.ha.namenodes.serviceId2</name>
    <value>nn201,nn202</value>
  </property>
  <property>
    <name>dfs.namenode.rpc-address.serviceId2.nn201</name>
    <value>nn201.pro.net:8020</value>
  </property>
  <property>
    <name>dfs.namenode.servicerpc-address.serviceId2.nn201</name>
    <value>nn201.pro.net:54321</value>
  </property>
  <property>
    <name>dfs.namenode.http-address.serviceId2.nn201</name>
    <value>nn201.pro.net:50070</value>
  </property>
  <property>
    <name>dfs.namenode.https-address.serviceId2.nn201</name>
    <value>nn201.pro.net:50470</value>
  </property>
  <property>
    <name>dfs.namenode.rpc-address.serviceId2.nn202</name>
    <value>nn202.pro.net:8020</value>
  </property>
  <property>
    <name>dfs.namenode.servicerpc-address.serviceId2.nn202</name>
    <value>nn202.pro.net:54321</value>
  </property>
  <property>
    <name>dfs.namenode.http-address.serviceId2.nn202</name>
    <value>nn202.pro.net:50070</value>
  </property>
  <property>
    <name>dfs.namenode.https-address.serviceId2.nn202</name>
    <value>nn202.pro.net:50470</value>
  </property>

  <!-- serviceId1 properties -->
  <property>
    <name>dfs.client.failover.proxy.provider.serviceId1</name>
    <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
  </property>
  <property>
    <name>dfs.ha.namenodes.serviceId1</name>
    <value>nn101,nn102</value>
  </property>
  <property>
    <name>dfs.namenode.rpc-address.serviceId1.nn101</name>
    <value>nn101.poc.net:8020</value>
  </property>
  <property>
    <name>dfs.namenode.servicerpc-address.serviceId1.nn101</name>
    <value>nn101.poc.net:54321</value>
  </property>
  <property>
    <name>dfs.namenode.http-address.serviceId1.nn101</name>
    <value>nn101.poc.net:50070</value>
  </property>
  <property>
    <name>dfs.namenode.https-address.serviceId1.nn101</name>
    <value>nn101.poc.net:50470</value>
  </property>
  <property>
    <name>dfs.namenode.rpc-address.serviceId1.nn102</name>
    <value>nn102.poc.net:8020</value>
  </property>
  <property>
    <name>dfs.namenode.servicerpc-address.serviceId1.nn102</name>
    <value>nn102.poc.net:54321</value>
  </property>
  <property>
    <name>dfs.namenode.http-address.serviceId1.nn102</name>
    <value>nn102.poc.net:50070</value>
  </property>
  <property>
    <name>dfs.namenode.https-address.serviceId1.nn102</name>
    <value>nn102.poc.net:50470</value>
  </property>
</configuration>
Adding this to the hdfs-site.xml config makes both nameservices, serviceId1 and serviceId2, available.
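With both nameservices resolvable, DistCp can then be run against the NameService IDs directly, and the client handles HA failover on both sides (using the serviceId1/serviceId2 names from the config above; the paths are placeholders):

hadoop distcp hdfs://serviceId1/source/path hdfs://serviceId2/target/path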
Created 01-06-2016 10:01 PM
There is also a command-line option, "-ns", in DistCp that deals with this.
Created 06-03-2016 09:44 AM
Like DistCp, I have the same issue with Falcon when submitting a cluster entity. My Falcon server needs to know both the local and the remote cluster's NameService ID, and the falcon command has no option to point it at a config file such as hdfs-site.xml.
Created 06-28-2017 12:29 PM
I configured the hdfs-site.xml file on the HA nodes of both clusters. When I run start-all.sh, it also starts the second cluster's services, and after that the NameNode goes down.
