Created 12-16-2016 12:53 AM
We have two clusters, H6 and H7, that have been linked for HA distcp as well as HBase sharing, as described in http://henning.kropponline.de/2015/03/15/distcp-two-ha-cluster/.
When running an hdfs rebalancer command on one of the clusters, it attempts to run the command ON BOTH CLUSTERS.
Here is the configuration, assuming two clusters named h6 and h7.
| Component | H6 | H7 |
| --- | --- | --- |
| Name Service | nn-h6 | nn-h7 |
| HA NameNode names | nn1, nn2 | nn1, nn2 |
On H6 the following have been added to the hdfs-site.xml file:
dfs.nameservices=nn-h6,nn-h7
dfs.ha.namenodes.nn-h7=nn1,nn2
dfs.client.failover.proxy.provider.nn-h7=org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider
dfs.namenode.http-address.nn-h7.nn1=hdpnn-h7-example.com:50070
dfs.namenode.http-address.nn-h7.nn2=hdpnn-h7-awsw02.example.com:50070
dfs.namenode.https-address.nn-h7.nn1=hdpnn-h7-example.com:50470
dfs.namenode.https-address.nn-h7.nn2=hdpnn-h7-awsw02.example.com:50470
dfs.namenode.rpc-address.nn-h7.nn1=hdpnn-h7-example.com:8020
dfs.namenode.rpc-address.nn-h7.nn2=hdpnn-h7-awsw02.example.com:8020
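For reference, here is roughly how the first few of those entries would look in hdfs-site.xml form (a sketch only; property names and values are copied from the listing above, with the same placeholder hostnames):

```xml
<!-- Sketch: H6-side hdfs-site.xml entries for the remote nn-h7
     nameservice (first few properties from the listing above). -->
<property>
  <name>dfs.nameservices</name>
  <value>nn-h6,nn-h7</value>
</property>
<property>
  <name>dfs.ha.namenodes.nn-h7</name>
  <value>nn1,nn2</value>
</property>
<property>
  <name>dfs.client.failover.proxy.provider.nn-h7</name>
  <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
</property>
<property>
  <name>dfs.namenode.rpc-address.nn-h7.nn1</name>
  <value>hdpnn-h7-example.com:8020</value>
</property>
```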
On H7 the following have been added to the hdfs-site.xml file:
dfs.nameservices=nn-h7,nn-h6
dfs.ha.namenodes.nn-h6=nn1,nn2
dfs.client.failover.proxy.provider.nn-h6=org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider
dfs.namenode.http-address.nn-h6.nn1=hdpnn-h6-example.com:50070
dfs.namenode.http-address.nn-h6.nn2=hdpnn-h6-awsw02.example.com:50070
dfs.namenode.https-address.nn-h6.nn1=hdpnn-h6-example.com:50470
dfs.namenode.https-address.nn-h6.nn2=hdpnn-h6-awsw02.example.com:50470
dfs.namenode.rpc-address.nn-h6.nn1=hdpnn-h6-example.com:8020
dfs.namenode.rpc-address.nn-h6.nn2=hdpnn-h6-awsw02.example.com:8020
Created 12-16-2016 07:56 PM
Hi @jbarnett, this was really interesting to think about, because your question crosses the domains of three HDFS features: HA, Federation, and Client ViewFS. The key understanding required is that Clusters and Clients can, and sometimes should, be configured differently.
The blog you reference specifically says that the configuration he proposes is "a custom configuration you are only going to use for distcp" (my emphasis added), and he specifically proposes it as a solution for doing distcp between HA clusters, where at any given time different namenodes may be the "active" namenode within each cluster that you're distcp-ing between.[1] He does not mention a critical point: this configuration is for the Client only, not for the HDFS Cluster nodes (and the Client host where this is configured must not be one of the HDFS node servers), if you wish the two clusters to remain non-Federated. The behavior you describe for rebalancing would be the expected result if this configuration were also used for the cluster nodes themselves, causing them to Federate -- and in that case, yes, the undesired result would be balancing across the two clusters regardless of their membership. It is also probable, although I'm not certain, that if you give the `hdfs balancer` command from a Client with this configuration, it will initiate rebalancing on both clusters. Whether each cluster then rebalances only within itself or across cluster boundaries I'm not 100% sure, but since the balancing is delegated to the namenodes, I would expect that if the clusters each use a non-Federated configuration, they would balance only within themselves.
So the simple answer to your question is: To keep the two clusters separate and non-Federated from each other, the nodes of each cluster should be configured with the hdfs-site.xml file appropriate to their own cluster, not mentioning the other cluster. Since each cluster is itself HA, it will still need to use the "nameservice suffix" form of many of the configuration parameters, BUT it will only mention the nameservice(s) and namenodes hosted in that cluster; it will not mention nameservice(s)/namenode(s) hosted in the other cluster. (And, btw, the nameservices in the two clusters should have different names -- "fooSite1" and "fooSite2" -- even if they are intended to be mirrors of each other.)

Your client hosts should be separate from your cluster nodes, and the two-cluster ViewFS Client configuration you use for invoking `distcp` should be different from the single-cluster Client configurations you use for invoking `balancer` on each individual cluster, if you want to be able to trigger rebalancing separately. Make sense?
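For example, one concrete way to keep those client configurations separate is to maintain distinct config directories and point HADOOP_CONF_DIR at whichever one matches the task at hand. This is only a sketch; the directory names are hypothetical:

```sh
# Hypothetical client-side config directories:
#   /etc/hadoop/conf.distcp  - lists both nameservices (nn-h6 and nn-h7)
#   /etc/hadoop/conf.h6      - lists only nn-h6
#   /etc/hadoop/conf.h7      - lists only nn-h7

# Cross-cluster copy uses the two-cluster config:
export HADOOP_CONF_DIR=/etc/hadoop/conf.distcp
hadoop distcp hdfs://nn-h6/data/src hdfs://nn-h7/data/dst

# Rebalancing H6 alone uses the single-cluster config:
export HADOOP_CONF_DIR=/etc/hadoop/conf.h6
hdfs balancer -threshold 10
```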
To get a clear understanding of these features (HA, Federation, and Client ViewFS), and how they relate to each other, I encourage you to read the authoritative Apache Hadoop documentation for each of them: the HDFS High Availability, HDFS Federation, and ViewFS guides.
Hope this helps.
-------------------
[1] He doesn't seem to include the final command, which is the whole point of his blog: that after configuring the Client for a ViewFS view of the two clusters, you can now give the `distcp` command in the form:
hadoop distcp viewfs://serviceId1/path viewfs://serviceId2/path
and not have to worry about which namenode is active in each cluster.
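For completeness, the client-side ViewFS view he builds up to might look something like this in core-site.xml (again a sketch; the mount-table name `twoClusters` and the mount points are made up for illustration):

```xml
<!-- Hypothetical ViewFS mount table joining the two nameservices. -->
<property>
  <name>fs.defaultFS</name>
  <value>viewfs://twoClusters</value>
</property>
<property>
  <name>fs.viewfs.mounttable.twoClusters.link./h6</name>
  <value>hdfs://nn-h6/</value>
</property>
<property>
  <name>fs.viewfs.mounttable.twoClusters.link./h7</name>
  <value>hdfs://nn-h7/</value>
</property>
```

With something like that in place, a path such as viewfs://twoClusters/h6/some/path resolves through the HA failover proxy of the corresponding cluster, whichever namenode happens to be active.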
Created 12-16-2016 08:16 PM
(Got this wrong the first time! The following should be correct.)
BTW, if your clusters have been inadvertently configured for Federation, and you want to disentangle them, it can be a little tricky. You can't just suddenly change all their configurations, because that would cause the namenodes to lose track of blocks on the remote datanodes. The needed process is too complex to detail here, but the first step is to confirm whether Federation has actually occurred: open the NameNode web UI (http://<NameNode_FQDN>:50070) and navigate to the DataNodes page, then see if datanodes from the OTHER cluster are shown. If so, the two clusters have Federated. At the end of the process, all the nodes in each cluster will have a non-Federated config. This can be done without data loss and with only brief NN downtime, but it will require a significant time commitment, depending on the cluster sizes.
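If you prefer the command line to the web UI for that check, something like the following should show the same information (a sketch; the config directory is the hypothetical per-cluster one suggested above):

```sh
# Report the datanodes this cluster's namenode is tracking; hostnames
# from the OTHER cluster appearing here means the clusters have Federated.
export HADOOP_CONF_DIR=/etc/hadoop/conf.h6
hdfs dfsadmin -report | grep '^Name:'
```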
Please don't try this unless you thoroughly understand the issues involved. If you have any doubts, consulting assistance may be appropriate.
Created 12-18-2016 07:42 AM
Hi @jbarnett,
In order to run the HDFS Balancer, the new conf property dfs.internal.nameservices, which distinguishes internal from remote nameservices, needs to be set; Balancer will use it to locate the local file system.
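On H6, for example, that would look something like this (a sketch using the nameservice names from earlier in the thread):

```xml
<!-- Both nameservices remain visible to clients for distcp... -->
<property>
  <name>dfs.nameservices</name>
  <value>nn-h6,nn-h7</value>
</property>
<!-- ...but only the local one is declared internal, so Balancer
     treats nn-h7 as a remote cluster. -->
<property>
  <name>dfs.internal.nameservices</name>
  <value>nn-h6</value>
</property>
```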
Alternatively, Balancer and distcp need not share the same conf, since distcp may be used with multiple remote clusters: when adding a new remote cluster, we need to add it to the distcp conf, but it does not make sense to change the Balancer conf each time. If we are going to use a separate conf for Balancer, we may put only one file system (i.e. the local fs but not the remote fs) in dfs.nameservices.
In summary, there are two ways to fix the conf: set dfs.internal.nameservices alongside dfs.nameservices in the shared conf, or give Balancer its own conf that lists only the local nameservice in dfs.nameservices.
Hope it helps.