
With two HA clusters configured for cross-cluster distcp, the balancer command runs on BOTH clusters

Expert Contributor

We have two clusters, H6 and H7, that have been linked for HA distcp as well as HBase sharing, as described in http://henning.kropponline.de/2015/03/15/distcp-two-ha-cluster/.

When we run the hdfs balancer command on one of the clusters, it attempts to run ON BOTH CLUSTERS.

  1. Is this expected behavior?
  2. If so, will the balancer run separately on each cluster, as if we were executing it on two independent clusters? Specifically, we want to make sure it is not going to try to balance ACROSS the clusters and mix up data blocks.
  3. Is there any way to prevent this behavior? We are thinking of using a custom hdfs-site.xml, with only one cluster's info, on the path when running the balancer. Thoughts?

Here is the configuration, assuming two clusters named H6 and H7:

Component            H6         H7
Name Service         nn-h6      nn-h7
HA NameNode Names    nn1, nn2   nn1, nn2

On H6 the following have been added to the hdfs-site.xml file:

dfs.nameservices=nn-h6,nn-h7

dfs.ha.namenodes.nn-h7=nn1,nn2

dfs.client.failover.proxy.provider.nn-h7=org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider

dfs.namenode.http-address.nn-h7.nn1=hdpnn-h7-example.com:50070

dfs.namenode.http-address.nn-h7.nn2=hdpnn-h7-awsw02.example.com:50070

dfs.namenode.https-address.nn-h7.nn1=hdpnn-h7-example.com:50470

dfs.namenode.https-address.nn-h7.nn2=hdpnn-h7-awsw02.example.com:50470

dfs.namenode.rpc-address.nn-h7.nn1=hdpnn-h7-example.com:8020

dfs.namenode.rpc-address.nn-h7.nn2=hdpnn-h7-awsw02.example.com:8020

On H7 the following have been added to the hdfs-site.xml file:

dfs.nameservices=nn-h7,nn-h6

dfs.ha.namenodes.nn-h6=nn1,nn2

dfs.client.failover.proxy.provider.nn-h6=org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider

dfs.namenode.http-address.nn-h6.nn1=hdpnn-h6-example.com:50070

dfs.namenode.http-address.nn-h6.nn2=hdpnn-h6-awsw02.example.com:50070

dfs.namenode.https-address.nn-h6.nn1=hdpnn-h6-example.com:50470

dfs.namenode.https-address.nn-h6.nn2=hdpnn-h6-awsw02.example.com:50470

dfs.namenode.rpc-address.nn-h6.nn1=hdpnn-h6-example.com:8020

dfs.namenode.rpc-address.nn-h6.nn2=hdpnn-h6-awsw02.example.com:8020
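
With both nameservices defined in the distcp client's configuration, a cross-cluster copy can address each cluster by its nameservice name rather than by a specific (possibly standby) namenode. As an illustrative example (the paths are placeholders):

hadoop distcp hdfs://nn-h6/apps/data hdfs://nn-h7/apps/data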

1 ACCEPTED SOLUTION


Hi @jbarnett, this was really interesting to think about, because your question crosses the domains of three HDFS features: HA, Federation, and Client ViewFS. The key understanding required is that clusters and clients can, and sometimes should, be configured differently.

The blog you reference specifically says that the configuration it proposes is "a custom configuration you are only going to use for distcp" (my emphasis added), and the author proposes it specifically as a solution for doing distcp between HA clusters, where at any given time a different namenode may be the "active" one within each cluster you are distcp-ing between.[1] What he does not mention is a critical point: this configuration is for the Client only, not for the HDFS Cluster nodes (and the Client host where it is configured must not be one of the HDFS node servers), if you wish the two clusters to remain non-Federated.

The rebalancing behavior you describe would be the expected result if this configuration were also used on the cluster nodes themselves, causing them to Federate -- and in that case, yes, the undesired result would be balancing across the two clusters regardless of their membership. It is also probable, although I'm not certain, that running the `hdfs balancer` command from a Client with this configuration will initiate rebalancing on both clusters. Whether each cluster then rebalances only within itself or across cluster boundaries I'm not 100% sure, but since the balancing is delegated to the namenodes, I would expect that if the clusters each use a non-Federated configuration, they will balance only within themselves.

So the simple answer to your question is: to keep the two clusters separate and non-Federated, the nodes of each cluster should be configured with the hdfs-site.xml appropriate to their own cluster, with no mention of the other cluster. Since each cluster is itself HA, it will still need the "nameservice suffix" form of many of the configuration parameters, BUT it should mention only the nameservice(s) and namenode(s) hosted in that cluster, not those hosted in the other cluster. (And, by the way, the nameservices in the two clusters should have different names -- "fooSite1" and "fooSite2" -- even if the clusters are intended to be mirrors of each other.) Your client hosts should be separate from your cluster nodes, and the two-cluster ViewFS Client configuration you use for invoking `distcp` should be different from the single-cluster Client configurations you use for invoking `balancer` on each individual cluster, if you want to be able to trigger rebalancing separately. Make sense?
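
For illustration, here is a minimal sketch of the cluster-side hdfs-site.xml values for the H6 nodes, using the hostnames from your table (only the local nameservice nn-h6 appears; the H7 nodes would get the mirror-image settings):

dfs.nameservices=nn-h6
dfs.ha.namenodes.nn-h6=nn1,nn2
dfs.namenode.rpc-address.nn-h6.nn1=hdpnn-h6-example.com:8020
dfs.namenode.rpc-address.nn-h6.nn2=hdpnn-h6-awsw02.example.com:8020
dfs.namenode.http-address.nn-h6.nn1=hdpnn-h6-example.com:50070
dfs.namenode.http-address.nn-h6.nn2=hdpnn-h6-awsw02.example.com:50070
dfs.client.failover.proxy.provider.nn-h6=org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider

With only this in place, an `hdfs balancer` run on H6 can only ever see H6.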

To get a clear understanding of these features (HA, Federation, and Client ViewFS), and how they relate to each other, I encourage you to read the authoritative sources:

https://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-hdfs/HDFSHighAvailabilityWithQJM.html
https://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-hdfs/Federation.html
https://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-hdfs/ViewFs.html

Hope this helps.

-------------------

[1] He doesn't seem to include the final command, which is the whole point of his blog: after configuring the Client for a ViewFS view of the two clusters, you can give the `distcp` command in the form:

hadoop distcp viewfs://serviceId1/path viewfs://serviceId2/path

and not have to worry about which namenode is active in each cluster.
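
For completeness, a sketch of what the client-side ViewFS mount table might look like in core-site.xml -- the mount-table names serviceId1/serviceId2 and the /data path are placeholders I've chosen, not values from the blog, while fs.viewfs.mounttable.<name>.link.<path> is the standard ViewFS mount-table property:

fs.viewfs.mounttable.serviceId1.link./data=hdfs://nn-h6/data
fs.viewfs.mounttable.serviceId2.link./data=hdfs://nn-h7/data

hadoop distcp viewfs://serviceId1/data/src viewfs://serviceId2/data/dst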


3 REPLIES



(Got this wrong the first time! The following should be correct.)

BTW, if your clusters have been inadvertently configured for Federation and you want to disentangle them, it can be a little tricky. You can't just change all their configurations at once, because that would cause the namenodes to lose track of blocks on the remote datanodes. The full process is too complex to detail here, but in outline you need to do something like this:

  1. The test for whether this problem exists is to open each NameNode's web UI (http://<NameNode_FQDN>:50070), navigate to the DataNodes page, and see whether datanodes from the OTHER cluster are shown. If so, the two clusters have Federated.
  2. To separate them, first change the configuration of each NameNode so that it stops serving namespaces from both clusters. It is probably best to reconfigure both NameNodes in each HA pair at the same time, so a brief downtime will be needed.
  3. Then decommission the datanodes individually, from all clusters/namenodes, a few at a time (where "a few" means a small percentage of the overall cluster size), so they flush their blocks to other nodes. Once a datanode is no longer referenced by any namespace, change ITS configuration to register only with the NameNodes/nameservices of its own cluster, restart it, rejoin it to its cluster, and move on to the next datanode (a sketch of the commands follows below). Alternate between the datanodes of the two clusters so you don't run into storage-space issues. Continue until all the datanodes have been so "gently" reconfigured to a single cluster's namespace(s).

At the end of the process, all the nodes in each cluster will have a non-Federated config. This can be done without data loss and with only brief NameNode downtime, but it will require a significant time commitment, depending on the cluster sizes.
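
For reference, here is a hedged sketch of a single decommission cycle using standard HDFS admin commands; the excludes-file path is illustrative and must match whatever your dfs.hosts.exclude property points at, and while the clusters are still Federated you would need to refresh every namenode that knows the node:

# 1. List the datanode in the excludes file on the namenode host(s)
echo "dn01.example.com" >> /etc/hadoop/conf/dfs.exclude
# 2. Tell the namenode(s) to re-read the include/exclude lists
hdfs dfsadmin -refreshNodes
# 3. Check periodically until the node reports "Decommissioned"
hdfs dfsadmin -report
# 4. Then point the datanode's hdfs-site.xml at only its own cluster's
#    nameservice, remove it from the excludes file, refresh again, and
#    restart the datanode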

Please don't try this unless you thoroughly understand the issues involved. If you have any doubts, consulting assistance may be appropriate.

Rising Star

Hi @jbarnett,

In order to run the HDFS Balancer in this setup, the new conf dfs.internal.nameservices, which distinguishes the local cluster's nameservices from remote ones, needs to be set; the Balancer uses it to locate the local file system.

Alternatively, the Balancer and distcp need not share the same conf, since distcp may be used against multiple remote clusters: when adding a new remote cluster we need to add it to the distcp conf, but it makes no sense to change the Balancer conf at the same time. If we use a separate conf for the Balancer, we can put only one file system (i.e., the local fs but not the remote fs) in dfs.nameservices.

In summary, there are two ways to fix the conf (both sketched below).

  1. Set all the local and remote file systems in dfs.nameservices, and set only the local file system in dfs.internal.nameservices. This conf will work for both distcp and the Balancer.
  2. Set only the local file system in dfs.nameservices in the Balancer conf, and use a different conf for distcp.
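
As a hedged sketch using the nameservice names from the question, option 1 on the H6 side would be:

dfs.nameservices=nn-h6,nn-h7
dfs.internal.nameservices=nn-h6

while option 2's Balancer-only conf would simply be:

dfs.nameservices=nn-h6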

Hope it helps.