Created on 03-23-2016 06:33 PM - edited 09-16-2022 01:34 AM
Introduction
In many production clusters, the servers (or VMs) have multiple physical Network Interface Cards (NICs) and/or multiple Virtual Network Interfaces (VIFs) and are configured to participate in multiple IP address spaces (“networks”). This may be done for a variety of reasons, including but not limited to the following:
In many of these “multi-homed” clusters it is desirable to configure the various Hadoop-related services to “listen” on all the network interfaces rather than on just one. Clients, on the other hand, must be given a specific target to which they should initiate connections.
Note that NIC Bonding (also known as NIC Teaming or Link Aggregation), where two or more physical NICs are made to look like a single network interface, is not included in this topic. The settings discussed below are usually not applicable to a NIC bonding configuration, which handles multiplexing and failover transparently while presenting only one “logical network” to applications. However, there is a hybrid case in which multiple physical NICs are bonded for redundancy and bandwidth, and multiple VIFs are then layered on top of them to partition data flows. In that case the multi-homing parameters apply to the VIFs.
The purpose of this document is to gather in one place the many parameters, across the different Hadoop-related components, that relate to configuring multi-homing support for services and clients. In general, to support multi-homing on the services you care about, you must set these parameters in those components and in any components upon which they depend. Almost all components depend on Hadoop Core, HDFS, and YARN, so these are given first, along with security-related parameters.
This document assumes you have HDP version 2.3 or later. Most but not all of the features are available in 2.1 and 2.2 also.
To understand the semantics of configuration parameters for multi-homing, it is useful to consider a few practical use cases for why it is desirable to have multi-homed servers and how Hadoop can utilize them. Here are four use cases that cover most of the concepts, although this is not an exhaustive list:
The independent variables in each use case are:
<Use cases to be expanded in a later version of this document.>
In each use case, the site architects will determine which network interfaces are available on servers and client hosts, and which interfaces are preferred for service/service and client/service communications of various types. We must then provide configuration information that specifies for the cluster:
Over time, the Hadoop community evolved an understanding of the configuration practices that make these multi-homing use cases manageable:
Prior to 2014 there were a variety of parameters proposed for specific issues related to multi-homing. These were incremental band-aids with inconsistent naming and incomplete semantics. In mid-2014 (for hadoop-2.6.0) the Hadoop Core community resolved to address multi-homing completely, with consistent semantics and usage. Other components also evolved multi-homing support, with a variety of parameter naming and semantics choices.
In general, for every service one must consider three sets of properties, related to the three key questions at the beginning of this section:
1. “Hostname Properties” relate to either specifying or deriving the server hostname by which clients will find each service, and the appropriate network for accessing it.
2. “Bind Address Properties” provide optional additional addressing information which is used only by the service itself, and only on multi-homed servers. By default, services will bind only to the default network interface [3] on each server. The Bind Address Properties can cause the service to bind to a non-default network interface if desired, or, most commonly, the bind address can be set to “0.0.0.0”, causing the service to bind to all available network interfaces on its server.
3. “Kerberos Host Properties” provide the information needed to resolve the “_HOST” substitution in each service’s own host-dependent Kerberos principal name. This is separate from the “Hostname Property” because not all services advertise their hostname to clients, and/or they may need to use a different means to determine the hostname for security purposes.
Different services evolved different parameters, or combinations of parameters, and parameter naming conventions, but all services need to somehow present the above three property sets.
The “listener” port number for each service must also be managed, but this property does not change between single-homed and multi-homed clusters. So we will not discuss it further here, except to mention that in HDFS and YARN it is conjoined with the “address” parameters (Hostname Property), while in other services such as HBase, it is carried in separate “port” parameters.
For historical reasons, in HDFS and YARN most of the parameters related to Hostname Properties end in “-address” or “.address”. Indeed, in non-multi-homed clusters it is acceptable and was once common to simply use IP addresses rather than hostname strings. In multi-homed clusters, however, all server addressing must be hostname-based. In HDFS and YARN most of the parameters related to Bind Address Properties end in “-bind-host” or “.bind-host”.
The four Namenode services each have a dfs.namenode.${service}-address parameter, where ${service} is rpc, servicerpc, http, or https. The Namenode server’s hostname should be assigned here, to establish the Hostname Property for each service. This information is provided to clients via shared config files.
For each, there is an optional corresponding dfs.namenode.${service}-bind-host parameter to establish the Bind Address Properties. It may be set to the server’s IP address on a particular interface, or to “0.0.0.0” to cause the Namenode services to listen on all available network interfaces. If unspecified, the default interface will be used. These parameters are documented at: https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-hdfs/HdfsMultihoming.html
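For illustration, a minimal hdfs-site.xml sketch for a non-HA Namenode (the hostname nn1.example.com and the ports shown are hypothetical examples; in an HA cluster these parameter names also carry nameservice and Namenode ID suffixes):

<property>
  <name>dfs.namenode.rpc-address</name>
  <value>nn1.example.com:8020</value>
  <!-- Hostname Property: the name clients use to reach the Namenode RPC service. -->
</property>
<property>
  <name>dfs.namenode.rpc-bind-host</name>
  <value>0.0.0.0</value>
  <!-- Bind Address Property: listen on all network interfaces. -->
</property>
<property>
  <name>dfs.namenode.http-address</name>
  <value>nn1.example.com:50070</value>
</property>
<property>
  <name>dfs.namenode.http-bind-host</name>
  <value>0.0.0.0</value>
</property>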
The Datanodes also present various services; however, their Hostname Properties are established via self-name-lookup as follows:
The Datanode registers its hostname with the Namenode, and the Namenode also captures the Datanode’s IP address from which it registers. (This will be an IP address on the network interface the Datanode determined it should use to communicate with the Namenode.) The Namenode then provides both hostname and IP address to clients, for particular Datanodes, whenever a client operation is needed. Which one the client uses is determined by the dfs.client.use.datanode.hostname and dfs.datanode.use.datanode.hostname parameter values; see below.
Datanodes also have a set of dfs.datanode.${service}-address parameters. However, confusingly, these relate only to the Bind Address Property, not the Hostname Property, for Datanodes. They default to “0.0.0.0”, and it is okay to leave them that way unless it is desired to bind to only a single network interface.
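For example, to bind the Datanode data-transfer service to a single interface rather than all interfaces, one could set the following in hdfs-site.xml (the IP address below is hypothetical; the port shown is the Hadoop 2.x default):

<property>
  <name>dfs.datanode.address</name>
  <value>10.0.1.15:50010</value>
  <!-- Bind the data transfer service to one specific local interface only;
       leave at the default 0.0.0.0:50010 to listen on all interfaces. -->
</property>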
Both Namenodes and Datanodes resolve their Kerberos _HOST bindings by using the hadoop.security.dns.interface and hadoop.security.dns.nameserver parameters, rather than their various *-address parameters. See below under “Security Parameters” for documentation.
By default, HDFS clients connect to Datanodes using an IP address provided by the Namenode. In a multi-homed network, depending on the network configuration, the Datanode IP address known to the Namenode may be unreachable by the clients. The fix is to let clients perform their own DNS resolution of the Datanode hostname. The following setting enables this behavior. The reason this is optional is to allow skipping DNS resolution for clients on networks where the feature is not required (and DNS might not be a reliable/low-latency service).
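The setting in question is dfs.client.use.datanode.hostname (default false). A minimal hdfs-site.xml sketch for the client side:

<property>
  <name>dfs.client.use.datanode.hostname</name>
  <value>true</value>
  <!-- Clients resolve the Datanode hostname themselves instead of using the
       Namenode-supplied IP address. -->
</property>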
More rarely, the Namenode-resolved IP address for a Datanode may be unreachable from other Datanodes. The fix is to force Datanodes to perform their own DNS resolution for inter-Datanode connections. The following setting enables this behavior.
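The corresponding setting is dfs.datanode.use.datanode.hostname (default false). A minimal hdfs-site.xml sketch:

<property>
  <name>dfs.datanode.use.datanode.hostname</name>
  <value>true</value>
  <!-- Datanodes resolve each other's hostnames via DNS for inter-Datanode
       connections instead of using Namenode-supplied IP addresses. -->
</property>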
Usually only one network is used for external clients to connect to HDFS servers. However, if the client can “see” the HDFS servers on more than one interface, you can control which interface or interfaces are used for HDFS communication by setting this parameter:
<property>
  <name>dfs.client.local.interfaces</name>
  <description>
    A comma separated list of network interface names to use for data transfer
    between the client and datanodes. When creating a connection to read from
    or write to a datanode, the client chooses one of the specified interfaces
    at random and binds its socket to the IP of that interface. Individual
    names may be specified as either an interface name (eg "eth0"), a
    subinterface name (eg "eth0:0"), or an IP address (which may be specified
    using CIDR notation to match a range of IPs).
  </description>
</property>
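As an illustration only (the interface names and CIDR range below are hypothetical), a client that should use two specific interfaces plus any address in a given subnet might set:

<property>
  <name>dfs.client.local.interfaces</name>
  <value>eth2,eth3:0,192.168.10.0/24</value>
</property>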
The YARN and Job History “*.address” parameters specify the hostname to be used for each service and sub-service. This information is provided to clients via shared config files.
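For illustration, a minimal sketch of the Hostname Properties (the hostnames are hypothetical; yarn.resourcemanager.hostname goes in yarn-site.xml, from which the individual ResourceManager “*.address” values are derived by default, and mapreduce.jobhistory.address goes in mapred-site.xml):

<property>
  <name>yarn.resourcemanager.hostname</name>
  <value>rm1.example.com</value>
  <!-- The ResourceManager *.address parameters default to this host plus
       their standard ports. -->
</property>
<property>
  <name>mapreduce.jobhistory.address</name>
  <value>jhs1.example.com:10020</value>
</property>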
The optional YARN and Job History “*.bind-host” parameters control the bind address for sets of related sub-services, as shown below. Setting them to 0.0.0.0 causes the corresponding services to listen on all available network interfaces.
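Per the Apache Hadoop documentation, the relevant parameters are yarn.resourcemanager.bind-host, yarn.nodemanager.bind-host, yarn.timeline-service.bind-host, and mapreduce.jobhistory.bind-host. A hedged sketch:

<property>
  <name>yarn.resourcemanager.bind-host</name>
  <value>0.0.0.0</value>
  <!-- All ResourceManager sub-services listen on every interface. -->
</property>
<property>
  <name>mapreduce.jobhistory.bind-host</name>
  <value>0.0.0.0</value>
  <!-- Job History server services listen on every interface. -->
</property>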
YARN and Job History services resolve their Kerberos _HOST bindings by using the hadoop.security.dns.interface and hadoop.security.dns.nameserver parameters, rather than their various *-address parameters. See below under “Security Parameters” for documentation.
Each secure host needs to be able to perform _HOST resolution within its own parameter values, to correctly manage Kerberos principals. For this it must determine its own name. By default it will use the canonical name, which is associated with its first network interface. If it is desired to use the hostname associated with a different interface, use these two parameters in core-site.xml.
Core Hadoop services perform a self-name-lookup using these parameters. Many other Hadoop-related services define their own means of determining the Kerberos host substitution name.
These two parameters are relatively new: they are available in Apache Hadoop 2.7.2 and later, and have been back-ported to HDP-2.2.9 and later and HDP-2.3.4 and later. In previous versions of HDFS, the dfs.datanode.dns.{interface,nameserver} parameter pair was used.
<property>
  <name>hadoop.security.dns.interface</name>
  <description>
    The name of the Network Interface from which the service should determine
    its host name for Kerberos login, e.g. "eth2". In a multi-homed
    environment, the setting can be used to affect the _HOST substitution in
    the service Kerberos principal. If this configuration value is not set,
    the service will use its default hostname as returned by
    InetAddress.getLocalHost().getCanonicalHostName().
    Most clusters will not require this setting.
  </description>
</property>
<property>
  <name>hadoop.security.dns.nameserver</name>
  <description>
    The hostname or IP address of the DNS name server which a service Node
    should use to determine its own hostname for Kerberos Login, via a
    reverse-DNS query on its own IP address associated with the interface
    specified in hadoop.security.dns.interface. Requires
    hadoop.security.dns.interface. Most clusters will not require this setting.
  </description>
</property>
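As an illustration only (the interface name and nameserver address are hypothetical), a host whose Kerberos principal hostname should be derived from a non-default interface might set, in core-site.xml:

<property>
  <name>hadoop.security.dns.interface</name>
  <value>eth1</value>
</property>
<property>
  <name>hadoop.security.dns.nameserver</name>
  <value>10.1.0.10</value>
</property>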
In secure multi-homed environments, the following parameter will need to be set to false (it is true by default) on both cluster servers and clients (see HADOOP-7733), in core-site.xml. If it is not set correctly, the symptom will be inability to submit an application to YARN from an external client (with error “client host not a member of the Hadoop cluster”), or even from an in-cluster client if server failover occurs.
<property>
  <name>hadoop.security.token.service.use_ip</name>
  <description>
    Controls whether tokens always use IP addresses. DNS changes will not be
    detected if this option is enabled. Existing client connections that break
    will always reconnect to the IP of the original host. New clients will
    connect to the host's new IP but fail to locate a token. Disabling this
    option will allow existing and new clients to detect an IP change and
    continue to locate the new host's token. Defaults to true, for efficiency
    in single-homed systems. If setting to false, be sure to change in both
    clients and servers. Most clusters will not require this setting.
  </description>
</property>
To prevent a security loophole described in http://ietf.org/rfc/rfc1535.txt, the Hadoop Kerberos libraries in secure environments with hadoop.security.token.service.use_ip set to false will sometimes do a DNS lookup on a special form of the hostname FQDN that has a “.” appended. For instance, instead of using the FQDN “host1.lab.mycompany.com” it will use the form “host1.lab.mycompany.com.” (note the trailing dot). This is called the “absolute rooted” form of the hostname, sometimes written FQDN+".". For an explanation of the need for this form, see the above-referenced RFC 1535.
If DNS services are set up as required for multi-homed Hadoop clusters, no special additional setup is needed for this concern, because all modern DNS servers already understand the absolute rooted form of hostnames. However, /etc/hosts processing of name lookup does not. If DNS is not available and /etc/hosts is being used in its place, then the following additional setup is required:
In the /etc/hosts file of all cluster servers and client hosts, for the entries related to all hostnames of cluster servers, add an additional alias consisting of the FQDN+".". For example, a line in /etc/hosts that would normally look like
39.0.64.1 selena1.labs.mycompany.com selena1 hadoopvm8-1
should be extended to look like
39.0.64.1 selena1.labs.mycompany.com selena1.labs.mycompany.com. selena1 hadoopvm8-1
If the absolute rooted hostname is not included in /etc/hosts when required, the symptom will be “Unknown Host” error (see https://wiki.apache.org/hadoop/UnknownHost#UsingHostsFiles).
The documentation of multi-homing related parameters in the HBase docs is largely lacking, and in some cases incorrect. The following is based on the most up-to-date information available as of January 2016.
Credit: Josh Elser worked out the correct information for this document, by studying the source code. Thanks, Josh!
Both HMaster and RegionServers advertise their hostname with the cluster’s Zookeeper (ZK) nodes. These ZK nodes are found using the hbase.zookeeper.quorum parameter value. The ZK server names found here are looked up via DNS on the default interface, by all HBase ZK clients. (Note the DNS server on the default interface may return an address for the ZK node that is on some other network interface.)
The HMaster determines its IP address on the interface specified in the hbase.master.dns.interface parameter, then does a reverse-DNS query on the DNS server specified by hbase.master.dns.nameserver. However, it does not use the resulting hostname directly. Instead, it does a forward-DNS lookup of this name on the default (or bind address) network interface, followed by a reverse-DNS lookup of the resulting IP address on the same interface’s DNS server. The resulting hostname is what the HMaster actually advertises with Zookeeper.
Each RegionServer obtains the HMaster’s hostname from Zookeeper, and looks it up with DNS on the default network. It then registers itself with the HMaster. When it does so, the HMaster captures the IP address of the RegionServer, and does a reverse-DNS lookup on the default (or HMaster bind address) interface to obtain a hostname for the RegionServer. It then delivers this hostname back to the RegionServer, which in turn advertises it on Zookeeper.
Regardless of the above determination of hostname, the Bind Address Property may change the IP address binding of each service, to a non-default interface, or to all interfaces (0.0.0.0). The parameters for Bind Address Property are as follows, for each service:
Service | Bind Address Parameter
HMaster RPC | hbase.master.ipc.address
HMaster WebUI | hbase.master.info.bindAddress
RegionServer RPC | hbase.regionserver.ipc.address
RegionServer WebUI | hbase.regionserver.info.bindAddress
REST RPC | hbase.rest.host
REST WebUI | hbase.rest.info.bindAddress
Thrift1 RPC | hbase.regionserver.thrift.ipaddress
Thrift2 RPC | hbase.thrift.info.bindAddress
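For illustration, a minimal hbase-site.xml sketch that makes the HMaster and RegionServer RPC services listen on all interfaces, using the Bind Address parameters from the table above:

<property>
  <name>hbase.master.ipc.address</name>
  <value>0.0.0.0</value>
</property>
<property>
  <name>hbase.regionserver.ipc.address</name>
  <value>0.0.0.0</value>
</property>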
Notes:
Finally, services and clients use a dns.{interface,nameserver} parameter pair to determine their Kerberos Host substitution names, as follows. See hadoop.security.dns.interface and hadoop.security.dns.nameserver in the “Security Parameters” section for documentation of the semantics of these parameter pairs.
Service | Kerberos DNS Parameter Pair
HMaster RPC | hbase.master.dns.{interface,nameserver}
HMaster WebUI | N/A
RegionServer RPC | hbase.regionserver.dns.{interface,nameserver}
RegionServer WebUI | N/A
REST RPC | N/A
REST WebUI | N/A
Thrift1 RPC | hbase.thrift.dns.{interface,nameserver}
Thrift2 RPC | hbase.thrift.dns.{interface,nameserver}
HBase Client | hbase.client.dns.{interface,nameserver}
Lastly, there is a bit of complexity with ZK node names, if the HBase-embedded ZK quorum is used. When HBase is instantiating an embedded set of ZK nodes, it is necessary to correlate the node hostnames with the integer indices in the “zoo.cfg” configuration. As part of this, ZK has a hbase.zookeeper.dns.{interface,nameserver} parameter pair. Each ZK node uses these in the usual way to determine its own hostname, which is then fed into the embedded ZK configuration process.
Superseded by hadoop.security.dns.interface and hadoop.security.dns.nameserver, in Apache Hadoop 2.7.2 and later (back-ported to HDP-2.2.9 and later, and HDP-2.3.4 and later):
Superseded by Namenode HA:
Superseded hdfs/yarn/mapreduce parameters used by older systems:
Older historical items that have been superseded:
Note
Regarding the blog at: http://hortonworks.com/blog/multihoming-on-hadoop-yarn-clusters/
In the “Taking a Test Drive” section of this blog it suggests using EC2 with Public and Private networks. We now know this doesn’t work, since by default in AWS, Public addresses are just NAT’ed to Private addresses. Instead, one must use the VPC capability with two Private networks. The blog should be updated accordingly.
[1] Unless you really, really know what you’re doing, and want to work through every little nuance of the semantics of all the parameters. About the only situation in which you might get away with using IP addresses for server addressing is if ALL services are being bound to specific single interfaces, not using 0.0.0.0 at all.
[2] Having one consistent hostname on all interfaces is not a strict technical requirement of Hadoop. But it is a major simplifying assumption, without which the Hadoop administrator is likely to land in a morass of very complex and mutually conflicting parameter choices. For this reason, Hortonworks only supports using one consistent hostname per server on all interfaces used by Hadoop.
[3] In non-virtual Linux systems, the default network interface is typically “eth0” or “eth0:0”, but this conventional naming can be changed in system network configuration or with network virtualization.
[4] It is not clear why resources would be committed to a secondary NN without setting up HA; however, this is still a supported configuration.
Created on 08-08-2017 07:40 PM
This is a great article.
I had trouble with the Ambari server being outside the private network (where all the master and data nodes are). There are two approaches to fix this: either bring the Ambari server into the private network (recommended), or make the network on which the agents are listening the primary one. Basically, the hostname that the Ambari server resolves to should be the one the agents are communicating with.
Created on 08-27-2018 08:30 PM
The article doesn't indicate this, so for reference: the listed HDFS settings do not exist by default. These settings, as shown below, need to go into hdfs-site.xml, which is done in Ambari by adding fields under "Custom hdfs-site".
dfs.namenode.rpc-bind-host=0.0.0.0
dfs.namenode.servicerpc-bind-host=0.0.0.0
dfs.namenode.http-bind-host=0.0.0.0
dfs.namenode.https-bind-host=0.0.0.0
Additionally, I found that after making this change, both NameNodes under HA came up as stand-by; the article at https://community.hortonworks.com/articles/2307/adding-a-service-rpc-port-to-an-existing-ha-cluste.h... got me the missing step of running a ZK format.
I have not tested the steps below against a Production cluster and if you foolishly choose to follow these steps, you do so at a very large degree of risk (you could lose all of the data in your cluster). That said, this worked for me in a non-Prod environment:
01) Note the Active NameNode.
02) In Ambari, stop ALL services except for ZooKeeper.
03) In Ambari, make the indicated changes to HDFS.
04) Get to the command line on the Active NameNode (see Step 1 above).
05) At the command line you opened in Step 4, run: `sudo -u hdfs hdfs zkfc -formatZK`
06) Start the JournalNodes.
07) Start the ZKFCs.
08) Start the NameNodes, which should come up as Active and Standby. If they don't, you're on your own (see the "high risk" caveat above).
09) Start the DataNodes.
10) Restart / Refresh any remaining HDFS components which have stale configs.
11) Start the remaining cluster services.
It would be great if HWX could vet my procedure and update the article accordingly (hint, hint).