Member since
10-22-2015
83
Posts
84
Kudos Received
13
Solutions
My Accepted Solutions
Title | Views | Posted |
---|---|---|
1441 | 01-31-2018 07:47 PM | |
3593 | 02-27-2017 07:26 PM | |
2859 | 12-16-2016 07:56 PM | |
9444 | 12-14-2016 07:26 PM | |
4254 | 12-13-2016 06:39 PM |
04-05-2016
06:46 PM
@Sunile Manjee Can you please clarify what you mean by "full restore" in this case? A truly full restore of HDFS would restore to a cross-cluster consistent Point-in-Time (PiT) snapshot for all datanodes AND namenode. Therefore the namenode's idea of existing blocks would match the blocks existing on the datanodes after the restore. If the PiT snapshot was not perfect (for instance, if some HDFS files were in the midst of being written during the process of taking the snapshot), then you might need to do some cleanup of the blocks for those affected files. But not via a reformat. In general, a reformat of HDFS erases everything in it, just as with reformat of any file system. So it seems unlikely this is what you want. If you really do want a reformat, then there is no need to do a restore first; it is all going away anyway. How was your snapshot taken, and how are you doing the full restore?
... View more
03-30-2016
04:48 PM
Hi @Sotiris Madamas , if Mats's answer met your needs, please mark it accepted.
... View more
03-23-2016
06:33 PM
9 Kudos
Introduction Motivation
In many production clusters, the servers (or VMs) have multiple physical Network Interface Cards (NICs) and/or multiple Virtual Network Interfaces (VIFs) and are configured to participate in multiple IP address spaces (“networks”). This may be done for a variety of reasons, including but not limited to the following:
to improve availability or bandwidth through redundancy
to partition traffic for security, bandwidth, or other management reasons
etc.
In many of these “multi-homed” clusters it is desirable to configure the various Hadoop-related services to “listen” on all the network interfaces rather than on just one. Clients, on the other hand, must be given a specific target to which they should initiate connections.
Note that NIC Bonding (also known as NIC Teaming or Link Aggregation), where two or more physical NICs are made to look like a single network interface, is not included in this topic. The settings discussed below are usually not applicable to a NIC bonding configuration which handles multiplexing and failover transparently while presenting only one “logical network” to applications. However, there is a hybrid case where multiple physical NICs are bonded for redundancy and bandwidth, then multiple VIFs are layered on top of them for partitioning of data flows. The multi-homing parameters then relate to the VIFs. This Document
The purpose of this document is to gather in one place all the many parameters for all the different Hadoop-related components, related to configuring multi-homing support for services and clients. In general, to support multi-homing on the services you care about, you must set these parameters in those components and any components upon which they depend. Almost all components depend on Hadoop Core, HDFS and Yarn, so these are given first, along with Security related parameters.
This document assumes you have HDP version 2.3 or later. Most but not all of the features are available in 2.1 and 2.2 also. Vocabulary
Server and service are carefully distinguished in this document. Servers are host machines or VMs. Services are Hadoop component processes running on servers.
Client: a process that initiates communication with a service. Client processes may be on a host outside of the Hadoop cluster, or on a server that is part of the Hadoop cluster. Sometimes Hadoop services are clients of other Hadoop services.
DNS and rDNS lookup: “Distributed Name Service” (alt. “Domain Name Service”) - The service that converts hostnames to IP addresses for clients wishing to communicate with said hostname; this is often referred to as “DNS lookup”. If properly configured, the service can also provide “reverse-DNS” (rDNS) lookup, which converts IP addresses back into hostnames. It is common for different networks to have different DNS Servers, and the same DNS or rDNS query to different name servers may get different responses. This is used for Hadoop Multi-Homing to provide a level of indirection associating different IP addresses assigned to a multi-homed server for clients on different networks.
Round-trip-able DNS: Within a single network’s DNS service, if one starts with a valid IP address and performs an rDNS lookup to get a hostname, then DNS lookup on that hostname should return the original IP address; or alternatively, if one starts with a valid hostname and performs a DNS lookup on it to get an IP address, then rDNS lookup on that IP address should return the original hostname.
FQDN: “Fully Qualified Domain Name” - the long form of hostnames, of the form “shortname.domainname”, eg, “host234.lab1.hortonworks.com”. This contrasts with the “short form” hostname, which in this example would be just “host234”. Use Cases
To understand the semantics of configuration parameters for multi-homing, it is useful to consider a few practical use cases for why it is desirable to have multi-homed servers and how Hadoop can utilize them. Here are four use cases that cover most of the concepts, although this is not an exhaustive list:
intra-cluster network separate from external client-server network
dedicated high-bandwidth network separate from general-use network
dedicated high-security or Hadoop administrator network separate from medium-security general-use network
sysop/non-Hadoop network separate from Hadoop network(s)
The independent variables in each use case are:
which network interfaces are available on servers and client hosts, within the cluster and external to the cluster; and
which interfaces are preferred for service/service and client/service communications for each service.
<Use cases to be expanded in a later version of this document.> Theory of Operations
In each use case, the site architects will determine which network interfaces are available on servers and client hosts, and which interfaces are preferred for service/service and client/service communications of various types. We must then provide configuration information that specifies for the cluster:
How clients will find the server hostname of each service
Whether each service will bind to one or all network interfaces on its server
How services will self-identify their server to Kerberos
Over time, the Hadoop community evolved an understanding of the configuration practices that make these multi-homing use cases manageable:
Configuration files should be shared among clients and services, as much as possible. This is highly desirable to minimize management overhead, and the desire to share common config files has guided a lot of the design choices for multi-homing support in Hadoop.
Names are important. In multi-homed cluster configuration parameters, servers shouldalways be identified or “addressed” by hostname, not IP address. [1] This enables the next several bullet points.
FQDNs are strongly preferred. It is recommended that all hostname strings be represented as Fully Qualified Domain Names, unless short hostnames are used throughout with rigorous consistency, for both clients and servers.
Each network should have DNS service available, with forward and reverse name lookup implemented “round-trip-able” for all Hadoop servers and client hosts expected to use that network. It is common and allowable for the same hostname to resolve to various of the host’s multiple IP addresses, depending on which DNS service (on which network) is used.
DNS services are a level of indirection for deriving both IP address and hostname info from the parameters specified in shared configuration files or from advertised values, to resolve to different networks for different querying processes. This allows, for example, external clients to find and talk to Datanodes on an external (client/server) network, while Datanodes find and talk to each other on an internal (intra-cluster) network.
Each server should have one consistent hostname on all interfaces. Theoretically, DNS allows a server to have a different hostname per network interface. But it is required [2] for multi-homed Hadoop clusters that each server have only one hostname, consistent among all network interfaces used by Hadoop. (Network interfaces excluded from use by Hadoop may allow other hostnames.)
If /etc/hosts files are used in lieu of DNS, then the above constraint to have the same hostname on all interfaces means that each /etc/hosts file can only specify one interface per server. Thus, clients at least are likely to have different /etc/hosts files than servers, and the allowable choices for use of the different networks may be constrained. If these constraints are not acceptable, provide DNS services.
Prior to 2014 there were a variety of parameters proposed for specific issues related to multi-homing. These were incremental band-aids with inconsistent naming and incomplete semantics. In mid-2014 (for hadoop-2.6.0) the Hadoop Core community resolved to address Multi-Homing completely, with consistent semantics and usages. Other components also evolved multi-homing support, with a variety of parameter name and semantics choices.
In general, for every service one must consider three sets of properties, related to the three key questions at the beginning of this section:
1. “Hostname Properties” relate to either specifying or deriving the server hostname by which clients will find each service, and the appropriate network for accessing it.
In some services (eg, HDFS Namenode and YARN ResourceManager) there may just be a parameter in a shared config file that hard-wires a specified server name. This approach is only practical for singleton services like masters, otherwise the config files would have to be edited differently on every node, instead of shared.
Many services (eg, Datanode, HBase services) use generic parameters or other means to perform a self-name-lookup via reverse-DNS, followed by registration (with Master or ZK) where the hostname gets advertised to clients.
Sometimes the hostname is neither specified nor advertised. Clients just need to know somehow, external to the config system, "based on human interaction".
Each client does a DNS lookup on the server hostname to get the IP address (and the associated network) on which to communicate with that service. So this client aspect of the network architecture decisions are expressed in the DNS server configuration rather than the Hadoop configuration info.
2. “Bind Address Properties” provides optional additional addressing information which is used only by the service itself, and only in multi-homed servers. By default, services will bind only to the default network interface
[3] on each server. The Bind Address Properties can cause the service to bind to a non-default network interface if desired, or most commonly the bind address can be set to “0.0.0.0”, causing the service to bind toall available network interfaces on its server.
The bind address usually takes the form of an IP address rather than a network interface name, to be type-compatible with the conventional “0.0.0.0” notation. However, some services allow specifying a network interface name.
The default interface choice, if bind address is unspecified, varies for different services. Choices include:
the default network interface
all interfaces
some other specific interface defined by other parameters.
3. “Kerberos Host Properties” provides the information needed to resolve the“_HOST” substitution in each service’s own host-dependent Kerberos principal name. This is separate from the “Hostname Property” because not all services advertise their hostname to clients, and/or may need to use a different means to determine hostname for security purposes.
Usually the Kerberos Host substitution name is derived from another self-name-lookup, similar to hostname from #1, but possibly on a different DNS service.
Sometimes the hostname info from #1 is used for Kerberos also.
Absent other information, the Kerberos Host will be assigned the default hostname from the default network interface, obtained from the Java call, InetAddress.getLocalHost().getCanonicalHostName()
Different services evolved different parameters, or combinations of parameters, and parameter naming conventions, but all services need to somehow present the above three property sets.
The “listener” port number for each service must also be managed, but this property does not change in single- vs multi-home clusters. So we will not discuss it further here, except to mention that in HDFS and YARN it is conjoined with the “address” parameters (Hostname Property), while in other services such as HBase, it is in separate “port” parameters. Parameters HDFS Service Parameters
For historical reasons, in HDFS and YARN most of the parameters related to Hostname Properties end in “-address” or “.address”. Indeed, in non-multi-homed clusters it is acceptable and was once common to simply use IP addresses rather than hostname strings. In multi-homed clusters, however, all server addressing must be hostname-based. In HDFS and YARN most of the parameters related to Bind Address Properties end in “-bind-host” or “.bind-host”.
The four Namenode services each have a
dfs.namenode.${service}-address parameter, where ${service} is rpc, servicerpc, http, or https. The Namenode server’s hostname should be assigned here, to establish the Hostname Property for each service. This information is provided to clients via shared config files.
For each, there is an optional corresponding
dfs.namenode.${service}-bind-host parameter to establish the Bind Address Properties. It may be set to the server’s IP address on a particular interface, or to “0.0.0.0” to cause the Namenode services to listen on all available network interfaces. If unspecified, the default interface will be used. These parameters are documented at: https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-hdfs/HdfsMultihoming.html
dfs.namenode.rpc-bind-host -- controls binding for dfs.namenode.rpc-address
dfs.namenode.servicerpc-bind-host -- controls binding for dfs.namenode.servicerpc-address
dfs.namenode.http-bind-host -- controls binding for dfs.namenode.http-address
dfs.namenode.https-bind-host -- controls binding for dfs.namenode.https-address
The Datanodes also present various services; however, their Hostname Properties are established via self-name-lookup as follows:
If present, the hadoop.security.dns.interface and hadoop.security.dns.nameserver parameters are used in the same way as documented in the Security Parameters section below, to determine the Datanode’s hostname. (If the rDNS lookup fails, it will fall back to /etc/hosts.) These two parameters are relatively new, available in Apache Hadoop 2.7.2 and later, and back-ported to HDP-2.2.9 and later, and HDP-2.3.4 and later.
Otherwise if dfs.datanode.dns.interface and dfs.datanode.dns.nameserver parameters are present, they are used in the same way to determine the Datanode’s hostname. (If the rDNS lookup fails there, it willnot fallback to /etc/hosts.) This pair of parameters are (or will shortly be) deprecated, but many field installations currently use them.
If neither set of parameters is present, then the Datanode calls Java InetAddress.getLocalHost().getCanonicalHostName() to determine its hostname.
The Datanode registers its hostname with the Namenode, and the Namenode also captures the Datanode’s IP address from which it registers. (This will be an IP address on the network interface the Datanode determined it should use to communicate with the Namenode.) The Namenode then provides both hostname and IP address to clients, for particular Datanodes, whenever a client operation is needed. Which one the client uses is determined by the
dfs.client.use.datanode.hostname and dfs.datanode.use.datanode.hostname parameter values; see below.
Datanodes also have a set of
dfs.datanode.${service}-address parameters. However, confusingly, these relate only to the Bind Address Property, not the Hostname Property, for Datanodes. They default to “0.0.0.0”, and it is okay to leave them that way unless it is desired to bind to only a single network interface.
Both Namenodes and Datanodes resolve their Kerberos _HOST bindings by using the
hadoop.security.dns.interface and hadoop.security.dns.nameserver parameters, rather than their various *-address parameters. See below under “Security Parameters” for documentation. Clients probably need to use Hostnames when connecting to Datanodes
By defaultHDFS clients connect to Datanodes using an IP address provided by the Namenode. In a multi-homed network, depending on the network configuration, the Datanode IP address known to the Namenode may be unreachable by the clients. The fix is letting clients perform their own DNS resolution of the Datanode hostname. The following setting enables this behavior. The reason this is optional is to allow skipping DNS resolution for clients on networks where the feature is not required (and DNS might not be a reliable/low-latency service).
dfs.client.use.datanode.hostname should be set to true when this this behavior is desired. Datanodes may need to use Hostnames when connecting to other Datanodes:
More rarely, the Namenode-resolved IP address for a Datanode may be unreachable from other Datanodes. The fix is to force Datanodes to perform their own DNS resolution for inter-Datanode connections. The following setting enables this behavior.
dfs.datanode.use.datanode.hostname should be set to true when this this behavior is desired. Client interface specification
Usually only one network is used for external clients to connect to HDFS servers. However, if the client can “see” the HDFS servers on more than one interface, you can control which interface or interfaces are used for HDFS communication by setting this parameter:
<property>
<name>dfs.client.local.interfaces</name>
<description> A comma separated list of network interface names to use for data transfer
between the client and datanodes. When creating a connection to read from or
write to a datanode, the client chooses one of the specified interfaces at random
and binds its socket to the IP of that interface. Individual names may be
specified as either an interface name (eg "eth0"), a subinterface name (eg
"eth0:0"), or an IP address (which may be specified using CIDR notation to match
a range of IPs).
</description>
</property>
Yarn & MapReduce Service Parameters
The yarn and job history “*.address” parameters specify hostname to be used for each service and sub-service. This information is provided to clients via shared config files.
The optional yarn and job history “*.bind-host” parameters control bind address for sets of related sub-services as follows. Setting them to 0.0.0.0 causes the corresponding services to listen on all available network interfaces.
yarn.resourcemanager.bind-host -- controls binding for:
yarn.resourcemanager.address
yarn.resourcemanager.webapp.address
yarn.resourcemanager.webapp.https.address
yarn.resourcemanager.resource-tracker.address
yarn.resourcemanager.scheduler.address
yarn.resourcemanager.admin.address
yarn.nodemanager.bind-host-- controls binding for:
yarn.nodemanager.address
yarn.nodemanager.webapp.address
yarn.nodemanager.webapp.https.address
yarn.nodemanager.localizer.address
yarn.timeline-service.bind-host -- controls binding for:
yarn.timeline-service.address
yarn.timeline-service.webapp.address
yarn.timeline-service.webapp.https.address
mapreduce.jobhistory.bind-host -- controls binding for:
mapreduce.jobhistory.address
mapreduce.jobhistory.admin.address
mapreduce.jobhistory.webapp.address
mapreduce.jobhistory.webapp.https.address
Yarn and job history services resolve their Kerberos _HOST bindings by using the
hadoop.security.dns.interface and hadoop.security.dns.nameserver parameters, rather than their various *-address parameters. See below under “Security Parameters” for documentation. Security Parameters Parameters for _HOST resolution
Each secure host needs to be able to perform _HOST resolution within its own parameter values, to correctly manage Kerberos principals. For this it must determine its own name. By default it will use the canonical name, which is associated with its first network interface. If it is desired to use the hostname associated with a different interface, use these two parameters in core-site.xml.
Core Hadoop services perform a self-name-lookup using these parameters. Many other Hadoop-related services define their own means of determining the Kerberos host substitution name.
These two parameters are relatively new, available in Apache Hadoop 2.7.2 and later, and back-ported to HDP-2.2.9 and later, and HDP-2.3.4 and later. In previous versions of HDFS, the
dfs.datanode.dns.{interface,nameserver} parameter pair were used.
<property>
<name>hadoop.security.dns.interface</name>
<description>
The name of the Network Interface from which the service should determine
its host name for Kerberos login. e.g. “eth2”. In a multi-homed environment,
the setting can be used to affect the _HOST substitution in the service
Kerberos principal. If this configuration value is not set, the service
will use its default hostname as returned by
InetAddress.getLocalHost().getCanonicalHostName().
Most clusters will not require this setting.
</description>
</property>
<property>
<name>hadoop.security.dns.nameserver</name>
<description>
The hostname or IP address of the DNS name server which a service Node
should use to determine its own hostname for Kerberos Login, via a
reverse-DNS query on its own IP address associated with the interface
specified in hadoop.security.dns.interface.
Requires hadoop.security.dns.interface.
Most clusters will not require this setting.
</description>
</property>
Parameters for Security Token service host resolution
In secure multi-homed environments, the following parameter will need to be set to false (it is true by default) on both cluster servers and clients (see
HADOOP-7733), in core-site.xml. If it is not set correctly, the symptom will be inability to submit an application to YARN from an external client (with error “client host not a member of the Hadoop cluster”), or even from an in-cluster client if server failover occurs.
<property>
<name>hadoop.security.token.service.use_ip</name>
<description>
Controls whether tokens always use IP addresses. DNS changes will not
be detected if this option is enabled. Existing client connections that
break will always reconnect to the IP of the original host. New clients
will connect to the host's new IP but fail to locate a token.
Disabling this option will allow existing and new clients to detect
an IP change and continue to locate the new host's token.
Defaults to true, for efficiency in single-homed systems.
If setting to false, be sure to change in both clients and servers.
Most clusters will not require this setting.
</description>
</property>
Secure Hostname Lookup when /etc/hosts is used instead of DNS service
To prevent a security loophole described in
http://ietf.org/rfc/rfc1535.txt, the Hadoop Kerberos libraries in Secure environments with hadoop.security.token.service.use_ip set to false will sometimes do a DNS lookup on a special form of the hostname FQDN that has a “.” appended. For instance, instead of using the FQDN “host1.lab.mycompany.com” it will use the form “host1.lab.mycompany.com.”. [sic] This is called the “absolute rooted” form of the hostname, sometimes written FQDN+".". For an explanation of the need for this form, see the above referenced RFC 1535.
If DNS services are set up as required for multi-homed Hadoop clusters, no special additional setup is needed for this concern, because all modern DNS servers already understand the absolute rooted form of hostnames. However, /etc/hosts processing of name lookup does not. If DNS is not available and /etc/hosts is being used in lieu, then the following additional setup is required:
In the /etc/hosts file of all cluster servers and client hosts, for the entries related to all hostnames of cluster servers, add an additional alias consisting of the FQDN+".". For example, a line in /etc/hosts that would normally look like
39.0.64.1 selena1.labs.mycompany.com selena1 hadoopvm8-1
should be extended to look like
39.0.64.1 selena1.labs.mycompany.com selena1.labs.mycompany.com. selena1 hadoopvm8-1
If the absolute rooted hostname is not included in /etc/hosts when required, the symptom will be “Unknown Host” error (see
https://wiki.apache.org/hadoop/UnknownHost#UsingHostsFiles). HBase
The documentation of multi-homing related parameters in the HBase docs is largely lacking, and in some cases incorrect. The following is based on the most up-to-date information available as of January 2016.
Credit: Josh Elser worked out the correct information for this document, by studying the source code. Thanks, Josh! Service Parameters
Both HMaster and RegionServers advertise their hostname with the cluster’s Zookeeper (ZK) nodes. These ZK nodes are found using the
hbase.zookeeper.quorum parameter value. The ZK server names found here are looked up via DNS on the default interface, by all HBase ZK clients. (Note the DNS server on the default interface may return an address for the ZK node that is on some other network interface.)
The HMaster determines its IP address on the interface specified in the
hbase.master.dns.interface parameter, then does a reverse-DNS query on the DNS server at hbase.master.dns.interface. However, it does not use the resulting hostname directly. Instead, it does a forward-DNS lookup of this name on the default (or bind address) network interface, followed by a reverse-DNS lookup of the resulting IP address on the same interface’s DNS server. The resulting hostname is what the HMaster actually advertises with Zookeeper.
This sequence may be an attempt to correctly handle environments that don’t adhere to the “one consistent hostname on all interfaces” rule.
Regardless, if the “one consistent hostname on all interfaces” rule has been followed, it should be easy to predict the resulting Hostname Property value.
Each RegionServer obtains the HMaster’s hostname from Zookeeper, and looks it up with DNS on the default network. It then registers itself with the HMaster. When it does so, the HMaster captures the IP address of the RegionServer, and does a reverse-DNS lookup on the default (or HMaster bind address) interface to obtain a hostname for the RegionServer. It then delivers this hostname back to the RegionServer, which in turn advertises it on Zookeeper.
Thus, no config parameters are relevant to the RegionServer’s Hostname Property.
Again, it is important that each RegionServer’s hostname be the same on all interfaces if different clients are to use different interfaces, because the HMaster looks it up on only one.
Regardless of the above determination of hostname, the Bind Address Property may change the IP address binding of each service, to a non-default interface, or to all interfaces (0.0.0.0). The parameters for Bind Address Property are as follows, for each service:
HMaster RPC
hbase.master.ipc.address
HMaster WebUI
hbase.master.info.bindAddress
RegionServer RPC
hbase.regionserver.ipc.address
RegionServer WebUI
hbase.regionserver.info.bindAddress
REST RPC
hbase.rest.host
REST WebUI
hbase.rest.info.bindAddress
Thrift1 RPC
hbase.regionserver.thrift.ipaddress
Thrift2 RPC
hbase.thrift.info.bindAddress
Notes:
The WebUI services do not advertise their hostnames separately from the RPC services.
The REST and Thrift services do not advertise their hostnames at all; their clients must simply know “from human interaction” where to find the corresponding servers.
Finally, services and clients use a
dns.{interface,nameserver} parameter pair to determine their Kerberos Host substitution names, as follows. See hadoop.security.dns.interface and hadoop.security.dns.nameserver in the “Security Parameters” section for documentation of the semantics of these parameter pairs.
HMaster RPC
hbase.master.dns.{interface,nameserver}
HMaster WebUI
N/A
RegionServer RPC
hbase.regionserver.dns.{interface,nameserver}
RegionServer WebUI
N/A
REST RPC
N/A
REST WebUI
N/A
Thrift1 RPC
hbase.thrift.dns.{interface,nameserver}
Thrift2 RPC
hbase.thrift.dns.{interface,nameserver}
HBase Client
hbase.client.dns.{interface,nameserver}
Lastly, there is a bit of complexity with ZK node names, if the HBase-embedded ZK quorum is used. When HBase is instantiating an embedded set of ZK nodes, it is necessary to correlate the node hostnames with the integer indices in the “zoo.cfg” configuration. As part of this, ZK has a
hbase.zookeeper.dns.{interface,nameserver} parameter pair. Each ZK node uses these in the usual way to determine its own hostname, which is then fed into the embedded ZK configuration process. Superseded Parameters
Superseded by
hadoop.security.dns.interface and hadoop.security.dns.nameserver, in Apache Hadoop 2.7.2 and later (back-ported to HDP-2.2.9 and later, and HDP-2.3.4 and later):
dfs.datanode.dns.interface
dfs.datanode.dns.nameserver
Superceded by Namenode HA:
dfs.namenode.secondary.http-address and https-address
only used in non-HA systems that have two NN masters yet have not configured HA [4]
Superceded hdfs/yarn/mapreduce parameters used by older systems:
yarn.nodemanager.webapp.bind-host: "0.0.0.0",
yarn.nodemanager.webapp.https.bind-host: "0.0.0.0",
just use yarn.nodemanager.bind-host: "0.0.0.0"
yarn.resourcemanager.admin.bind-host: "0.0.0.0",
yarn.resourcemanager.resource-tracker.bind-host: "0.0.0.0",
yarn.resourcemanager.scheduler.bind-host: "0.0.0.0",
just use yarn.resourcemanager.bind-host: "0.0.0.0"
yarn.timeline-service.webapp.bind-host: "0.0.0.0",
yarn.timeline-service.webapp.https.bind-host: "0.0.0.0",
just use yarn.timeline-service.bind-host: "0.0.0.0"
mapreduce.jobhistory.admin.bind-host: "0.0.0.0",
mapreduce.jobhistory.webapp.bind-host: "0.0.0.0",
mapreduce.jobhistory.webapp.https.bind-host: "0.0.0.0",
just use mapreduce.jobhistory.bind-host: "0.0.0.0"
Older historical items superceded:
dfs.namenode.master.name
predecessor of namenode bind-host parameters, and/or rule of “one consistent hostname on all interfaces”.
dfs.datanode.hostname
predecessor of “one consistent hostname on all interfaces” rule.
Note
Regarding the blog at:
http://hortonworks.com/blog/multihoming-on-hadoop-yarn-clusters/
In “Taking a Test Drive” section of this blog it suggests using EC2 with Public and Private networks. We now know this doesn’t work, since by default in AWS, Public addresses are just NAT’ed to Private. Instead, one must use VPC capability with 2 Private networks. Should fix this document.
[1] Unless you really, really know what you’re doing, and want to work through every little nuance of the semantics of all the parameters. About the only situation in which you might get away with using IP addresses for server addressing is if ALL services are being bound to specific single interfaces, not using 0.0.0.0 at all.
[2] Having one consistent hostname on all interfaces is not a strict technical requirement of Hadoop. But it is a major simplifying assumption, without which the Hadoop administrator is likely to land in a morass of very complex and mutually conflicting parameter choices. For this reason, Hortonworks only supports using one consistent hostname per server on all interfaces used by Hadoop.
[3] In non-virtual Linux systems, the default network interface is typically “eth0” or “eth0:0”, but this conventional naming can be changed in system network configuration or with network virtualization.
[4] Not sure why the resources would be committed for a secondary NN, but not set up HA. However, this is still a supported configuration.
... View more
03-03-2016
06:14 PM
2 Kudos
Are you using a version of Ambari less than 2.2.1? Please refer to http://docs.hortonworks.com/HDPDocuments/Ambari-2.2.1.0/bk_Installing_HDP_AMB/content/_determine_stack_compatibility.html , where it says only Ambari-2.2.1 supports HDP-2.4. Therefore, if you are using < 2.2.1, please upgrade Ambari before trying to upgrade the stack. The upgrade doc is at http://docs.hortonworks.com/HDPDocuments/Ambari/Ambari-2.2.1.0/index.html#bk_upgrading_Ambari
... View more
02-13-2016
06:06 AM
5 Kudos
Folks often ask about best practice for setting replication factor, evidently wondering if the default value of 3 is supported by factual data. The cool answer is, yes it is! Rob Chansler, an excellent engineering manager and contributor to Hadoop at Yahoo for several years, posted the best material in 2011. The hard-core math is in a spreadsheet attached to Apache Jira https://issues.apache.org/jira/browse/HDFS-2535, "A Model for Data Durability", where he uses reasonable assumptions and experience from Yahoo to calculate the probable rate of data loss events at a single site due to node failures when replication is set to 3, as 0.021 events per century. See "Attachments" : "LosingBlocks.xlsx" in the Jira ticket. Rob also published an article in Usenix titled Data Availability and Durability with the Hadoop Distributed File System, and did a related presentation at Hadoop Summit 2011, Data Integrity and Availability of HDFS (video). Sanjay Radia summarized this information in Hortonworks blog Data Integrity and Availability in Apache Hadoop HDFS, but that article did not focus on the replication factor specifically.
... View more
Labels:
02-12-2016
07:21 PM
1 Kudo
@Benjamin Leonhardi, please see this post: https://community.hortonworks.com/articles/16719/hdfs-data-durability-and-availability-with-replica.html. Please up-vote it if you like it 🙂
... View more
02-11-2016
06:54 PM
2 Kudos
(part 2 of 2) Increasing replication to 4 or more only makes
sense in DR situations, where you want, say, 2 copies at each of two sites, but
then you should be using Falcon or similar to maintain the DR copies, not HDFS
replication. On the other hand, replication = 2 definitely leaves your data
vulnerable to "it can't happen to me" scenarios -- it only takes the
wrong two disk drives to fail at the same time, and at some point they will.
So decreasing replication below 3 only makes sense if it is low value
data or you're using RAID or other highly reliable storage to protect your
data.
... View more
02-11-2016
06:53 PM
3 Kudos
(part 1 of 2) Just a quick comment on replication: It is completely separate from issues of block size and file size. It has been established (there's a white paper somewhere) that replication = 3 makes it astronomically unlikely that you'll ever actually lose data within a single site due to hardware failure, as long as HDFS has plenty of datanodes and drives provisioned, dead drives on datanodes get replaced within a day or two of dying, and your site integrity doesn't suffer a catastrophic site loss.
... View more
02-10-2016
08:50 PM
6 Kudos
We recently had a customer who was doing stress testing on a
very small cluster with just four Datanodes.
Their tests included simulated crashes and sudden power-offs. They were getting recurring client errors of hdfs.DFSClient:
Exception in createBlockOutputStream with various log messages including: Client: EOFException: Premature EOF: no length prefix
available IOException: Failed to replace a bad datanode on
the existing pipeline due to no more good datanodes being available to try Datanode: ReplicaAlreadyExistsException: Block
<number> already exists in state FINALIZED and thus cannot be created ReplicaNotFoundException: Cannot append to a
non-existent replica FsDatasetImpl
(FsDatasetImpl.java:recoverAppend(1053)) - Recover failed append to BP-<number> ReplicaNotFoundException: Cannot append to a
replica with unexpeted generation stamp Namenode: BlockPlacementPolicy - Failed to place enough
replicas, still in need of 1 to reach 3 The reason for these errors has to do with:
HDFS efforts to recover the replication pipeline
if a Datanode fails to complete a write operation, and the three configuration
parameters that control this pipeline recovery behavior. The fact that HDFS is stricter about replication
in case of an Append or Hflush, than during normal write-once file writing. Note that HBase requires frequent use of file
appends. PIPELINE RECOVERY As you know if you’ve read about the design of HDFS, when a
block is opened for writing, a pipeline is created of r Datanodes (where r is
the replication factor) to receive
the replicas of the block. The Client
sends the block data to the first Datanode, which sends it to the second, and
so on. A block is only considered
complete when all Datanodes in the pipeline have reported finishing writes of
their replicas. Replication through the pipeline of Datanodes happens in the
background, and in parallel with Client writes, so completion of the block
normally occurs almost immediately after the Client finishes writing the first
replica. However, if any of the Datanodes in the pipeline fail to
successfully finish their writes, the Client attempts to recover the replication
pipeline by finding a new available Datanode and replacing the failing
Datanode. In Hadoop-2.6 and later
(HDP-2.2 and later) this behavior is controlled by the following three
configuration parameters in hdfs-site.xml.
Please read how they work (from hdfs-default.xml): <property>
<name>dfs.client.block.write.replace-datanode-on-failure.enable</name>
<value>true</value>
<description>
If there is a datanode/network failure in the write pipeline,
DFSClient will try to remove the failed datanode from the pipeline
and then continue writing with the remaining datanodes. As a result,
the number of datanodes in the pipeline is decreased. The feature is
to add new datanodes to the pipeline.
This is a site-wide property to enable/disable the feature.
When the cluster size is extremely small, e.g. 3 nodes or less, cluster
administrators may want to set the policy to NEVER in the default
configuration file or disable this feature. Otherwise, users may
experience an unusually high rate of pipeline failures since it is
impossible to find new datanodes for replacement.
See also dfs.client.block.write.replace-datanode-on-failure.policy
</description>
</property>
<property>
<name>dfs.client.block.write.replace-datanode-on-failure.policy</name>
<value>DEFAULT</value>
<description>
This property is used only if the value of
dfs.client.block.write.replace-datanode-on-failure.enable is true.
ALWAYS: always add a new datanode when an existing datanode is removed.
NEVER: never add a new datanode.
DEFAULT:
Let r be the replication number.
Let n be the number of existing datanodes.
Add a new datanode only if r is greater than or equal to 3 and either
(1) floor(r/2) is greater than or equal to n; or
(2) r is greater than n and the block is hflushed/appended.
</description>
</property>
<property>
<name>dfs.client.block.write.replace-datanode-on-failure.best-effort</name>
<value>false</value>
<description>
This property is used only if the value of
dfs.client.block.write.replace-datanode-on-failure.enable is true.
Best effort means that the client will try to replace a failed datanode
in write pipeline (provided that the policy is satisfied), however, it
continues the write operation in case that the datanode replacement also
fails.
Suppose the datanode replacement fails.
false: An exception should be thrown so that the write will fail.
true : The write should be resumed with the remaining datandoes.
Note that setting this property to true allows writing to a pipeline
with a smaller number of datanodes. As a result, it increases the
probability of data loss.
</description>
</property>
(The third “best-effort” parameter was not available in
older versions of HDFS prior to Hadoop-2.6/HDP-2.2.) We see then that with the “DEFAULT” policy, if the number of
available Datanodes falls below 3 (the default replication factor), ordinary writing
can continue, but Appends or writes with Hflush will fail, with exceptions like
those mentioned above. The reason for
Appends being held to a stricter standard of replication is that appends
require manipulating generation numbers of the replicas, and maintaining
consistency of generation number across replicas requires a great deal of care. Unfortunately, in cases where there are still 3 or 4 working
Datanodes, failures can still occur on writes or appends, although it
should be infrequent. Basically, a very busy cluster (due to
concurrency, specifically both the number of blocks being operated on
concurrently and the quantity of large packets being moved over IP channels)
may experience write pipeline failures due to socket timeout or other
naturally occurring exceptions. When the cluster is large enough, it is
easy to find a datanode for replacement. But when a cluster has only 3 or
4 datanodes, it may be hard or impossible to pick a different node, and so the
exception is raised. In a small cluster, that is being used for HBase, or
otherwise experiencing a lot of Appends and explicit Hflush calls, it may make
sense to set 'dfs.client.block.write.replace-datanode-on-failure.best-effort'
to true. This enables Appends to
continue without experiencing the exceptions noted above, yet allows the system
to repair on the fly as best it can. Notice that this is trading the cast-iron
reliability of guaranteed 3-way replication for a lower level of surety, to
avoid these errors. Hortonworks also suggests that small clusters of less than
10 Datanodes are inappropriate for stress testing, because HDFS reliability
depends in part on having plenty of Datanodes to swap between when fatal or
intermittent write problems occur. REPLICA RECOVERY It should also be noted that crash testing can of course
cause replicas to become corrupted when a write is interrupted. Also, if writing is allowed to continue
without completely recovering the replication pipeline, then the number of
replicas may be less than required. Both
of these problems are self-healing over time, but they do take some time, as
follows. Blocks with incorrect number of replicas will be detected
every few seconds (under control of the ‘dfs.namenode.replication.interval’ parameter),
by a scanning process inside the Namenode, and fixed shortly thereafter. This is different than corrupt replicas. Corrupt replicas will have incorrect checksums, and this
protects the Client from using any corrupt replicas they may encounter. When corrupt replicas are detected by
attempted client access, they are immediately put in a fix queue. There is also a background process that very
slowly sweeps the entire storage system for corrupt replicas; this requires
reading the replicas themselves, and typically takes days or longer per cycle
on a production cluster. Once in the fix
queue, replicas are typically fixed within minutes, unless the system is
extraordinarily busy.
... View more
01-26-2016
07:50 PM
1 Kudo
Yes, this graph seems reasonable. Very early in the curve, the steep growth represents vocabulary acquisition. After most words in the dominant language have been indexed, the curve flattens out to represent just the data necessary to represent each document's invert index + tags, which is roughly proportional to the size of the document.
... View more