04-05-2016
06:46 PM
@Sunile Manjee Can you please clarify what you mean by "full restore" in this case? A truly full restore of HDFS would restore to a cross-cluster consistent Point-in-Time (PiT) snapshot for all datanodes AND namenode. Therefore the namenode's idea of existing blocks would match the blocks existing on the datanodes after the restore. If the PiT snapshot was not perfect (for instance, if some HDFS files were in the midst of being written during the process of taking the snapshot), then you might need to do some cleanup of the blocks for those affected files. But not via a reformat. In general, a reformat of HDFS erases everything in it, just as with reformat of any file system. So it seems unlikely this is what you want. If you really do want a reformat, then there is no need to do a restore first; it is all going away anyway. How was your snapshot taken, and how are you doing the full restore?
03-30-2016
04:48 PM
Hi @Sotiris Madamas, if Mats's answer met your needs, please mark it as accepted.
03-23-2016
06:33 PM
9 Kudos
Introduction
Motivation
In many production clusters, the servers (or VMs) have multiple physical Network Interface Cards (NICs) and/or multiple Virtual Network Interfaces (VIFs) and are configured to participate in multiple IP address spaces (“networks”). This may be done for a variety of reasons, including but not limited to the following:
to improve availability or bandwidth through redundancy
to partition traffic for security, bandwidth, or other management reasons
In many of these “multi-homed” clusters it is desirable to configure the various Hadoop-related services to “listen” on all the network interfaces rather than on just one. Clients, on the other hand, must be given a specific target to which they should initiate connections.
Note that NIC Bonding (also known as NIC Teaming or Link Aggregation), where two or more physical NICs are made to look like a single network interface, is not included in this topic. The settings discussed below are usually not applicable to a NIC bonding configuration which handles multiplexing and failover transparently while presenting only one “logical network” to applications. However, there is a hybrid case where multiple physical NICs are bonded for redundancy and bandwidth, then multiple VIFs are layered on top of them for partitioning of data flows. The multi-homing parameters then relate to the VIFs.
This Document
The purpose of this document is to gather in one place all the many parameters for all the different Hadoop-related components, related to configuring multi-homing support for services and clients. In general, to support multi-homing on the services you care about, you must set these parameters in those components and any components upon which they depend. Almost all components depend on Hadoop Core, HDFS and Yarn, so these are given first, along with Security related parameters.
This document assumes you have HDP version 2.3 or later. Most, but not all, of these features are also available in 2.1 and 2.2.
Vocabulary
Server and service are carefully distinguished in this document. Servers are host machines or VMs. Services are Hadoop component processes running on servers.
Client: a process that initiates communication with a service. Client processes may be on a host outside of the Hadoop cluster, or on a server that is part of the Hadoop cluster. Sometimes Hadoop services are clients of other Hadoop services.
DNS and rDNS lookup: “Domain Name System” (DNS) - the service that converts hostnames to IP addresses for clients wishing to communicate with said hostname; this is often referred to as “DNS lookup”. If properly configured, the service can also provide “reverse-DNS” (rDNS) lookup, which converts IP addresses back into hostnames. It is common for different networks to have different DNS servers, and the same DNS or rDNS query to different name servers may get different responses. This is used for Hadoop multi-homing to provide a level of indirection, associating different IP addresses of a multi-homed server with clients on different networks.
Round-trip-able DNS: Within a single network’s DNS service, if one starts with a valid IP address and performs an rDNS lookup to get a hostname, then DNS lookup on that hostname should return the original IP address; or alternatively, if one starts with a valid hostname and performs a DNS lookup on it to get an IP address, then rDNS lookup on that IP address should return the original hostname.
FQDN: “Fully Qualified Domain Name” - the long form of hostnames, of the form “shortname.domainname”, eg, “host234.lab1.hortonworks.com”. This contrasts with the “short form” hostname, which in this example would be just “host234”.
Use Cases
To understand the semantics of configuration parameters for multi-homing, it is useful to consider a few practical use cases for why it is desirable to have multi-homed servers and how Hadoop can utilize them. Here are four use cases that cover most of the concepts, although this is not an exhaustive list:
intra-cluster network separate from external client-server network
dedicated high-bandwidth network separate from general-use network
dedicated high-security or Hadoop administrator network separate from medium-security general-use network
sysop/non-Hadoop network separate from Hadoop network(s)
The independent variables in each use case are:
which network interfaces are available on servers and client hosts, within the cluster and external to the cluster; and
which interfaces are preferred for service/service and client/service communications for each service.
<Use cases to be expanded in a later version of this document.>
Theory of Operations
In each use case, the site architects will determine which network interfaces are available on servers and client hosts, and which interfaces are preferred for service/service and client/service communications of various types. We must then provide configuration information that specifies for the cluster:
How clients will find the server hostname of each service
Whether each service will bind to one or all network interfaces on its server
How services will self-identify their server to Kerberos
Over time, the Hadoop community evolved an understanding of the configuration practices that make these multi-homing use cases manageable:
Configuration files should be shared among clients and services, as much as possible. This is highly desirable to minimize management overhead, and the desire to share common config files has guided a lot of the design choices for multi-homing support in Hadoop.
Names are important. In multi-homed cluster configuration parameters, servers should always be identified or “addressed” by hostname, not IP address. [1] This enables the next several bullet points.
FQDNs are strongly preferred. It is recommended that all hostname strings be represented as Fully Qualified Domain Names, unless short hostnames are used throughout with rigorous consistency, for both clients and servers.
Each network should have DNS service available, with forward and reverse name lookup implemented “round-trip-able” for all Hadoop servers and client hosts expected to use that network. It is common and allowable for the same hostname to resolve to different ones of the host’s multiple IP addresses, depending on which DNS service (on which network) is used.
DNS services are a level of indirection for deriving both IP address and hostname info from the parameters specified in shared configuration files or from advertised values, to resolve to different networks for different querying processes. This allows, for example, external clients to find and talk to Datanodes on an external (client/server) network, while Datanodes find and talk to each other on an internal (intra-cluster) network.
Each server should have one consistent hostname on all interfaces. Theoretically, DNS allows a server to have a different hostname per network interface. But it is required [2] for multi-homed Hadoop clusters that each server have only one hostname, consistent among all network interfaces used by Hadoop. (Network interfaces excluded from use by Hadoop may allow other hostnames.)
If /etc/hosts files are used in lieu of DNS, then the above constraint to have the same hostname on all interfaces means that each /etc/hosts file can only specify one interface per server. Thus, clients at least are likely to have different /etc/hosts files than servers, and the allowable choices for use of the different networks may be constrained. If these constraints are not acceptable, provide DNS services.
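As an illustration of that constraint (all names and addresses here are hypothetical), the server-side and client-side /etc/hosts files might pin the same Datanode hostname to different networks:
# /etc/hosts on cluster servers: dn1 resolves to the intra-cluster network
10.0.1.11      dn1.cluster.example.com dn1
# /etc/hosts on client hosts: the same hostname resolves to the external network
192.168.5.11   dn1.cluster.example.com dn1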
Prior to 2014 there were a variety of parameters proposed for specific issues related to multi-homing. These were incremental band-aids with inconsistent naming and incomplete semantics. In mid-2014 (for hadoop-2.6.0) the Hadoop Core community resolved to address multi-homing completely, with consistent semantics and usages. Other components also evolved multi-homing support, with a variety of parameter naming and semantics choices.
In general, for every service one must consider three sets of properties, related to the three key questions at the beginning of this section:
1. “Hostname Properties” relate to either specifying or deriving the server hostname by which clients will find each service, and the appropriate network for accessing it.
In some services (eg, HDFS Namenode and YARN ResourceManager) there may just be a parameter in a shared config file that hard-wires a specified server name. This approach is only practical for singleton services like masters, otherwise the config files would have to be edited differently on every node, instead of shared.
Many services (eg, Datanode, HBase services) use generic parameters or other means to perform a self-name-lookup via reverse-DNS, followed by registration (with Master or ZK) where the hostname gets advertised to clients.
Sometimes the hostname is neither specified nor advertised; clients must simply know it from outside the config system, “based on human interaction”.
Each client does a DNS lookup on the server hostname to get the IP address (and the associated network) on which to communicate with that service. So this client aspect of the network architecture decisions is expressed in the DNS server configuration rather than in the Hadoop configuration info.
2. “Bind Address Properties” provide optional additional addressing information which is used only by the service itself, and only on multi-homed servers. By default, services will bind only to the default network interface [3] on each server. The Bind Address Properties can cause the service to bind to a non-default network interface if desired, or, most commonly, the bind address can be set to “0.0.0.0”, causing the service to bind to all available network interfaces on its server.
The bind address usually takes the form of an IP address rather than a network interface name, to be type-compatible with the conventional “0.0.0.0” notation. However, some services allow specifying a network interface name.
The default interface choice, if bind address is unspecified, varies for different services. Choices include:
the default network interface
all interfaces
some other specific interface defined by other parameters.
3. “Kerberos Host Properties” provide the information needed to resolve the “_HOST” substitution in each service’s own host-dependent Kerberos principal name. This is separate from the “Hostname Property” because not all services advertise their hostname to clients, and/or may need to use a different means to determine hostname for security purposes.
Usually the Kerberos Host substitution name is derived from another self-name-lookup, similar to hostname from #1, but possibly on a different DNS service.
Sometimes the hostname info from #1 is used for Kerberos also.
Absent other information, the Kerberos Host will be assigned the default hostname from the default network interface, obtained from the Java call InetAddress.getLocalHost().getCanonicalHostName().
Different services evolved different parameters, or combinations of parameters, and parameter naming conventions, but all services need to somehow present the above three property sets.
The “listener” port number for each service must also be managed, but this property does not change in single- vs. multi-homed clusters. So we will not discuss it further here, except to mention that in HDFS and YARN it is conjoined with the “address” parameters (Hostname Property), while in other services such as HBase, it is in separate “port” parameters.
Parameters
HDFS Service Parameters
For historical reasons, in HDFS and YARN most of the parameters related to Hostname Properties end in “-address” or “.address”. Indeed, in non-multi-homed clusters it is acceptable and was once common to simply use IP addresses rather than hostname strings. In multi-homed clusters, however, all server addressing must be hostname-based. In HDFS and YARN most of the parameters related to Bind Address Properties end in “-bind-host” or “.bind-host”.
The four Namenode services each have a dfs.namenode.${service}-address parameter, where ${service} is rpc, servicerpc, http, or https. The Namenode server’s hostname should be assigned here, to establish the Hostname Property for each service. This information is provided to clients via shared config files.
For each, there is an optional corresponding dfs.namenode.${service}-bind-host parameter to establish the Bind Address Properties. It may be set to the server’s IP address on a particular interface, or to “0.0.0.0” to cause the Namenode services to listen on all available network interfaces. If unspecified, the default interface will be used. These parameters are documented at: https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-hdfs/HdfsMultihoming.html A configuration sketch follows the list below.
dfs.namenode.rpc-bind-host -- controls binding for dfs.namenode.rpc-address
dfs.namenode.servicerpc-bind-host -- controls binding for dfs.namenode.servicerpc-address
dfs.namenode.http-bind-host -- controls binding for dfs.namenode.http-address
dfs.namenode.https-bind-host -- controls binding for dfs.namenode.https-address
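For example, here is a minimal hdfs-site.xml sketch for the RPC pair (the hostname nn1.example.com is a placeholder; 8020 is the customary Namenode RPC port):
<property>
  <name>dfs.namenode.rpc-address</name>
  <value>nn1.example.com:8020</value>
  <!-- Hostname Property: the name clients look up, shared via config files -->
</property>
<property>
  <name>dfs.namenode.rpc-bind-host</name>
  <value>0.0.0.0</value>
  <!-- Bind Address Property: listen on all of the server's interfaces -->
</property>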
The Datanodes also present various services; however, their Hostname Properties are established via self-name-lookup as follows:
If present, the hadoop.security.dns.interface and hadoop.security.dns.nameserver parameters are used in the same way as documented in the Security Parameters section below, to determine the Datanode’s hostname. (If the rDNS lookup fails, it will fall back to /etc/hosts.) These two parameters are relatively new, available in Apache Hadoop 2.7.2 and later, and back-ported to HDP-2.2.9 and later, and HDP-2.3.4 and later.
Otherwise, if the dfs.datanode.dns.interface and dfs.datanode.dns.nameserver parameters are present, they are used in the same way to determine the Datanode’s hostname. (If the rDNS lookup fails there, it will not fall back to /etc/hosts.) This pair of parameters is (or will shortly be) deprecated, but many field installations currently use them.
If neither set of parameters is present, then the Datanode calls Java InetAddress.getLocalHost().getCanonicalHostName() to determine its hostname.
The Datanode registers its hostname with the Namenode, and the Namenode also captures the Datanode’s IP address from which it registers. (This will be an IP address on the network interface the Datanode determined it should use to communicate with the Namenode.) The Namenode then provides both hostname and IP address to clients, for particular Datanodes, whenever a client operation is needed. Which one the client uses is determined by the dfs.client.use.datanode.hostname and dfs.datanode.use.datanode.hostname parameter values; see below.
Datanodes also have a set of dfs.datanode.${service}-address parameters. However, confusingly, these relate only to the Bind Address Property, not the Hostname Property, for Datanodes. They default to “0.0.0.0”, and it is okay to leave them that way unless it is desired to bind to only a single network interface.
Both Namenodes and Datanodes resolve their Kerberos _HOST bindings by using the hadoop.security.dns.interface and hadoop.security.dns.nameserver parameters, rather than their various *-address parameters. See below under “Security Parameters” for documentation.
Clients probably need to use Hostnames when connecting to Datanodes
By default, HDFS clients connect to Datanodes using an IP address provided by the Namenode. In a multi-homed network, depending on the network configuration, the Datanode IP address known to the Namenode may be unreachable by the clients. The fix is to let clients perform their own DNS resolution of the Datanode hostname. The following setting enables this behavior. The reason this is optional is to allow skipping DNS resolution for clients on networks where the feature is not required (and DNS might not be a reliable/low-latency service).
dfs.client.use.datanode.hostname should be set to true when this behavior is desired.
Datanodes may need to use Hostnames when connecting to other Datanodes
More rarely, the Namenode-resolved IP address for a Datanode may be unreachable from other Datanodes. The fix is to force Datanodes to perform their own DNS resolution for inter-Datanode connections. The following setting enables this behavior.
dfs.datanode.use.datanode.hostname should be set to true when this behavior is desired.
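For example, a sketch of the hdfs-site.xml entries when both behaviors are desired (set only the ones your network layout requires):
<property>
  <name>dfs.client.use.datanode.hostname</name>
  <value>true</value>
  <!-- clients resolve Datanode hostnames themselves, instead of using Namenode-supplied IPs -->
</property>
<property>
  <name>dfs.datanode.use.datanode.hostname</name>
  <value>true</value>
  <!-- Datanodes likewise resolve each other by hostname -->
</property>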
Client interface specification
Usually only one network is used for external clients to connect to HDFS servers. However, if the client can “see” the HDFS servers on more than one interface, you can control which interface or interfaces are used for HDFS communication by setting this parameter:
<property>
<name>dfs.client.local.interfaces</name>
<description> A comma separated list of network interface names to use for data transfer
between the client and datanodes. When creating a connection to read from or
write to a datanode, the client chooses one of the specified interfaces at random
and binds its socket to the IP of that interface. Individual names may be
specified as either an interface name (eg "eth0"), a subinterface name (eg
"eth0:0"), or an IP address (which may be specified using CIDR notation to match
a range of IPs).
</description>
</property>
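For example, to restrict client data transfer to a single interface (the interface name eth2 is hypothetical), the value might be set as follows:
<property>
  <name>dfs.client.local.interfaces</name>
  <value>eth2</value>
  <!-- client sockets bind to eth2's IP for Datanode data transfer -->
</property>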
Yarn & MapReduce Service Parameters
The yarn and job history “*.address” parameters specify the hostname to be used for each service and sub-service. This information is provided to clients via shared config files.
The optional yarn and job history “*.bind-host” parameters control bind address for sets of related sub-services as follows. Setting them to 0.0.0.0 causes the corresponding services to listen on all available network interfaces. A configuration sketch follows the list below.
yarn.resourcemanager.bind-host -- controls binding for:
yarn.resourcemanager.address
yarn.resourcemanager.webapp.address
yarn.resourcemanager.webapp.https.address
yarn.resourcemanager.resource-tracker.address
yarn.resourcemanager.scheduler.address
yarn.resourcemanager.admin.address
yarn.nodemanager.bind-host -- controls binding for:
yarn.nodemanager.address
yarn.nodemanager.webapp.address
yarn.nodemanager.webapp.https.address
yarn.nodemanager.localizer.address
yarn.timeline-service.bind-host -- controls binding for:
yarn.timeline-service.address
yarn.timeline-service.webapp.address
yarn.timeline-service.webapp.https.address
mapreduce.jobhistory.bind-host -- controls binding for:
mapreduce.jobhistory.address
mapreduce.jobhistory.admin.address
mapreduce.jobhistory.webapp.address
mapreduce.jobhistory.webapp.https.address
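For example, a yarn-site.xml sketch for a multi-homed ResourceManager (the hostname rm1.example.com is a placeholder; 8032 is the default RPC port):
<property>
  <name>yarn.resourcemanager.address</name>
  <value>rm1.example.com:8032</value>
  <!-- Hostname Property: advertised to clients via shared config -->
</property>
<property>
  <name>yarn.resourcemanager.bind-host</name>
  <value>0.0.0.0</value>
  <!-- binds all six ResourceManager sub-services listed above to all interfaces -->
</property>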
Yarn and job history services resolve their Kerberos _HOST bindings by using the hadoop.security.dns.interface and hadoop.security.dns.nameserver parameters, rather than their various *-address parameters. See below under “Security Parameters” for documentation.
Security Parameters
Parameters for _HOST resolution
Each secure host needs to be able to perform _HOST resolution within its own parameter values, to correctly manage Kerberos principals. For this it must determine its own name. By default it will use the canonical name, which is associated with its first network interface. If it is desired to use the hostname associated with a different interface, use these two parameters in core-site.xml.
Core Hadoop services perform a self-name-lookup using these parameters. Many other Hadoop-related services define their own means of determining the Kerberos host substitution name.
These two parameters are relatively new, available in Apache Hadoop 2.7.2 and later, and back-ported to HDP-2.2.9 and later, and HDP-2.3.4 and later. In previous versions of HDFS, the dfs.datanode.dns.{interface,nameserver} parameter pair was used.
<property>
<name>hadoop.security.dns.interface</name>
<description>
The name of the Network Interface from which the service should determine
its host name for Kerberos login, e.g. "eth2". In a multi-homed environment,
the setting can be used to affect the _HOST substitution in the service
Kerberos principal. If this configuration value is not set, the service
will use its default hostname as returned by
InetAddress.getLocalHost().getCanonicalHostName().
Most clusters will not require this setting.
</description>
</property>
<property>
<name>hadoop.security.dns.nameserver</name>
<description>
The hostname or IP address of the DNS name server which a service Node
should use to determine its own hostname for Kerberos Login, via a
reverse-DNS query on its own IP address associated with the interface
specified in hadoop.security.dns.interface.
Requires hadoop.security.dns.interface.
Most clusters will not require this setting.
</description>
</property>
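For example (the interface name eth1 and nameserver address 10.0.1.2 are hypothetical), a core-site.xml fragment:
<property>
  <name>hadoop.security.dns.interface</name>
  <value>eth1</value>
  <!-- take this host's IP from eth1 for the self-name-lookup -->
</property>
<property>
  <name>hadoop.security.dns.nameserver</name>
  <value>10.0.1.2</value>
  <!-- DNS server that answers the reverse lookup on that IP -->
</property>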
Parameters for Security Token service host resolution
In secure multi-homed environments, the following parameter will need to be set to false (it is true by default) on both cluster servers and clients (see HADOOP-7733), in core-site.xml. If it is not set correctly, the symptom will be an inability to submit an application to YARN from an external client (with error “client host not a member of the Hadoop cluster”), or even from an in-cluster client if server failover occurs.
<property>
<name>hadoop.security.token.service.use_ip</name>
<description>
Controls whether tokens always use IP addresses. DNS changes will not
be detected if this option is enabled. Existing client connections that
break will always reconnect to the IP of the original host. New clients
will connect to the host's new IP but fail to locate a token.
Disabling this option will allow existing and new clients to detect
an IP change and continue to locate the new host's token.
Defaults to true, for efficiency in single-homed systems.
If setting to false, be sure to change in both clients and servers.
Most clusters will not require this setting.
</description>
</property>
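A sketch of the entry as it would appear in core-site.xml, identically on every server and client, per the guidance above:
<property>
  <name>hadoop.security.token.service.use_ip</name>
  <value>false</value>
  <!-- identify the token service by hostname rather than IP; required in secure multi-homed environments -->
</property>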
Secure Hostname Lookup when /etc/hosts is used instead of DNS service
To prevent a security loophole described in http://ietf.org/rfc/rfc1535.txt, the Hadoop Kerberos libraries in Secure environments with hadoop.security.token.service.use_ip set to false will sometimes do a DNS lookup on a special form of the hostname FQDN that has a “.” appended. For instance, instead of using the FQDN “host1.lab.mycompany.com” it will use the form “host1.lab.mycompany.com.” (note the trailing dot). This is called the “absolute rooted” form of the hostname, sometimes written FQDN+".". For an explanation of the need for this form, see the above-referenced RFC 1535.
If DNS services are set up as required for multi-homed Hadoop clusters, no special additional setup is needed for this concern, because all modern DNS servers already understand the absolute rooted form of hostnames. However, /etc/hosts processing of name lookup does not. If DNS is not available and /etc/hosts is being used in lieu, then the following additional setup is required:
In the /etc/hosts file of all cluster servers and client hosts, for the entries related to all hostnames of cluster servers, add an additional alias consisting of the FQDN+".". For example, a line in /etc/hosts that would normally look like
39.0.64.1 selena1.labs.mycompany.com selena1 hadoopvm8-1
should be extended to look like
39.0.64.1 selena1.labs.mycompany.com selena1.labs.mycompany.com. selena1 hadoopvm8-1
If the absolute rooted hostname is not included in /etc/hosts when required, the symptom will be an “Unknown Host” error (see https://wiki.apache.org/hadoop/UnknownHost#UsingHostsFiles).
HBase
The HBase documentation of multi-homing-related parameters is largely lacking, and in some cases incorrect. The following is based on the most up-to-date information available as of January 2016.
Credit: Josh Elser worked out the correct information for this document by studying the source code. Thanks, Josh!
Service Parameters
Both HMaster and RegionServers advertise their hostname with the cluster’s Zookeeper (ZK) nodes. These ZK nodes are found using the hbase.zookeeper.quorum parameter value. The ZK server names found here are looked up via DNS on the default interface, by all HBase ZK clients. (Note the DNS server on the default interface may return an address for the ZK node that is on some other network interface.)
The HMaster determines its IP address on the interface specified in the hbase.master.dns.interface parameter, then does a reverse-DNS query on the DNS server specified in hbase.master.dns.nameserver. However, it does not use the resulting hostname directly. Instead, it does a forward-DNS lookup of this name on the default (or bind address) network interface, followed by a reverse-DNS lookup of the resulting IP address on the same interface’s DNS server. The resulting hostname is what the HMaster actually advertises with Zookeeper.
This sequence may be an attempt to correctly handle environments that don’t adhere to the “one consistent hostname on all interfaces” rule.
Regardless, if the “one consistent hostname on all interfaces” rule has been followed, it should be easy to predict the resulting Hostname Property value.
Each RegionServer obtains the HMaster’s hostname from Zookeeper, and looks it up with DNS on the default network. It then registers itself with the HMaster. When it does so, the HMaster captures the IP address of the RegionServer, and does a reverse-DNS lookup on the default (or HMaster bind address) interface to obtain a hostname for the RegionServer. It then delivers this hostname back to the RegionServer, which in turn advertises it on Zookeeper.
Thus, no config parameters are relevant to the RegionServer’s Hostname Property.
Again, it is important that each RegionServer’s hostname be the same on all interfaces if different clients are to use different interfaces, because the HMaster looks it up on only one.
Regardless of the above determination of hostname, the Bind Address Property may change the IP address binding of each service, to a non-default interface, or to all interfaces (0.0.0.0). The parameters for Bind Address Property are as follows, for each service:
HMaster RPC -- hbase.master.ipc.address
HMaster WebUI -- hbase.master.info.bindAddress
RegionServer RPC -- hbase.regionserver.ipc.address
RegionServer WebUI -- hbase.regionserver.info.bindAddress
REST RPC -- hbase.rest.host
REST WebUI -- hbase.rest.info.bindAddress
Thrift1 RPC -- hbase.regionserver.thrift.ipaddress
Thrift2 RPC -- hbase.thrift.info.bindAddress
Notes:
The WebUI services do not advertise their hostnames separately from the RPC services.
The REST and Thrift services do not advertise their hostnames at all; their clients must simply know “from human interaction” where to find the corresponding servers.
Finally, services and clients use a dns.{interface,nameserver} parameter pair to determine their Kerberos Host substitution names, as follows. See hadoop.security.dns.interface and hadoop.security.dns.nameserver in the “Security Parameters” section for documentation of the semantics of these parameter pairs. A configuration sketch follows the list below.
HMaster RPC -- hbase.master.dns.{interface,nameserver}
HMaster WebUI -- N/A
RegionServer RPC -- hbase.regionserver.dns.{interface,nameserver}
RegionServer WebUI -- N/A
REST RPC -- N/A
REST WebUI -- N/A
Thrift1 RPC -- hbase.thrift.dns.{interface,nameserver}
Thrift2 RPC -- hbase.thrift.dns.{interface,nameserver}
HBase Client -- hbase.client.dns.{interface,nameserver}
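For example (the interface and nameserver values are hypothetical, as above), a RegionServer’s hbase-site.xml might contain:
<property>
  <name>hbase.regionserver.dns.interface</name>
  <value>eth1</value>
  <!-- interface whose address is used for the RegionServer's self-name-lookup -->
</property>
<property>
  <name>hbase.regionserver.dns.nameserver</name>
  <value>10.0.1.2</value>
  <!-- DNS server used for that reverse lookup -->
</property>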
Lastly, there is a bit of complexity with ZK node names, if the HBase-embedded ZK quorum is used. When HBase is instantiating an embedded set of ZK nodes, it is necessary to correlate the node hostnames with the integer indices in the “zoo.cfg” configuration. As part of this, ZK has a hbase.zookeeper.dns.{interface,nameserver} parameter pair. Each ZK node uses these in the usual way to determine its own hostname, which is then fed into the embedded ZK configuration process.
Superseded Parameters
Superseded by hadoop.security.dns.interface and hadoop.security.dns.nameserver, in Apache Hadoop 2.7.2 and later (back-ported to HDP-2.2.9 and later, and HDP-2.3.4 and later):
dfs.datanode.dns.interface
dfs.datanode.dns.nameserver
Superseded by Namenode HA:
dfs.namenode.secondary.http-address and https-address
only used in non-HA systems that have two NN masters yet have not configured HA [4]
Superseded hdfs/yarn/mapreduce parameters used by older systems:
yarn.nodemanager.webapp.bind-host: "0.0.0.0"
yarn.nodemanager.webapp.https.bind-host: "0.0.0.0"
just use yarn.nodemanager.bind-host: "0.0.0.0"
yarn.resourcemanager.admin.bind-host: "0.0.0.0"
yarn.resourcemanager.resource-tracker.bind-host: "0.0.0.0"
yarn.resourcemanager.scheduler.bind-host: "0.0.0.0"
just use yarn.resourcemanager.bind-host: "0.0.0.0"
yarn.timeline-service.webapp.bind-host: "0.0.0.0"
yarn.timeline-service.webapp.https.bind-host: "0.0.0.0"
just use yarn.timeline-service.bind-host: "0.0.0.0"
mapreduce.jobhistory.admin.bind-host: "0.0.0.0"
mapreduce.jobhistory.webapp.bind-host: "0.0.0.0"
mapreduce.jobhistory.webapp.https.bind-host: "0.0.0.0"
just use mapreduce.jobhistory.bind-host: "0.0.0.0"
Older historical items superseded:
dfs.namenode.master.name
predecessor of namenode bind-host parameters, and/or rule of “one consistent hostname on all interfaces”.
dfs.datanode.hostname
predecessor of “one consistent hostname on all interfaces” rule.
Note
Regarding the blog at:
http://hortonworks.com/blog/multihoming-on-hadoop-yarn-clusters/
In the “Taking a Test Drive” section, this blog suggests using EC2 with Public and Private networks. We now know this doesn’t work, since by default in AWS, Public addresses are just NAT’ed to Private addresses. Instead, one must use the VPC capability with two Private networks. That section of the blog should be corrected.
[1] Unless you really, really know what you’re doing, and want to work through every little nuance of the semantics of all the parameters. About the only situation in which you might get away with using IP addresses for server addressing is if ALL services are being bound to specific single interfaces, not using 0.0.0.0 at all.
[2] Having one consistent hostname on all interfaces is not a strict technical requirement of Hadoop. But it is a major simplifying assumption, without which the Hadoop administrator is likely to land in a morass of very complex and mutually conflicting parameter choices. For this reason, Hortonworks only supports using one consistent hostname per server on all interfaces used by Hadoop.
[3] In non-virtual Linux systems, the default network interface is typically “eth0” or “eth0:0”, but this conventional naming can be changed in system network configuration or with network virtualization.
[4] It is not clear why one would commit the resources for a secondary NN but not set up HA. However, this is still a supported configuration.
03-03-2016
06:14 PM
2 Kudos
Are you using a version of Ambari earlier than 2.2.1? Please refer to http://docs.hortonworks.com/HDPDocuments/Ambari-2.2.1.0/bk_Installing_HDP_AMB/content/_determine_stack_compatibility.html, where it says only Ambari-2.2.1 supports HDP-2.4. Therefore, if you are using a version earlier than 2.2.1, please upgrade Ambari before trying to upgrade the stack. The upgrade doc is at http://docs.hortonworks.com/HDPDocuments/Ambari/Ambari-2.2.1.0/index.html#bk_upgrading_Ambari
02-13-2016
06:06 AM
5 Kudos
Folks often ask about best practice for setting replication factor, evidently wondering if the default value of 3 is supported by factual data. The cool answer is, yes it is! Rob Chansler, an excellent engineering manager and contributor to Hadoop at Yahoo for several years, posted the best material in 2011. The hard-core math is in a spreadsheet attached to Apache Jira https://issues.apache.org/jira/browse/HDFS-2535, "A Model for Data Durability", where he uses reasonable assumptions and experience from Yahoo to calculate the probable rate of data loss events at a single site due to node failures when replication is set to 3, as 0.021 events per century. See "Attachments" : "LosingBlocks.xlsx" in the Jira ticket. Rob also published an article in Usenix titled Data Availability and Durability with the Hadoop Distributed File System, and did a related presentation at Hadoop Summit 2011, Data Integrity and Availability of HDFS (video). Sanjay Radia summarized this information in Hortonworks blog Data Integrity and Availability in Apache Hadoop HDFS, but that article did not focus on the replication factor specifically.
02-12-2016
07:21 PM
1 Kudo
@Benjamin Leonhardi, please see this post: https://community.hortonworks.com/articles/16719/hdfs-data-durability-and-availability-with-replica.html. Please up-vote it if you like it 🙂
02-11-2016
06:54 PM
2 Kudos
(part 2 of 2) Increasing replication to 4 or more only makes sense in DR situations, where you want, say, 2 copies at each of two sites, but then you should be using Falcon or similar to maintain the DR copies, not HDFS replication. On the other hand, replication = 2 definitely leaves your data vulnerable to "it can't happen to me" scenarios -- it only takes the wrong two disk drives failing at the same time, and at some point they will. So decreasing replication below 3 only makes sense if it is low-value data or you're using RAID or other highly reliable storage to protect your data.
02-11-2016
06:53 PM
3 Kudos
(part 1 of 2) Just a quick comment on replication: it is completely separate from issues of block size and file size. It has been established (there's a white paper somewhere) that replication = 3 makes it astronomically unlikely that you'll ever actually lose data within a single site due to hardware failure, as long as HDFS has plenty of datanodes and drives provisioned, dead drives on datanodes get replaced within a day or two of dying, and your site doesn't suffer a catastrophic loss.
02-10-2016
08:50 PM
6 Kudos
We recently had a customer who was doing stress testing on a very small cluster with just four Datanodes. Their tests included simulated crashes and sudden power-offs. They were getting recurring client errors of hdfs.DFSClient: Exception in createBlockOutputStream, with various log messages including:

Client:
EOFException: Premature EOF: no length prefix available
IOException: Failed to replace a bad datanode on the existing pipeline due to no more good datanodes being available to try

Datanode:
ReplicaAlreadyExistsException: Block <number> already exists in state FINALIZED and thus cannot be created
ReplicaNotFoundException: Cannot append to a non-existent replica
FsDatasetImpl (FsDatasetImpl.java:recoverAppend(1053)) - Recover failed append to BP-<number>
ReplicaNotFoundException: Cannot append to a replica with unexpeted generation stamp

Namenode:
BlockPlacementPolicy - Failed to place enough replicas, still in need of 1 to reach 3

The reason for these errors has to do with:
HDFS efforts to recover the replication pipeline if a Datanode fails to complete a write operation, and the three configuration parameters that control this pipeline recovery behavior.
The fact that HDFS is stricter about replication in case of an Append or Hflush than during normal write-once file writing. Note that HBase requires frequent use of file appends.

PIPELINE RECOVERY

As you know if you’ve read about the design of HDFS, when a block is opened for writing, a pipeline is created of r Datanodes (where r is the replication factor) to receive the replicas of the block. The Client sends the block data to the first Datanode, which sends it to the second, and so on. A block is only considered complete when all Datanodes in the pipeline have reported finishing writes of their replicas. Replication through the pipeline of Datanodes happens in the background, and in parallel with Client writes, so completion of the block normally occurs almost immediately after the Client finishes writing the first replica.

However, if any of the Datanodes in the pipeline fail to successfully finish their writes, the Client attempts to recover the replication pipeline by finding a new available Datanode and replacing the failing Datanode. In Hadoop-2.6 and later (HDP-2.2 and later) this behavior is controlled by the following three configuration parameters in hdfs-site.xml. Please read how they work (from hdfs-default.xml):

<property>
<name>dfs.client.block.write.replace-datanode-on-failure.enable</name>
<value>true</value>
<description>
If there is a datanode/network failure in the write pipeline,
DFSClient will try to remove the failed datanode from the pipeline
and then continue writing with the remaining datanodes. As a result,
the number of datanodes in the pipeline is decreased. The feature is
to add new datanodes to the pipeline.
This is a site-wide property to enable/disable the feature.
When the cluster size is extremely small, e.g. 3 nodes or less, cluster
administrators may want to set the policy to NEVER in the default
configuration file or disable this feature. Otherwise, users may
experience an unusually high rate of pipeline failures since it is
impossible to find new datanodes for replacement.
See also dfs.client.block.write.replace-datanode-on-failure.policy
</description>
</property>
<property>
<name>dfs.client.block.write.replace-datanode-on-failure.policy</name>
<value>DEFAULT</value>
<description>
This property is used only if the value of
dfs.client.block.write.replace-datanode-on-failure.enable is true.
ALWAYS: always add a new datanode when an existing datanode is removed.
NEVER: never add a new datanode.
DEFAULT:
Let r be the replication number.
Let n be the number of existing datanodes.
Add a new datanode only if r is greater than or equal to 3 and either
(1) floor(r/2) is greater than or equal to n; or
(2) r is greater than n and the block is hflushed/appended.
</description>
</property>
<property>
<name>dfs.client.block.write.replace-datanode-on-failure.best-effort</name>
<value>false</value>
<description>
This property is used only if the value of
dfs.client.block.write.replace-datanode-on-failure.enable is true.
Best effort means that the client will try to replace a failed datanode
in write pipeline (provided that the policy is satisfied), however, it
continues the write operation in case that the datanode replacement also
fails.
Suppose the datanode replacement fails.
false: An exception should be thrown so that the write will fail.
true : The write should be resumed with the remaining datanodes.
Note that setting this property to true allows writing to a pipeline
with a smaller number of datanodes. As a result, it increases the
probability of data loss.
</description>
</property>
(The third “best-effort” parameter was not available in older versions of HDFS prior to Hadoop-2.6/HDP-2.2.)

We see then that with the “DEFAULT” policy, if the number of available Datanodes falls below 3 (the default replication factor), ordinary writing can continue, but Appends or writes with Hflush will fail, with exceptions like those mentioned above. The reason Appends are held to a stricter standard of replication is that appends require manipulating generation numbers of the replicas, and maintaining consistency of generation number across replicas requires a great deal of care.

Unfortunately, even in cases where there are still 3 or 4 working Datanodes, failures can occur on writes or appends, although this should be infrequent. Basically, a very busy cluster (due to concurrency, specifically both the number of blocks being operated on concurrently and the quantity of large packets being moved over IP channels) may experience write pipeline failures due to socket timeouts or other naturally occurring exceptions. When the cluster is large enough, it is easy to find a Datanode for replacement. But when a cluster has only 3 or 4 Datanodes, it may be hard or impossible to pick a different node, and so the exception is raised.

In a small cluster that is being used for HBase, or is otherwise experiencing a lot of Appends and explicit Hflush calls, it may make sense to set 'dfs.client.block.write.replace-datanode-on-failure.best-effort' to true. This enables Appends to continue without experiencing the exceptions noted above, yet allows the system to repair on the fly as best it can. Notice that this trades the cast-iron reliability of guaranteed 3-way replication for a lower level of surety, in order to avoid these errors.
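A sketch of the hdfs-site.xml entry for that trade-off:
<property>
  <name>dfs.client.block.write.replace-datanode-on-failure.best-effort</name>
  <value>true</value>
  <!-- keep writing even if a replacement Datanode cannot be found -->
</property>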
Hortonworks also suggests that small clusters of fewer than 10 Datanodes are inappropriate for stress testing, because HDFS reliability depends in part on having plenty of Datanodes to swap in when fatal or intermittent write problems occur.

REPLICA RECOVERY

It should also be noted that crash testing can of course cause replicas to become corrupted when a write is interrupted. Also, if writing is allowed to continue without completely recovering the replication pipeline, then the number of replicas may be less than required. Both of these problems are self-healing over time, but they do take some time, as follows.

Blocks with an incorrect number of replicas will be detected every few seconds (under control of the ‘dfs.namenode.replication.interval’ parameter) by a scanning process inside the Namenode, and fixed shortly thereafter.

Corrupt replicas are a different matter. Corrupt replicas have incorrect checksums, which protects Clients from using any corrupt replicas they may encounter. When corrupt replicas are detected by attempted client access, they are immediately put in a fix queue. There is also a background process that very slowly sweeps the entire storage system for corrupt replicas; this requires reading the replicas themselves, and typically takes days or longer per cycle on a production cluster. Once in the fix queue, replicas are typically fixed within minutes, unless the system is extraordinarily busy.
01-26-2016
07:50 PM
1 Kudo
Yes, this graph seems reasonable. Very early in the curve, the steep growth represents vocabulary acquisition. After most words in the dominant language have been indexed, the curve flattens out to represent just the data necessary to represent each document's inverted index + tags, which is roughly proportional to the size of the document.