02-27-2018
10:20 PM
2 Kudos
Building Apache Tez with Apache Hadoop 2.8.0 or later fails due to the client/server jar separation in Hadoop [1]. The build fails with the following error:

[ERROR] COMPILATION ERROR :
[INFO] -------------------------------------------------------------
[ERROR] /src/tez/tez-api/src/test/java/org/apache/tez/client/TestTezClientUtils.java:[48,30] cannot find symbol
symbol: class DistributedFileSystem
location: package org.apache.hadoop.hdfs
[ERROR] /src/tez/tez-api/src/test/java/org/apache/tez/client/TestTezClientUtils.java:[680,50] cannot find symbol
symbol: class DistributedFileSystem
location: class org.apache.tez.client.TestTezClientUtils
[ERROR] /src/tez/ecosystem/tez/tez-api/src/test/java/org/apache/tez/common/TestTezCommonUtils.java:[62,42] cannot access org.apache.hadoop.hdfs.DistributedFileSystem

To get Tez to compile successfully, you will need to use the new hadoop28 profile introduced by TEZ-3690 [2]. E.g. here is how you compile Tez with Apache Hadoop 3.0.0:

mvn clean package -DskipTests=true -Dmaven.javadoc.skip=true -Phadoop28 -Dhadoop.version=3.0.0

References:
1. HDFS-6200: Create a separate jar for hdfs-client
2. TEZ-3690: Tez on hadoop 3 build failed due to hdfs client/server jar separation.
06-22-2017
07:46 PM
7 Kudos
The HDFS NameNode ensures that each block is sufficiently replicated. When it detects the loss of a DataNode, it instructs remaining nodes to maintain adequate replication by creating additional block replicas.
For each lost replica, the NameNode picks a (source, destination) pair where the source is an available DataNode with another replica of the block and the destination is the target for the new replica. The re-replication work can be massively parallelized in large clusters since the replica distribution is randomized.
In this article, we estimate a lower bound for the recovery time.

Simplifying Assumptions
1. The maximum IO bandwidth of each disk is 100MB/s (reads + writes). This is true for the vast majority of clusters that use spinning disks.
2. The aggregate IO capacity of the cluster is limited by disk and not the network. This is not always true but helps us establish lower bounds without discussing network topologies.
3. Block replicas are uniformly distributed across the cluster and disk usage is uniform. True if the HDFS balancer was run recently.
Theoretical Lower Bound
Let's assume the cluster has n nodes. Each node has p disks, and the usage of each disk is c TeraBytes. The data usage of each node is thus (p ⋅ c) TB.
The amount of data transfer needed for recovery is twice the capacity of the lost DataNode, as each replica must be read once from a source disk and written once to the target disk.
Data transfer during recovery = 2 ⋅ (Node Capacity)
= (2 ⋅ p ⋅ c) TB
= (2 ⋅ p ⋅ c ⋅ 1,000,000) MB
The re-replication rate is limited by the available aggregate IO bandwidth in the cluster:

Cluster aggregate IO bandwidth = (Disk IO bandwidth) ⋅ (Number of disks)
= (100 ⋅ n ⋅ p) MB/s

Thus:

Minimum Recovery Time = (Data transfer during recovery) / (Cluster aggregate IO bandwidth)
= (2 ⋅ p ⋅ c ⋅ 1,000,000) / (100 ⋅ n ⋅ p)
= (20,000 ⋅ c/n) seconds.
where:
c = Mean usage of each disk in TB.
n = Number of DataNodes in the cluster.

This is the absolute best case with no other load, no network bandwidth limits, and a perfectly efficient scheduler.
E.g. In a 100 node cluster where each disk has 4TB of data, recovery from the loss of a DataNode must take at least (20,000 ⋅ 4) / 100 = 800 seconds or approximately 13 minutes.
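This arithmetic is easy to script. The following Python sketch (the function name is mine) computes the theoretical lower bound directly from the assumptions above; note that the number of disks cancels out, just as in the simplified formula:

```python
def theoretical_recovery_seconds(num_nodes, disks_per_node, disk_usage_tb,
                                 disk_mbps=100):
    # Data moved: each lost replica is read once and written once.
    data_moved_mb = 2 * disks_per_node * disk_usage_tb * 1_000_000
    # Aggregate IO bandwidth of all disks in the cluster.
    cluster_mbps = disk_mbps * num_nodes * disks_per_node
    return data_moved_mb / cluster_mbps

# The 100-node example from this article: 4TB of data per disk.
print(theoretical_recovery_seconds(num_nodes=100, disks_per_node=8,
                                   disk_usage_tb=4))  # prints 800.0
```

Because disks_per_node cancels, any value of p yields the same 800 second bound for this cluster.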
Clearly, the cluster size bounds the recovery time. Disk capacities being equal, a 1000 node cluster can recover 10x faster than a 100 node cluster.

A More Practical Lower Bound
The theoretical lower bound assumes that block re-replications can be instantaneously scheduled across the cluster. It also assumes that all cluster IO capacity is available for re-replication whereas in practice application reads and writes also consume IO capacity. The NameNode schedules 2 outbound replication streams per DataNode, per heartbeat interval to throttle re-replication traffic. This throttle allows DataNodes to remain responsive to applications. The throttle can be adjusted via the configuration setting dfs.namenode.replication.max-streams. Let's call this m and the heartbeat interval h.
Also let's assume the mean block size in the cluster is b MB. Then:

Re-replication Rate = (Blocks replicated cluster-wide per heartbeat interval) / (Heartbeat interval)
= (n ⋅ m/h) Blocks/s
The total number of blocks to be re-replicated is the capacity of the lost node divided by the mean block size.

Number of Blocks Lost = (p ⋅ c) TB / b MB
= (p ⋅ c ⋅ 1,000,000/b).
Thus:
Recovery Time = (Number of Blocks Lost) / (Re-replication Rate)
= (p ⋅ c ⋅ 1,000,000) / (b ⋅ n ⋅ m/h)
= (p ⋅ c ⋅ h ⋅ 1,000,000) / (b ⋅ n ⋅ m) seconds.
where:
p = Number of disks per node.
c = Mean usage of each disk in TB.
h = Heartbeat interval (default = 3 seconds).
b = Mean block size in MB.
n = Number of DataNodes in the cluster.
m = dfs.namenode.replication.max-streams (default = 2)
Simplifying by plugging in the defaults for h and m, we get
Minimum Recovery Time (seconds) = (p ⋅ c ⋅ 1,500,000) / (b ⋅ n)
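The simplified formula above can be checked with a short Python sketch (the function name is mine; the defaults mirror h = 3 seconds and m = 2):

```python
def practical_recovery_seconds(num_nodes, disks_per_node, disk_usage_tb,
                               block_size_mb, heartbeat_secs=3, max_streams=2):
    # Number of block replicas lost with the DataNode.
    blocks_lost = disks_per_node * disk_usage_tb * 1_000_000 / block_size_mb
    # The cluster re-replicates num_nodes * max_streams blocks per heartbeat.
    return blocks_lost * heartbeat_secs / (num_nodes * max_streams)

# 100 nodes, 8 disks of 4TB each, 128MB mean block size.
print(practical_recovery_seconds(num_nodes=100, disks_per_node=8,
                                 disk_usage_tb=4, block_size_mb=128))  # prints 3750.0
```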
E.g. in the same cluster, assuming the mean block size is 128MB and each node has 8 disks, the practical lower bound on recovery time will be 3,750 seconds or ~1 hour.

Reducing the Recovery Time
The recovery time can be reduced by:
Increasing dfs.namenode.replication.max-streams. However, setting this value too high can affect cluster performance. Note that increasing this value beyond 4 must be evaluated carefully and also requires changing the safeguard upper limit via dfs.namenode.replication.max-streams-hard-limit.
Using more nodes with smaller disks. Total cluster capacity remaining the same, a cluster with more nodes and smaller disks will recover faster.
Avoiding predominantly small blocks.
07-26-2016
12:16 AM
1 Kudo
You may run into slow Hadoop service start on your OS X development laptop. You can check this by opening up your service logs and looking for large (5-10 second) gaps between successive log entries at startup.

Diagnosis

It often manifests as test failures for MiniDFSCluster-based tests that use short timeouts (<10 seconds). Here is an example from a NameNode log file with a 5 second stall at startup.

2016-07-25 14:57:37,982 INFO org.apache.hadoop.metrics2.impl.MetricsConfig: loaded properties from hadoop-metrics2.properties
2016-07-25 14:57:43,060 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Scheduled snapshot period at 10 second(s).

Another 5 second stall during NameNode startup:

2016-07-25 14:57:48,790 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Append Enabled: true
2016-07-25 14:57:53,914 INFO org.apache.hadoop.util.GSet: Computing capacity for map INodeMap

Resolution

If you see this behavior you are likely running into an OS X bug. The fix is to put all your entries for localhost on one line as described in this StackOverflow answer. i.e. Make sure your /etc/hosts file has something like this:

# Replace myhostname with the hostname of your laptop.
#
127.0.0.1 localhost myhostname myhostname.local myhostname.Home

Instead of this:

127.0.0.1 localhost myhostname.local
127.0.0.1 myhostname myhostname.Home
Root Cause

The root cause of this problem appears to be a long delay when looking up the local host name with InetAddress.getLocalHost. The following code is a minimal repro of this problem on affected systems.

import java.net.*;
class Lookup {
  public static void main(String[] args) throws Exception {
    System.out.println(InetAddress.getLocalHost().getCanonicalHostName());
  }
}

This program can take over 5 seconds to execute on an affected machine. Verified on OS X 10.10.5 with Oracle JDK 1.8.0_91 and 1.7.0_79.
07-08-2016
05:25 PM
Nice series @szetszwo! Thanks for writing this up. It will be a very useful resource for administrators.
07-07-2016
04:13 AM
10 Kudos
Introduction
This article continues where part 2 left off. It describes how to enable two new features that make the HDFS NameNode more responsive to high RPC request loads.
Audience
This article is for Apache Hadoop administrators who are familiar with HDFS and its components. If you are using Ambari you should know how to manage services and configurations with Ambari. It is assumed that you have read part 1 and part 2 of the article.
RPC Congestion Control
Warning: You must first enable the service RPC port (described in part 1) and restart services so the service RPC setting is effective. Failure to do so will break DataNode-NameNode communication.
RPC Congestion Control is a relatively new feature added by the Apache Hadoop Community to help Hadoop Services respond more predictably under high load (see Apache Hadoop Jira HADOOP-10597). In part 2, we discussed how RPC queue overflow can cause request timeouts, and eventually job failures.
This problem can be mitigated if the NameNode sends an explicit signal back to the client when its RPC queue is full. Instead of waiting for a request that may never complete, the client throttles itself by resubmitting the request with an exponentially increasing delay. If you are familiar with the Transmission Control Protocol, this is similar to how senders react when they detect network congestion.
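To illustrate the client-side behavior, here is a Python sketch of an exponential backoff schedule. The base delay, cap, and attempt count are illustrative values of my choosing, not the actual Hadoop IPC client's schedule:

```python
def backoff_schedule(base_ms=200, cap_ms=15_000, attempts=6):
    # Each retry doubles the delay, up to a cap. Real clients typically add
    # random jitter so many throttled clients don't retry in lockstep.
    return [min(cap_ms, base_ms * (2 ** a)) for a in range(attempts)]

print(backoff_schedule())  # prints [200, 400, 800, 1600, 3200, 6400]
```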
The feature can be enabled with the following setting in your core-site.xml file. Replace 8020 with your NameNode RPC port number if it is different. Do not enable this setting for the Service RPC port or the DataNode lifeline port.
<property>
<name>ipc.8020.backoff.enable</name>
<value>true</value>
</property>
You should not enable this setting unless you are running HDP 2.3.6+ or HDP 2.4.2+, or unless a Hortonworks engineer has recommended enabling it after examining your cluster.
RPC FairCallQueue
Warning: You must first enable the service RPC port (described in part 1) and restart services so the service RPC setting is effective. Failure to do so will break DataNode-NameNode communication.
The RPC FairCallQueue replaces the single RPC queue with multiple prioritized queues (see Apache Hadoop Jira HADOOP-10282). The RPC server maintains a history of recent requests grouped by user. It places incoming requests into an appropriate queue based on the user's history. RPC handler threads will dequeue requests from higher priority queues with a higher probability.
FairCallQueue complements RPC congestion control very well and works best when you enable both features together. FairCallQueue can be enabled with the following setting in your core-site.xml. Replace 8020 with your NameNode RPC port if it is different. Do not enable this setting for the Service RPC port or the DataNode lifeline port.
<property>
<name>ipc.8020.callqueue.impl</name>
<value>org.apache.hadoop.ipc.FairCallQueue</value>
</property>
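To make the scheduling idea concrete, here is a toy Python model of prioritized call queues. This is not Hadoop's implementation (the real FairCallQueue uses a configurable scheduler and a probabilistic weighted multiplexer); the class name, weights, and simple weighted round-robin below are illustrative only:

```python
from collections import deque

class FairCallQueueSketch:
    """Toy model: queue 0 is highest priority and is drained most often."""

    def __init__(self, weights=(8, 4, 2, 1)):
        self.queues = [deque() for _ in weights]
        # Weighted schedule: queue 0 appears 8 times, queue 1 four times, etc.
        self._order = [i for i, w in enumerate(weights) for _ in range(w)]
        self._pos = 0

    def put(self, priority, request):
        self.queues[priority].append(request)

    def take(self):
        # Scan the weighted schedule until a non-empty queue is found.
        for _ in range(len(self._order)):
            i = self._order[self._pos]
            self._pos = (self._pos + 1) % len(self._order)
            if self.queues[i]:
                return self.queues[i].popleft()
        return None
```

With these weights, roughly 8 of every 15 dequeues come from the highest-priority queue when all queues are busy, so well-behaved users keep getting served even while a heavy user's requests wait.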
Conclusion
That is it for part 3. In part 4 of this article we look at how to avoid a few common NameNode performance pitfalls.
07-07-2016
04:13 AM
17 Kudos
Introduction
This article is part of a series (see part 1, part 2, part 3 and part 5). Here we look at how to avoid some common NameNode performance issues.
Audience
This article is for Apache Hadoop administrators who are familiar with HDFS and its components. If you are using Ambari or Cloudera Manager, you should know how to manage services and configurations.
Avoid Disk Throughput Issues
The performance of the HDFS NameNode depends on being able to flush its edit logs to disk relatively quickly. Any delay in writing edit logs will hold up valuable RPC handler threads and can have cascading effects on the performance of your entire Hadoop cluster.
You should use a dedicated hard disk (i.e. a dedicated spindle) for the edit logs to keep edits flowing to the disk smoothly. The location of the edit log is configured using the dfs.namenode.edits.dir setting in hdfs-site.xml. If dfs.namenode.edits.dir is not configured, it defaults to the same value as dfs.namenode.name.dir.
Example:
<property>
<name>dfs.namenode.name.dir</name>
<value>/mnt/disk1,/mnt/disk2</value>
</property>
If there are multiple comma-separated directories for dfs.namenode.name.dir, the NameNode will replicate its metadata to each of these directories. In the example above, it is recommended that /mnt/disk1 and /mnt/disk2 refer to dedicated spindles.
As a corollary, if you are using NameNode HA with QJM then the Journal Nodes should also use dedicated spindles for writing edit log transactions. The JournalNode edits directory can be configured via dfs.journalnode.edits.dir in hdfs-site.xml.
<property>
<name>dfs.journalnode.edits.dir</name>
<value>/mnt/disk3</value>
</property>
This leads us to the next pitfall.
Don't let Reads become Writes
The setting dfs.namenode.accesstime.precision controls how often the NameNode will update the last accessed time for each file. It is specified in milliseconds. If this value is too low, then the NameNode is forced to write an edit log transaction to update the file's last access time for each read request and its performance will suffer.
The default value of this setting is a reasonable 3600000 milliseconds (1 hour). We recommend going one step further and setting it to zero so that last access time updates are turned off. Add the following to your hdfs-site.xml.
<property>
<name>dfs.namenode.accesstime.precision</name>
<value>0</value>
</property>
Monitor Group lookup Performance
The NameNode frequently performs group lookups for users submitting requests. This is part of the normal access control mechanism. Group lookups may involve contacting an LDAP server. If the LDAP server is slow to process requests, it can hold up NameNode RPC handler threads and affect cluster performance. When the NameNode suffers from this problem it logs a warning that looks like this:
Potential performance problem: getGroups(user=bob) took 23861 milliseconds.
The default warning threshold is 5 seconds. If you see this log message you should investigate the performance of your LDAP server right away. There are a few workarounds if it is not possible to fix the LDAP server performance immediately.
Increase the group cache timeout: The NameNode caches the results of group lookups to avoid making frequent lookups. The cache duration is set via hadoop.security.groups.cache.secs in your core-site.xml file. The default value of 300 (5 minutes) is too low. We have seen this value set as high as 7200 (2 hours) in busy clusters to avoid frequent group lookups.
Increase the negative cache timeout: For users with no group mapping, the NameNode maintains a short negative cache. The default negative cache duration is 30 seconds and may be too low. You can increase this with the hadoop.security.groups.negative-cache.secs setting in core-site.xml. This setting is useful if you see frequent messages like 'No groups available for user alice' in the NameNode logs.
Add a static mapping override for specific users: A static mapping is an in-memory mapping added via configuration that eliminates the need to lookup groups via any other mechanism. It can be a useful last resort to work around LDAP server performance issues that cannot be easily addressed by the cluster administrator. An example static mapping for two users 'alice' and 'bob' looks like this:
<property>
<name>hadoop.user.group.static.mapping.overrides</name>
<value>alice=staff,hadoop;bob=students,hadoop;</value>
</property>
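For reference, the two cache timeouts discussed above are set in core-site.xml like this (the values shown are illustrative, taken from the discussion above, not universal recommendations):

```xml
<property>
<name>hadoop.security.groups.cache.secs</name>
<value>7200</value>
</property>
<property>
<name>hadoop.security.groups.negative-cache.secs</name>
<value>60</value>
</property>
```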
The HCC article Hadoop and LDAP: Usage, Load Patterns and Tuning has a lot more detail on debugging and optimizing Hadoop LDAP performance.
Monitor NameNode Logs for Excessive Spew
Sometimes the NameNode can generate excessive logging spew which severely impacts its performance. Rendering logging strings is an expensive operation that consumes CPU cycles. It also generates a lot of small short-lived objects (garbage) causing excessive Garbage Collection.
The Apache Hadoop community has made a number of fixes to reduce the verbosity of NameNode logs and these fixes are available in recent HDP maintenance releases. Example: HDFS-9434, HADOOP-12903, HDFS-9941, HDFS-9906 and many more.
Depending on your specific version of Apache Hadoop or HDP you may run into one or more of these problems. Also, new problems can be exposed by changing workloads. It is a good idea to monitor your NameNode logs periodically for excessive log spew and report it either by contacting the Hortonworks support team or filing a bug in the Apache Hadoop Jira.
Conclusion
That is it for Part 4. In Part 5 of this series, we will explore how to optimize logging and block reports.
07-07-2016
04:13 AM
16 Kudos
Introduction
This article continues where part 1 left off. It describes a few more configuration settings that can be enabled in CDP, HDP, CDH, or Apache Hadoop clusters to help the NameNode scale better.
Audience
This article is for Hadoop administrators who are familiar with HDFS and its components. If you are using Ambari or Cloudera Manager you should know how to manage services and configurations. It is assumed that you have read part 1 of the article.
RPC Handler Count
The Hadoop RPC server consists of a single RPC queue per port and multiple handler (worker) threads that dequeue and process requests. If the number of handlers is insufficient, then the RPC queue starts building up and eventually overflows. You may start seeing task failures and eventually job failures and unhappy users.
It is recommended that the RPC handler count be set to 20 * log2(Cluster Size) with an upper limit of 200.
e.g. for a 250 node cluster you should initialize this to 20 * log2(250) = 160. The RPC handler count can be configured with the following setting in hdfs-site.xml.
<property>
<name>dfs.namenode.handler.count</name>
<value>160</value>
</property>
This heuristic is from the excellent Hadoop Operations book. If you are using Ambari to manage your cluster this setting can be changed via a slider in the Ambari Server Web UI. If you're using Cloudera Manager you can search for property name "dfs.namenode.handler.count" under the HDFS configuration page and adjust the value.
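The heuristic can be expressed in a few lines of Python (the function name is mine; the cap of 200 is from the recommendation above):

```python
import math

def recommended_handler_count(cluster_size, factor=20, cap=200):
    # 20 * log2(cluster size), rounded up and capped at 200.
    return min(cap, math.ceil(factor * math.log2(cluster_size)))

print(recommended_handler_count(250))   # prints 160
print(recommended_handler_count(2000))  # prints 200 (capped)
```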
Service RPC Handler Count
Pre-requisite: If you have not enabled the Service RPC port already, please do so first as described here.
There is no precise calculation for the service RPC handler count; however, the default value of 10 is too low for most production clusters. We have often seen this initialized to 50% of dfs.namenode.handler.count in busy clusters and this value works well in practice.
e.g. for the same 250 node cluster you would initialize the service RPC handler count with the following setting in hdfs-site.xml.
<property>
<name>dfs.namenode.service.handler.count</name>
<value>80</value>
</property>
DataNode Lifeline Protocol
The Lifeline protocol is a feature recently added by the Apache Hadoop Community (see Apache HDFS Jira HDFS-9239). It introduces a new lightweight RPC message that is used by the DataNodes to report their health to the NameNode. It was developed in response to problems seen in some overloaded clusters where the NameNode was too busy to process heartbeats and spuriously marked DataNodes as dead.
For a non-HA cluster, the feature can be enabled with the following setting in hdfs-site.xml (replace mynamenode.example.com with the hostname or IP address of your NameNode). The port number can be different too.
<property>
<name>dfs.namenode.lifeline.rpc-address</name>
<value>mynamenode.example.com:8050</value>
</property>
For an HA cluster, the lifeline RPC port can be enabled with settings like the following, replacing mycluster, nn1 and nn2 appropriately.
<property>
<name>dfs.namenode.lifeline.rpc-address.mycluster.nn1</name>
<value>mynamenode1.example.com:8050</value>
</property>
<property>
<name>dfs.namenode.lifeline.rpc-address.mycluster.nn2</name>
<value>mynamenode2.example.com:8050</value>
</property>
Additional lifeline protocol settings are documented in the HDFS-9239 release note but these can be left at their default values for most clusters.
Note: Changing the lifeline protocol settings requires a restart of the NameNodes, DataNodes and ZooKeeper Failover Controllers to take full effect. If you have NameNode HA setup, you can restart the NameNodes one at a time followed by a rolling restart of the remaining components to avoid cluster downtime.
Conclusion
That is it for Part 2. In Part 3 of this article, we will explore how to enable two new HDFS features to help your NameNode scale better.
07-07-2016
04:13 AM
29 Kudos
Introduction
With HDFS HA, the NameNode is no longer a single point of failure in a Hadoop cluster. However the performance of a single NameNode can often limit the performance of jobs across the cluster. The Apache Hadoop community has made a number of NameNode scalability improvements. This series of articles (also see part 2, part 3, part 4 and part 5) explains how you can make configuration changes to help your NameNode scale better and also improve its diagnosability.
Audience
This article is for Hadoop administrators who are familiar with HDFS and its components. If you are using Ambari or Cloudera Manager you should know how to manage services and configurations.
HDFS Audit Logs
The HDFS audit log is a separate log file that contains one entry for each user request. The audit log is a key resource for offline analysis of application traffic to find out users/jobs that generate excessive load on the NameNode. Careful analysis of the HDFS audit logs can also identify application bugs/misconfiguration.
Each audit log entry looks like the following:
2015-11-16 21:00:00,109 INFO FSNamesystem.audit: allowed=true ugi=bob (auth:SIMPLE) ip=/000.000.0.0 cmd=listStatus src=/app-logs/pcd_batch/application_1431545431771/ dst=null perm=null
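Since each entry is a flat list of key=value fields, a first-pass offline analysis needs only a few lines of code. This Python sketch (the helper name is mine) counts requests per user and command, which is often enough to spot a job hammering the NameNode:

```python
import re
from collections import Counter

# Matches the key=value fields in an HDFS audit log entry.
FIELD = re.compile(r'(\w+)=(\S+)')

def top_users(lines):
    """Return (user, command) pairs ordered by request count, descending."""
    counts = Counter()
    for line in lines:
        fields = dict(FIELD.findall(line))
        if 'ugi' in fields and 'cmd' in fields:
            counts[(fields['ugi'], fields['cmd'])] += 1
    return counts.most_common()
```

Feeding hdfs-audit.log through top_users quickly surfaces the heaviest callers; a real analysis would also bucket by time window.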
Audit logging can be enabled via the Hadoop log4j.properties file (typically found in /etc/hadoop/conf, or under the "process" directory when using CM) with the following settings:
# hdfs audit logging
hdfs.audit.logger=INFO,console
log4j.logger.org.apache.hadoop.hdfs.server.namenode.FSNamesystem.audit=${hdfs.audit.logger}
log4j.additivity.org.apache.hadoop.hdfs.server.namenode.FSNamesystem.audit=false
log4j.appender.DRFAAUDIT=org.apache.log4j.DailyRollingFileAppender
log4j.appender.DRFAAUDIT.File=${hadoop.log.dir}/hdfs-audit.log
log4j.appender.DRFAAUDIT.layout=org.apache.log4j.PatternLayout
log4j.appender.DRFAAUDIT.layout.ConversionPattern=%d{ISO8601} %p %c{2}: %m%n
log4j.appender.DRFAAUDIT.DatePattern=.yyyy-MM-dd
Also, check the NameNode startup options in hadoop-env.sh to ensure there is no conflicting override for the hdfs.audit.logger setting. The following is an example of a correct HADOOP_NAMENODE_OPTS setting.
export HADOOP_NAMENODE_OPTS="-server -XX:ParallelGCThreads=8 -XX:+UseConcMarkSweepGC -XX:ErrorFile=/var/log/hadoop/$USER/hs_err_pid%p.log -XX:NewSize=8g -XX:MaxNewSize=8g -Xloggc:/var/log/hadoop/$USER/gc.log-`date +'%Y%m%d%H%M'` -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+PrintGCDateStamps -Xms64g -Xmx64g -Dhadoop.security.logger=INFO,DRFAS -Dhdfs.audit.logger=INFO,DRFAAUDIT ${HADOOP_NAMENODE_OPTS}"
Note: If you enable HDFS audit logs, you should also enable Async Audit Logging as described in the next section.
Async Audit Logging
Using an async log4j appender for the audit logs significantly improves the performance of the NameNode. This setting is critical for the NameNode to scale beyond 10,000 requests/second. Add the following to your hdfs-site.xml.
<property>
<name>dfs.namenode.audit.log.async</name>
<value>true</value>
</property>
If you are managing your cluster with Ambari, this setting is already enabled by default. If you're using CDH, you'll want to add the property to "NameNode Advanced Configuration Snippet (Safety Valve) for hdfs-site.xml". For a CDP cluster, this property is exposed in the UI and enabled by default.
Service RPC port
The service RPC port gives the DataNodes a dedicated port to report their status via block reports and heartbeats. The port is also used by ZooKeeper Failover Controllers for periodic health checks by the automatic failover logic. The port is never used by client applications, hence it reduces RPC queue contention between client requests and DataNode messages.
For a non-HA cluster, the service RPC port can be enabled with the following setting in hdfs-site.xml (replacing mynamenode.example.com with the hostname or IP address of your NameNode). The port number can be something other than 8040.
<property>
<name>dfs.namenode.servicerpc-address</name>
<value>mynamenode.example.com:8040</value>
</property>
For an HA cluster, the service RPC port can be enabled with settings like the following, replacing mycluster, nn1 and nn2 appropriately.
<property>
<name>dfs.namenode.servicerpc-address.mycluster.nn1</name>
<value>mynamenode1.example.com:8040</value>
</property>
<property>
<name>dfs.namenode.servicerpc-address.mycluster.nn2</name>
<value>mynamenode2.example.com:8040</value>
</property>
If you have enabled Automatic Failover using ZooKeeper Failover Controllers, an additional step is required to reset the ZKFC state in ZooKeeper. Stop both the ZKFC processes and run the following command as the hdfs superuser.
hdfs zkfc -formatZK
Note: Changing the service RPC port settings requires a restart of the NameNodes, DataNodes and ZooKeeper Failover Controllers to take full effect. If you have NameNode HA setup, you can restart the NameNodes one at a time followed by a rolling restart of the remaining components to avoid cluster downtime.
NameNode JVM Garbage Collection
Garbage Collection is important enough to merit its own article. Please refer to NameNode Garbage Collection Configuration: Best Practices and Rationale.
Conclusion
That's it for Part 1. In part 2, part 3, part 4 and part 5 we will look at a number of additional configuration settings to help your NameNode scale better.
06-09-2016
06:31 PM
2 Kudos
Good writeup @Mingliang Liu. In addition to what @Chris Nauroth said, I also add -Dmaven.site.skip=true.

mvn clean package -Pdist,native -Dtar -DskipTests=true -Dmaven.site.skip=true -Dmaven.javadoc.skip=true
10-28-2015
05:22 PM
4 Kudos
Configuring a separate service RPC port can improve the responsiveness of the NameNode by allowing DataNode and client requests to be processed via separate RPC queues. Adding a service RPC port to an HA cluster with automatic failover via ZKFCs requires a workaround step. The steps to configure a service RPC port are as follows:

1. Add the following settings to hdfs-site.xml.

<property>
<name>dfs.namenode.servicerpc-address.mycluster.nn1</name>
<value>nn1.example.com:8040</value>
</property>
<property>
<name>dfs.namenode.servicerpc-address.mycluster.nn2</name>
<value>nn2.example.com:8040</value>
</property>

2. Restart the NameNodes.
3. Stop the ZKFC processes on both NameNodes.
4. Run the following command to reset the ZKFC state in ZooKeeper:

hdfs zkfc -formatZK

Without this step you will see the following exception after ZKFC restart:

java.lang.RuntimeException: Mismatched address stored in ZK for NameNode

5. Restart the ZKFCs.