
HDFS client takes 20s to fail over to alternate NameNode

Contributor

I have a cluster with two NameNodes configured for HA. For failover testing, we purposely shut down NameNode 1. However, when checking an HDFS file size from server 2, the HDFS client still attempts to connect to NameNode 1 first. This causes a 20s delay while the connection times out before the client tries NameNode 2. I've tried changing the order in the dfs.ha.namenodes.xxx property, but without success: the client always tries NameNode 1 first and only after 20s moves on to NameNode 2. This causes unacceptable delays in our system, which needs much faster response times than waiting 20s to reach the correct NameNode. Does anyone know how I might rectify this problem?

Thanks, David

7 REPLIES

Master Mentor

@David Robison

What is the value of the HDFS property "dfs.client.retry.policy.enabled"? You can check it with:

# su - hdfs
# hdfs getconf -confKey dfs.client.retry.policy.enabled


DFSClient should ignore dfs.client.retry.policy.enabled for HA proxies to ensure fast failover. Otherwise, the DFSClient keeps retrying the NameNode that is no longer active, which delays the failover.

Contributor

dfs.client.retry.policy.enabled is set to false

Master Mentor

@David Robison

On a connection failure the client retries, alternating between the NameNodes, up to a total of 15 times, with an exponentially increasing delay of up to 15 seconds between attempts. The number of failover attempts is defined by the "dfs.client.failover.max.attempts" property (15 by default), and the maximum wait between attempts (15 seconds by default) is controlled by the "dfs.client.failover.sleep.max.millis" property.

dfs.client.failover.sleep.max.millis : This option specifies the maximum time, in milliseconds, to wait between failover attempts.
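As a minimal sketch (the values below are illustrative, not recommendations), these client-side settings can be lowered so that a dead NameNode is abandoned more quickly. They can be set in hdfs-site.xml / core-site.xml on the client, or programmatically:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class FastFailoverExample {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration(); // picks up *-site.xml from the classpath

        // Give up on an unreachable NameNode sooner (defaults: 20000 ms connect
        // timeout, 15 failover attempts, 15000 ms max sleep between attempts).
        conf.setInt("ipc.client.connect.timeout", 2000);
        conf.setInt("dfs.client.failover.max.attempts", 3);
        conf.setInt("dfs.client.failover.sleep.base.millis", 500);
        conf.setInt("dfs.client.failover.sleep.max.millis", 2000);

        FileSystem fs = FileSystem.get(conf);
        System.out.println(fs.getFileStatus(new Path(args[0])).getLen());
    }
}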

Master Mentor

@David Robison

Are you still seeing the delay of 15 seconds in failover?

Contributor

Yes, though it's actually 20s. I believe it comes from the ipc.client.connect.timeout default (20000 ms). I am trying to see whether I can set it to 2s. The main problem seems to be that every FileSystem object I create always tries my server 1 first, which is down. It doesn't remember that server 1 was down the last time it tried, and it keeps retrying it for each new FileSystem object. I am also trying to cache my own FileSystem object so that, if I reuse an object that has already failed over to the second server, I won't incur the same 2s delay of first trying to connect to the failed server.
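A minimal sketch of that caching approach is below (the class and method names are just illustrative; note also that FileSystem.get() already caches instances internally unless fs.hdfs.impl.disable.cache is set to true):

import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public final class CachedFs {
    private static volatile FileSystem fs;

    // Return one shared FileSystem per JVM so the DFS client that has already
    // failed over to the active NameNode is reused instead of re-created
    // (a new instance would again try the dead NameNode first).
    public static FileSystem get(Configuration conf) throws IOException {
        if (fs == null) {
            synchronized (CachedFs.class) {
                if (fs == null) {
                    fs = FileSystem.get(conf);
                }
            }
        }
        return fs;
    }

    // Example use: the file-size check from the original question.
    public static long fileSize(Configuration conf, String path) throws IOException {
        return get(conf).getFileStatus(new Path(path)).getLen();
    }
}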

Contributor

Hi, I'm facing exactly the same issue with HDP 2.6.2.

The HDFS client has to wait about 20 seconds when the 1st NameNode is powered off.

(Actually, we hit this issue when the 1st NameNode had a kernel hang / kernel panic.)

Did you find a good solution or workaround?

If so, please share it. Any information will help us!


Hi,

Have you found a solution? I have the same problem on my production cluster.

HDFS HA has been in place for several years without any problem.

But recently we realized that an HDFS client has to wait 20 seconds when the server hosting nn1 is shut down. For example, with debug logging enabled:

19/08/29 11:03:05 DEBUG ipc.Client: Connecting to XXXXX/XXXXX:8020

19/08/29 11:03:23 DEBUG ipc.Client: Failed to connect to server: XXXXX/XXXXX:8020: try once and fail.

java.net.NoRouteToHostException: No route to host

A few details:

Hadoop version : 2.7.3

dfs.client.retry.policy.enabled : false

dfs.client.failover.sleep.max.millis : 15000

ipc.client.connect.timeout : 20000
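For reference, here is a quick way to dump the values the client actually resolves for these keys (the class name below is just illustrative); the roughly 20-second gap in the log above seems to match the ipc.client.connect.timeout of 20000 ms:

import org.apache.hadoop.conf.Configuration;

public class DumpClientFailoverConf {
    public static void main(String[] args) {
        Configuration conf = new Configuration();   // loads core-site.xml from the classpath
        conf.addResource("hdfs-site.xml");          // also pick up the HDFS client settings
        String[] keys = {
            "dfs.client.retry.policy.enabled",
            "dfs.client.failover.max.attempts",
            "dfs.client.failover.sleep.max.millis",
            "ipc.client.connect.timeout",
            "ipc.client.connect.max.retries.on.timeouts"
        };
        for (String key : keys) {
            System.out.println(key + " = " + conf.get(key));
        }
    }
}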

 

Thanks for your help!