For a cluster with NameNode HA, what is the recommended value for dfs.client.retry.policy.enabled?
The default is false.
I would also like to understand the reasoning behind the value we choose for this property in case of Namenode HA.
The property dfs.client.retry.policy.enabled is important when HA is enabled, as it enables HDFS client retry in case of NameNode failure. So, after enabling HDFS HA, the property should be set to true in hdfs-site.xml.
If dfs.client.retry.policy.enabled=false in an HA environment, the NameNode connection attempt is made only once and fails without attempting to connect to the failover node. A snippet from the code is below:
+  /**
+   * Return the default retry policy used in RPC.
+   *
+   * If dfs.client.retry.policy.enabled == false, use TRY_ONCE_THEN_FAIL.
+   *
+   * Otherwise, first unwrap ServiceException if possible, and then
+   * (1) use multipleLinearRandomRetry for
+   *     - SafeModeException, or
+   *     - IOException other than RemoteException, or
+   *     - ServiceException; and
+   * (2) use TRY_ONCE_THEN_FAIL for
+   *     - non-SafeMode RemoteException, or
+   *     - non-IOException.
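To make the decision described in that comment easier to follow, here is a minimal, self-contained sketch in plain Java. It is not the Hadoop implementation: the class RetryPolicySketch, the Policy enum, the choose method, and the stand-in RemoteException and ServiceException types are all hypothetical names introduced for illustration. It also assumes a SafeModeException reaches the client wrapped in a RemoteException that carries the original class name.

```java
import java.io.IOException;

// Hypothetical model of the decision described in the Javadoc above.
// None of these types are Hadoop classes; they are stand-ins for illustration.
class RetryPolicySketch {

    public enum Policy { TRY_ONCE_THEN_FAIL, MULTIPLE_LINEAR_RANDOM_RETRY }

    // Stand-in for a server-side exception delivered to the client (assumption).
    public static class RemoteException extends IOException {
        final String className;
        public RemoteException(String className, String message) {
            super(message);
            this.className = className;
        }
    }

    // Stand-in for an RPC-layer wrapper exception (assumption).
    public static class ServiceException extends Exception {
        public ServiceException(Throwable cause) {
            super(cause);
        }
    }

    // retryEnabled models the value of dfs.client.retry.policy.enabled.
    public static Policy choose(boolean retryEnabled, Exception e) {
        if (!retryEnabled) {
            return Policy.TRY_ONCE_THEN_FAIL;
        }
        // Unwrap ServiceException if possible.
        if (e instanceof ServiceException && e.getCause() instanceof Exception) {
            e = (Exception) e.getCause();
        }
        if (e instanceof RemoteException) {
            // Only a remote SafeModeException is retried.
            boolean safeMode =
                ((RemoteException) e).className.endsWith("SafeModeException");
            return safeMode ? Policy.MULTIPLE_LINEAR_RANDOM_RETRY
                            : Policy.TRY_ONCE_THEN_FAIL;
        }
        if (e instanceof IOException || e instanceof ServiceException) {
            return Policy.MULTIPLE_LINEAR_RANDOM_RETRY;
        }
        // Anything that is not an IOException is not retried.
        return Policy.TRY_ONCE_THEN_FAIL;
    }

    public static void main(String[] args) {
        Exception connectFailure = new IOException("connection refused");
        System.out.println(choose(false, connectFailure));
        System.out.println(choose(true, connectFailure));
    }
}
```

In this model, a plain connection failure (an IOException that is not a RemoteException) is tried once when the property is false and retried against the same address when it is true. The sketch covers only this per-connection choice and says nothing about how the HA failover proxy moves between NameNodes.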
dfs.client.retry.policy.enabled should be set to false in a cluster with HA enabled.
The reason is that in a NameNode High Availability (NN HA) system, when one of the NameNodes goes down (NN process stopped) and the retry policy is enabled, attempts to use HDFS can result in repeated errors and apparent hangs. Running or new jobs that depend on HDFS access will also fail, because the client keeps talking to the failed NN.