Where can I set dfs.client.block.write.replace-datanode-on-failure.enable?

Contributor

I have a small 3-node cluster and am experiencing total failure when running Reduce jobs. I searched through syslog and found errors pointing to this variable:

 

dfs.client.block.write.replace-datanode-on-failure.enable

 

If there is a datanode/network failure in the write pipeline, DFSClient will try to remove the failed datanode from the pipeline and then continue writing with the remaining datanodes. As a result, the number of datanodes in the pipeline is decreased. The feature is to add new datanodes to the pipeline. This is a site-wide property to enable/disable the feature. When the cluster size is extremely small, e.g. 3 nodes or less, cluster administrators may want to set the policy to NEVER in the default configuration file or disable this feature. Otherwise, users may experience an unusually high rate of pipeline failures since it is impossible to find new datanodes for replacement. See also dfs.client.block.write.replace-datanode-on-failure.policy

 

How can I set this variable from the CDH4 Cloudera Manager interface? Or do I need to manually edit an XML file?

1 ACCEPTED SOLUTION

Cloudera Employee

Hey Ben,

 

In a CM-managed cluster, CM takes care of managing and deploying configurations for you (including custom options like this one). Manually editing the config files is brittle, since CM may push new configurations on top of your edits. To add custom options, search for "safety valve" in the configuration editor, and you can paste in the XML directly.

 

Note also that these two properties are specific to the client, not the datanode or NN, so you probably want to drop this into the "HDFS Client Configuration Safety Valve for hdfs-site.xml" box.
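For illustration only, a sketch of what might be pasted into that box (the values here are just examples for a very small cluster, based on the properties discussed in this thread; choose whatever fits your setup):

<!-- Disable adding replacement datanodes to a failed write pipeline -->
<property>
  <name>dfs.client.block.write.replace-datanode-on-failure.enable</name>
  <value>false</value>
</property>
<!-- The policy is only consulted when "enable" is true; shown here for completeness -->
<property>
  <name>dfs.client.block.write.replace-datanode-on-failure.policy</name>
  <value>NEVER</value>
</property>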

 

Best,

Andrew


9 REPLIES

Contributor

Answer:

 

For my cluster, I directly modified the /etc/hadoop/conf/hdfs-site.xml file on all 4 nodes, including namenode and datanodes.

 

I am able to locate other dfs.client variables in Cloudera Manager:

host:7180/cmf/services/19/config

 

But the variable that I added manually does not show up in Cloudera Manager as far as I can see.

 

Another variable to set in conjunction with this is:

dfs.client.block.write.replace-datanode-on-failure.policy

 

See http://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-hdfs/hdfs-default.xml
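As a quick sanity check (assuming your Hadoop version ships the hdfs getconf -confKey option and the client configuration on that host lives in /etc/hadoop/conf), you can print the values the client actually resolves:

# show the effective values seen by the HDFS client on this host
hdfs getconf -confKey dfs.client.block.write.replace-datanode-on-failure.enable
hdfs getconf -confKey dfs.client.block.write.replace-datanode-on-failure.policy

If these print the defaults rather than your values, the file you edited is not the one the client is actually reading.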

Cloudera Employee

Hey Ben,

 

In a CM-managed cluster, CM takes care of managing and deploying configurations for you (including custom options like this one). Manually editing the config files is brittle, since CM may push new configurations on top of your edits. To add custom options, search for "safety valve" in the configuration editor, and you can paste in the XML directly.

 

Note also that these two properties are specific to the client, not the datanode or NN, so you probably want to drop this into the "HDFS Client Configuration Safety Valve for hdfs-site.xml" box.

 

Best,

Andrew

Contributor

Andrew, thanks. I am currently working on purchasing Cloudera Enterprise licenses so we can get official support. I will look for the area you are indicating.

 

For anyone else reading the thread, the reason we needed to set these variables is that MapReduce jobs (via Pig scripting) were failing at the Reduce phase. Based on system logs, I was able to trace the failures to these variables. In the Hadoop documentation, you can see that a 3-node cluster is considered "extremely" small, and that is exactly what we are running: 1 namenode and 3 datanodes. Also, replication is set to 3, meaning all datanodes must be operational for HDFS to be healthy; if one datanode drops out of a write pipeline, there is no spare node left to add as a replacement. In retrospect, with such a small cluster we would decrease replication to 2.

 

To be clear, we had to set both variables. The documentation suggests that changing only the first ("enable") property should be enough, but in our case the problem was not fixed until we also set "policy" to NEVER.

 

dfs.client.block.write.replace-datanode-on-failure.enable (default: true)
If there is a datanode/network failure in the write pipeline, DFSClient will try to remove the failed datanode from the pipeline and then continue writing with the remaining datanodes. As a result, the number of datanodes in the pipeline is decreased. The feature is to add new datanodes to the pipeline. This is a site-wide property to enable/disable the feature. When the cluster size is extremely small, e.g. 3 nodes or less, cluster administrators may want to set the policy to NEVER in the default configuration file or disable this feature. Otherwise, users may experience an unusually high rate of pipeline failures since it is impossible to find new datanodes for replacement. See also dfs.client.block.write.replace-datanode-on-failure.policy.

dfs.client.block.write.replace-datanode-on-failure.policy (default: DEFAULT)
This property is used only if the value of dfs.client.block.write.replace-datanode-on-failure.enable is true. ALWAYS: always add a new datanode when an existing datanode is removed. NEVER: never add a new datanode. DEFAULT: Let r be the replication number. Let n be the number of existing datanodes. Add a new datanode only if r is greater than or equal to 3 and either (1) floor(r/2) is greater than or equal to n; or (2) r is greater than n and the block is hflushed/appended.

Contributor
FYI for future readers, this can be found under Services > Service hdfs1 > Configuration > View and Edit, then search for "safety valve". You will find it in the results as "HDFS Service Configuration Safety Valve for hdfs-site.xml".

Expert Contributor

Hi, which property did you modify? Should it be cluster-wide, or related only to the client?

New Contributor

Hi, I just inserted some XML into the "HDFS Service Configuration Safety Valve for hdfs-site.xml" and deployed the client configuration from the CM UI, but it did not seem to work: when I opened /etc/hadoop/conf.cloudera.hdfs1/hdfs-site.xml, the file had been updated but what I inserted was not there. So is adding these parameters to the "HDFS Service Configuration Safety Valve for hdfs-site.xml" not enough to make them appear?

Mentor

To add configuration snippets for a client config, the right field to use is the "HDFS Client Configuration Safety Valve for hdfs-site.xml", not the "Service" one, which only applies to daemons.
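For example (the path below is taken from the post above, so adjust it for your own cluster), after adding the snippet to the Client safety valve and running Deploy Client Configuration, you could confirm the deployed file now contains it:

# print each match plus the two following lines (the <value> element)
grep -A 2 'replace-datanode-on-failure' /etc/hadoop/conf.cloudera.hdfs1/hdfs-site.xml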

New Contributor

Hi all,

Can anyone help me: where can I find the dfs.client.block.write.replace-datanode-on-failure.enable parameter in Cloudera Manager?

I have searched in hdfs-site.xml, but I could not find these values.

Mentor
Please follow the entire discussion above: the parameter is an advanced one and has no dedicated field. You'll need to apply it via the safety valve, using the property name directly.

P.S. It is better etiquette to open a new topic than to bump ancient ones.