<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>When is it necessary to set block replication to 1? - Archives of Support Questions (Read Only)</title>
    <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/when-need-to-set-Block-replication-to-1/m-p/207474#M74051</link>
    <description>&lt;P&gt;We get the following in Spark logs:&lt;/P&gt;&lt;PRE&gt;java.io.IOException: Failed to replace a bad datanode on the existing pipeline due to no more good datanodes being available to try. (Nodes: current=[DatanodeInfoWithStorage DatanodeInfoWithStorage\
The current failed datanode replacement policy is DEFAULT, and a client may configure this via 'dfs.client.block.write.replace-datanode-on-failure.policy' in its configuration.
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.findNewDatanode(DFSOutputStream.java:1036) &lt;/PRE&gt;&lt;P&gt;My Ambari cluster includes only 3 worker machines, and each worker has only one data disk.&lt;/P&gt;&lt;P&gt;I searched Google and found that the solution may be:&lt;/P&gt;&lt;P&gt;Block replication needs to be set to 1 instead of 3 (HDFS).&lt;/P&gt;&lt;P&gt;Is that true?&lt;/P&gt;&lt;P&gt;Second: since my worker machines have only one data disk each, could that be part of the problem?&lt;/P&gt;&lt;P&gt;&lt;EM&gt;&lt;STRONG&gt;Block replication&lt;/STRONG&gt; = the number of copies kept of each block in the file system, as specified by the dfs.replication factor;
setting dfs.replication=1 means there will be only one copy of each block in the file system.&lt;/EM&gt;&lt;/P&gt;</description>
    <pubDate>Wed, 31 Jan 2018 00:53:14 GMT</pubDate>
    <dc:creator>mike_bronson7</dc:creator>
    <dc:date>2018-01-31T00:53:14Z</dc:date>
    <item>
      <title>When is it necessary to set block replication to 1?</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/when-need-to-set-Block-replication-to-1/m-p/207474#M74051</link>
      <description>&lt;P&gt;We get the following in Spark logs:&lt;/P&gt;&lt;PRE&gt;java.io.IOException: Failed to replace a bad datanode on the existing pipeline due to no more good datanodes being available to try. (Nodes: current=[DatanodeInfoWithStorage DatanodeInfoWithStorage\
The current failed datanode replacement policy is DEFAULT, and a client may configure this via 'dfs.client.block.write.replace-datanode-on-failure.policy' in its configuration.
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.findNewDatanode(DFSOutputStream.java:1036) &lt;/PRE&gt;&lt;P&gt;My Ambari cluster includes only 3 worker machines, and each worker has only one data disk.&lt;/P&gt;&lt;P&gt;I searched Google and found that the solution may be:&lt;/P&gt;&lt;P&gt;Block replication needs to be set to 1 instead of 3 (HDFS).&lt;/P&gt;&lt;P&gt;Is that true?&lt;/P&gt;&lt;P&gt;Second: since my worker machines have only one data disk each, could that be part of the problem?&lt;/P&gt;&lt;P&gt;&lt;EM&gt;&lt;STRONG&gt;Block replication&lt;/STRONG&gt; = the number of copies kept of each block in the file system, as specified by the dfs.replication factor;
setting dfs.replication=1 means there will be only one copy of each block in the file system.&lt;/EM&gt;&lt;/P&gt;</description>
      <pubDate>Wed, 31 Jan 2018 00:53:14 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/when-need-to-set-Block-replication-to-1/m-p/207474#M74051</guid>
      <dc:creator>mike_bronson7</dc:creator>
      <dc:date>2018-01-31T00:53:14Z</dc:date>
    </item>
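    <!-- For reference, the dfs.replication setting discussed above lives in
         hdfs-site.xml. A minimal sketch (the value 2 is illustrative; the
         HDFS default is 3): -->
    <property>
      <name>dfs.replication</name>
      <value>2</value>
    </property>
    <!-- Note: this only affects files written after the change; existing
         files keep their old factor unless changed with
         'hdfs dfs -setrep -w 2 /path'. -->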
    <item>
      <title>Re: When is it necessary to set block replication to 1?</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/when-need-to-set-Block-replication-to-1/m-p/207475#M74052</link>
      <description>&lt;P&gt;1. Block replication is for redundancy of data; it ensures data is not lost when a disk goes bad or a node goes down. &lt;BR /&gt;2. Replication 1 is used in situations where data can be recreated at any point in time and its loss is not crucial. For example, in a job chain the output of one job is consumed by others and eventually all intermediate data is deleted; that intermediate data can be marked with a replication of 1 (though it is still good to have 2). &lt;BR /&gt;3. A replication factor of 1 does not make the cluster fault tolerant. &lt;BR /&gt;&lt;BR /&gt;In your case you have 3 worker nodes; an RF of 1 means that if a worker goes bad, you lose its data and it cannot be processed. &lt;BR /&gt;I suggest you use at least RF=2 if you are concerned about space utilization. &lt;/P&gt;</description>
      <pubDate>Fri, 02 Feb 2018 23:46:42 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/when-need-to-set-Block-replication-to-1/m-p/207475#M74052</guid>
      <dc:creator>kgautam</dc:creator>
      <dc:date>2018-02-02T23:46:42Z</dc:date>
    </item>
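    <!-- The exception quoted in the question also names the client-side
         datanode-replacement policy itself. On very small clusters a common
         workaround, independent of the replication factor, is to relax that
         policy in the client's hdfs-site.xml. A sketch using the stock
         Hadoop property names (NEVER skips replacing a failed datanode in
         the write pipeline instead of aborting the write): -->
    <property>
      <name>dfs.client.block.write.replace-datanode-on-failure.enable</name>
      <value>true</value>
    </property>
    <property>
      <name>dfs.client.block.write.replace-datanode-on-failure.policy</name>
      <value>NEVER</value>
    </property>
    <!-- Trade-off: with NEVER, a write that loses a pipeline datanode
         completes with fewer live replicas than requested. -->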
  </channel>
</rss>

