<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>question Re: [HDFS] Block replication dfs.replication affect performance in Support Questions</title>
    <link>https://community.cloudera.com/t5/Support-Questions/HDFS-Block-replication-dfs-replication-affect-performance/m-p/119092#M81875</link>
    <description>&lt;A rel="user" href="https://community.cloudera.com/users/10969/mqureshi.html" nodeid="10969" target="_blank"&gt;@mqureshi&lt;/A&gt;&lt;P&gt;Thank you for your answers.&lt;/P&gt;&lt;P&gt;I want to ask one more question.&lt;/P&gt;&lt;P&gt;If I change the value only in the Ambari UI, is that equivalent to running the &lt;STRONG&gt;setrep&lt;/STRONG&gt; command? Or do I need to change it in the Ambari UI before using &lt;STRONG&gt;setrep&lt;/STRONG&gt;?&lt;/P&gt;&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="11240-dfs.png" style="width: 1018px;"&gt;&lt;img src="https://community.cloudera.com/t5/image/serverpage/image-id/22947iF80ABB5F3640819E/image-size/medium?v=v2&amp;amp;px=400" role="button" title="11240-dfs.png" alt="11240-dfs.png" /&gt;&lt;/span&gt;&lt;/P&gt;</description>
    <pubDate>Mon, 19 Aug 2019 10:41:43 GMT</pubDate>
    <dc:creator>hoangletrung</dc:creator>
    <dc:date>2019-08-19T10:41:43Z</dc:date>
    <item>
      <title>[HDFS] Block replication dfs.replication affect performance</title>
      <link>https://community.cloudera.com/t5/Support-Questions/HDFS-Block-replication-dfs-replication-affect-performance/m-p/119090#M81873</link>
      <description>&lt;P&gt;I have two questions about the dfs.replication parameter:&lt;/P&gt;&lt;P&gt;1. I know the default block replication factor is &lt;STRONG&gt;3&lt;/STRONG&gt;. If I configure &lt;STRONG&gt;dfs.replication=1&lt;/STRONG&gt;, does that affect cluster performance?&lt;/P&gt;&lt;P&gt;2. I have a lot of data written with &lt;STRONG&gt;dfs.replication=1&lt;/STRONG&gt;, and I am now changing the setting to &lt;STRONG&gt;dfs.replication=3&lt;/STRONG&gt;. Will my existing data be replicated automatically, or do I have to rebuild it for replication to take effect? I need to be sure because my data is very important.&lt;/P&gt;&lt;P&gt;P/S: Are there any best practices for configuring dfs.replication?&lt;/P&gt;</description>
      <pubDate>Mon, 09 Jan 2017 12:41:40 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/HDFS-Block-replication-dfs-replication-affect-performance/m-p/119090#M81873</guid>
      <dc:creator>hoangletrung</dc:creator>
      <dc:date>2017-01-09T12:41:40Z</dc:date>
    </item>
    <item>
      <title>Re: [HDFS] Block replication dfs.replication affect performance</title>
      <link>https://community.cloudera.com/t5/Support-Questions/HDFS-Block-replication-dfs-replication-affect-performance/m-p/119091#M81874</link>
      <description>&lt;P&gt;&lt;A rel="user" href="https://community.cloudera.com/users/13042/hoangletrung.html" nodeid="13042"&gt;@Hoang Le&lt;/A&gt;
&lt;/P&gt;&lt;P&gt;1. I know the default block replication factor is &lt;STRONG&gt;3&lt;/STRONG&gt;. If I configure &lt;STRONG&gt;dfs.replication=1&lt;/STRONG&gt;, does that affect cluster performance?&lt;/P&gt;&lt;P&gt;Since you are not replicating, your writes will be faster, at the expense of a significant risk of data loss and of worse read performance. Reads can be slow because a block might sit on a node that is experiencing issues, with no other replica available. A job can also fail if just that one node fails.&lt;/P&gt;&lt;P&gt;2. I have a lot of data written with &lt;STRONG&gt;dfs.replication=1&lt;/STRONG&gt;, and I am now changing the setting to &lt;STRONG&gt;dfs.replication=3&lt;/STRONG&gt;. Will my existing data be replicated automatically, or do I have to rebuild it for replication to take effect? I need to be sure because my data is very important.&lt;/P&gt;&lt;P&gt;Use setrep to change the replication factor of existing files. It will replicate the existing data (you have to provide the path).&lt;/P&gt;&lt;P style="margin-left: 20px;"&gt;hadoop fs -setrep [-R] [-w] &amp;lt;numReplicas&amp;gt; &amp;lt;path&amp;gt;&lt;/P&gt;&lt;UL&gt;&lt;LI&gt;hadoop fs -setrep -w 3 /user/hadoop/dir1&lt;/LI&gt;&lt;/UL&gt;&lt;UL&gt;&lt;LI&gt;The -R flag is accepted for backwards compatibility. It has no effect.&lt;/LI&gt;&lt;/UL&gt;&lt;UL&gt;&lt;LI&gt;The -w flag requests that the command wait for the replication to complete. This can potentially take a very long time.&lt;/LI&gt;&lt;LI&gt;Returns 0 on success and -1 on error.&lt;/LI&gt;&lt;/UL&gt;&lt;P&gt;P/S: Are there any best practices for configuring dfs.replication?&lt;/P&gt;&lt;P&gt;Always use the default replication factor of 3. It provides data resiliency as well as redundancy in case of node failures. It also helps read performance. In rare cases, you can increase the replication factor to spread the data across more nodes and make reads faster.&lt;/P&gt;</description>
      <pubDate>Mon, 09 Jan 2017 13:53:12 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/HDFS-Block-replication-dfs-replication-affect-performance/m-p/119091#M81874</guid>
      <dc:creator>mqureshi</dc:creator>
      <dc:date>2017-01-09T13:53:12Z</dc:date>
    </item>
    <item>
      <title>Re: [HDFS] Block replication dfs.replication affect performance</title>
      <link>https://community.cloudera.com/t5/Support-Questions/HDFS-Block-replication-dfs-replication-affect-performance/m-p/119092#M81875</link>
      <description>&lt;A rel="user" href="https://community.cloudera.com/users/10969/mqureshi.html" nodeid="10969" target="_blank"&gt;@mqureshi&lt;/A&gt;&lt;P&gt;Thank you for your answers.&lt;/P&gt;&lt;P&gt;I want to ask one more question.&lt;/P&gt;&lt;P&gt;If I change the value only in the Ambari UI, is that equivalent to running the &lt;STRONG&gt;setrep&lt;/STRONG&gt; command? Or do I need to change it in the Ambari UI before using &lt;STRONG&gt;setrep&lt;/STRONG&gt;?&lt;/P&gt;&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="11240-dfs.png" style="width: 1018px;"&gt;&lt;img src="https://community.cloudera.com/t5/image/serverpage/image-id/22947iF80ABB5F3640819E/image-size/medium?v=v2&amp;amp;px=400" role="button" title="11240-dfs.png" alt="11240-dfs.png" /&gt;&lt;/span&gt;&lt;/P&gt;</description>
      <pubDate>Mon, 19 Aug 2019 10:41:43 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/HDFS-Block-replication-dfs-replication-affect-performance/m-p/119092#M81875</guid>
      <dc:creator>hoangletrung</dc:creator>
      <dc:date>2019-08-19T10:41:43Z</dc:date>
    </item>
    <item>
      <title>Re: [HDFS] Block replication dfs.replication affect performance</title>
      <link>https://community.cloudera.com/t5/Support-Questions/HDFS-Block-replication-dfs-replication-affect-performance/m-p/119093#M81876</link>
      <description>&lt;P&gt;&lt;A rel="user" href="https://community.cloudera.com/users/13042/hoangletrung.html" nodeid="13042"&gt;@Hoang Le&lt;/A&gt; &lt;/P&gt;&lt;P&gt;No. The Ambari UI setting only applies to files that you create in the future. It will not run the setrep command for you; you have to run that from the shell as described above.&lt;/P&gt;</description>
      <pubDate>Mon, 09 Jan 2017 14:10:20 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/HDFS-Block-replication-dfs-replication-affect-performance/m-p/119093#M81876</guid>
      <dc:creator>mqureshi</dc:creator>
      <dc:date>2017-01-09T14:10:20Z</dc:date>
    </item>
  </channel>
</rss>

