<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>question Re: Disadvantages of replication factor 1 on 200GB of data per day in Archives of Support Questions (Read Only)</title>
    <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Disadvantages-of-replication-factor-1-on-200GB-of-data-per/m-p/226815#M63749</link>
    <description>&lt;P&gt;There are several disadvantages to using a replication factor of 1, and we strongly recommend against it for the following reasons:&lt;/P&gt;&lt;P&gt;1. Data loss --&amp;gt; The failure of even one DataNode or disk results in data loss, because no other copy of its blocks exists.&lt;/P&gt;&lt;P&gt;2. Performance issues --&amp;gt; A replication factor greater than 1 allows more parallelization, because each block can be read from more than one node.&lt;/P&gt;&lt;P&gt;3. Handling failure --&amp;gt; With a replication factor &amp;gt; 1, the failure of one or more DataNodes does not necessarily cause a job to fail.&lt;/P&gt;</description>
    <pubDate>Wed, 28 Jun 2017 01:27:35 GMT</pubDate>
    <dc:creator>pardeep_kumar</dc:creator>
    <dc:date>2017-06-28T01:27:35Z</dc:date>
    <item>
      <title>Disadvantages of replication factor 1 on 200GB of data per day</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Disadvantages-of-replication-factor-1-on-200GB-of-data-per/m-p/226813#M63747</link>
      <description>&lt;P&gt;Hi,&lt;/P&gt;&lt;P&gt;I have about 200 GB of data per day coming from a Cassandra database into HDFS. What are the disadvantages when the replication factor is 1, other than losing data when a DataNode fails?&lt;/P&gt;&lt;P&gt;I believe there will be a lot of pressure on the node where the data resides? I am trying to understand what happens when querying large chunks of data from these DataNodes with the replication factor set to 1.&lt;/P&gt;&lt;P&gt;Thanks.&lt;/P&gt;</description>
      <pubDate>Fri, 16 Sep 2022 11:50:32 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Disadvantages-of-replication-factor-1-on-200GB-of-data-per/m-p/226813#M63747</guid>
      <dc:creator>pmj</dc:creator>
      <dc:date>2022-09-16T11:50:32Z</dc:date>
    </item>
    <item>
      <title>Re: Disadvantages of replication factor 1 on 200GB of data per day</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Disadvantages-of-replication-factor-1-on-200GB-of-data-per/m-p/226814#M63748</link>
      <description>&lt;P&gt;@&lt;A href="https://community.hortonworks.com/users/14451/pjalleda.html"&gt;PJ&lt;/A&gt; Even with a replication factor of 1, the data is still split into blocks that are distributed across different DataNodes. So, in case of a DataNode failure, you will only be able to retrieve part of the data. Another advantage of setting the replication factor &amp;gt; 1 is parallel processing: multiple copies of the data exist on multiple nodes, so several machines can process the same data simultaneously.&lt;/P&gt;</description>
      <pubDate>Tue, 27 Jun 2017 21:27:04 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Disadvantages-of-replication-factor-1-on-200GB-of-data-per/m-p/226814#M63748</guid>
      <dc:creator>ibhatt</dc:creator>
      <dc:date>2017-06-27T21:27:04Z</dc:date>
    </item>
    <item>
      <title>Re: Disadvantages of replication factor 1 on 200GB of data per day</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Disadvantages-of-replication-factor-1-on-200GB-of-data-per/m-p/226815#M63749</link>
      <description>&lt;P&gt;There are several disadvantages to using a replication factor of 1, and we strongly recommend against it for the following reasons:&lt;/P&gt;&lt;P&gt;1. Data loss --&amp;gt; The failure of even one DataNode or disk results in data loss, because no other copy of its blocks exists.&lt;/P&gt;&lt;P&gt;2. Performance issues --&amp;gt; A replication factor greater than 1 allows more parallelization, because each block can be read from more than one node.&lt;/P&gt;&lt;P&gt;3. Handling failure --&amp;gt; With a replication factor &amp;gt; 1, the failure of one or more DataNodes does not necessarily cause a job to fail.&lt;/P&gt;</description>
      <pubDate>Wed, 28 Jun 2017 01:27:35 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Disadvantages-of-replication-factor-1-on-200GB-of-data-per/m-p/226815#M63749</guid>
      <dc:creator>pardeep_kumar</dc:creator>
      <dc:date>2017-06-28T01:27:35Z</dc:date>
    </item>
  </channel>
</rss>

