<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>question Data replication in Kudu in Support Questions</title>
    <link>https://community.cloudera.com/t5/Support-Questions/Data-replication-in-Kudu/m-p/61823#M12726</link>
    <description>&lt;P&gt;Hi,&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;How is data replicated in Kudu? My understanding is that Kudu has one replica of all data and 2 replicas with operational logs. From the Apache docs I get this: "Kudu does not replicate the on-disk storage of a tablet,&lt;BR /&gt;but rather just its operation log. The physical storage of each replica of a tablet is fully decoupled."&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;In case of a disk failure, if the disk contains the actual data and not the operation log, how is it recovered?&lt;/P&gt;</description>
    <pubDate>Fri, 16 Sep 2022 12:31:15 GMT</pubDate>
    <dc:creator>RajeshBodolla</dc:creator>
    <dc:date>2022-09-16T12:31:15Z</dc:date>
    <item>
      <title>Data replication in Kudu</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Data-replication-in-Kudu/m-p/61823#M12726</link>
      <description>&lt;P&gt;Hi,&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;How is data replicated in Kudu? My understanding is that Kudu has one replica of all data and 2 replicas with operational logs. From the Apache docs I get this: "Kudu does not replicate the on-disk storage of a tablet,&lt;BR /&gt;but rather just its operation log. The physical storage of each replica of a tablet is fully decoupled."&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;In case of a disk failure, if the disk contains the actual data and not the operation log, how is it recovered?&lt;/P&gt;</description>
      <pubDate>Fri, 16 Sep 2022 12:31:15 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Data-replication-in-Kudu/m-p/61823#M12726</guid>
      <dc:creator>RajeshBodolla</dc:creator>
      <dc:date>2022-09-16T12:31:15Z</dc:date>
    </item>
    <item>
      <title>Re: Data replication in Kudu</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Data-replication-in-Kudu/m-p/61824#M12727</link>
      <description>&lt;P&gt;Hi Rajesh,&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Right, Kudu replicates data logically to multiple tservers based on each table's replication factor (typically 3), and in doing so, writes are only considered successful once durably written to a majority's write-ahead logs. From then on, each&amp;nbsp;tserver can maintain&amp;nbsp;the data via flushing and compactions, "decoupled" from the&amp;nbsp;writes to the log.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Currently, in cases of disk failures,&amp;nbsp;the single failed node will crash. Because the data is written to at least a majority,&amp;nbsp;Kudu "re-replicates" back up to full replication, i.e. all of the tablets that lost a replica because of the crash will notice that one of the servers is down and make a new copy on another healthy&amp;nbsp;server.&lt;/P&gt;</description>
      <pubDate>Tue, 14 Nov 2017 18:32:27 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Data-replication-in-Kudu/m-p/61824#M12727</guid>
      <dc:creator>awong</dc:creator>
      <dc:date>2017-11-14T18:32:27Z</dc:date>
    </item>
    <item>
      <title>Re: Data replication in Kudu</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Data-replication-in-Kudu/m-p/61825#M12728</link>
      <description>&lt;P&gt;Hi,&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;You can have multiple replicas of data stored in Kudu tables -- Kudu allows you to configure a per-table replication factor when creating a table. Replication factors of 3, 5, and 7 are available out of the box; for higher values you need to tweak the master's --max_num_replicas flag.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Under the hood, every tablet (part of the table which corresponds to a partition) is a Raft cluster, where every transaction is considered committed only when it's replicated and acknowledged back to the leader replica by the majority of replicas in the tablet.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Replicas of one tablet are distributed among different tablet servers (it's not possible to run multiple replicas of one tablet on the same tablet server). Unless the replication factor is set to 1 (i.e. no replication at all) or all tablet servers are run on the same machine (which is a bad idea), for every tablet there should still be at least one replica holding a copy of the data after a disk on one server fails.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;You can get more details at&amp;nbsp;&lt;A href="https://kudu.apache.org/overview.html#distribution-and-fault-tolerance" target="_blank"&gt;https://kudu.apache.org/overview.html#distribution-and-fault-tolerance&lt;/A&gt;&lt;/P&gt;&lt;P&gt;and&amp;nbsp;&lt;A href="https://github.com/apache/kudu/blob/master/docs/design-docs/consensus.md" target="_blank"&gt;https://github.com/apache/kudu/blob/master/docs/design-docs/consensus.md&lt;/A&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I hope this helps.&lt;/P&gt;</description>
      <pubDate>Tue, 14 Nov 2017 18:48:44 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Data-replication-in-Kudu/m-p/61825#M12728</guid>
      <dc:creator>Alexey1c</dc:creator>
      <dc:date>2017-11-14T18:48:44Z</dc:date>
    </item>
    <item>
      <title>Re: Data replication in Kudu</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Data-replication-in-Kudu/m-p/61862#M12729</link>
      <description>&lt;P&gt;Hi&amp;nbsp;&lt;a href="https://community.cloudera.com/t5/user/viewprofilepage/user-id/21236"&gt;@Alexey1c&lt;/a&gt;&amp;nbsp;&lt;a href="https://community.cloudera.com/t5/user/viewprofilepage/user-id/20912"&gt;@awong&lt;/a&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Thanks for the information, but sorry, I couldn't understand. Do you mean we have different replication policies for tables and tablets?&lt;/P&gt;</description>
      <pubDate>Wed, 15 Nov 2017 12:32:23 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Data-replication-in-Kudu/m-p/61862#M12729</guid>
      <dc:creator>RajeshBodolla</dc:creator>
      <dc:date>2017-11-15T12:32:23Z</dc:date>
    </item>
    <item>
      <title>Re: Data replication in Kudu</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Data-replication-in-Kudu/m-p/61877#M12730</link>
      <description>Right, the replication factor is specified per table, so you could have&lt;BR /&gt;different tables with different replication factors. Every tablet in a&lt;BR /&gt;table will honor its table's replication factor.&lt;BR /&gt;</description>
      <pubDate>Wed, 15 Nov 2017 17:17:35 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Data-replication-in-Kudu/m-p/61877#M12730</guid>
      <dc:creator>awong</dc:creator>
      <dc:date>2017-11-15T17:17:35Z</dc:date>
    </item>
    <item>
      <title>Re: Data replication in Kudu</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Data-replication-in-Kudu/m-p/61884#M12731</link>
      <description>Hi &lt;a href="https://community.cloudera.com/t5/user/viewprofilepage/user-id/20912"&gt;@awong&lt;/a&gt;&lt;BR /&gt;&lt;BR /&gt;Thanks for the details. So below is my conclusion.&lt;BR /&gt;&lt;BR /&gt;Table: has its replication factor specified at creation, and all replicas are complete.&lt;BR /&gt;&lt;BR /&gt;Tablet: has only the operational log.</description>
      <pubDate>Wed, 15 Nov 2017 20:42:41 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Data-replication-in-Kudu/m-p/61884#M12731</guid>
      <dc:creator>RajeshBodolla</dc:creator>
      <dc:date>2017-11-15T20:42:41Z</dc:date>
    </item>
    <item>
      <title>Re: Data replication in Kudu</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Data-replication-in-Kudu/m-p/61887#M12732</link>
      <description>&lt;P&gt;Somewhat. Take a look here for more details about the relationship between tables and tablets:&amp;nbsp;&lt;A href="https://kudu.apache.org/docs/schema_design.html" target="_blank"&gt;https://kudu.apache.org/docs/schema_design.html&lt;/A&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;There's an important distinction to be made: a tablet is a logical concept (it's a chunk of a table); a replica is a copy of a single tablet. There may be many replicas of a single tablet, depending on the user-specified properties of the table.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;E.g. say I have "Table 1" with replication factor 3. This means that every tablet belonging to "Table 1" will&amp;nbsp;always try to maintain 3 replicas/copies. Say "Table 1" has two tablets, "A" and "B"; each&amp;nbsp;will have three replicas. A replica of "A" could fail due to a&amp;nbsp;server failure or somesuch, in which case&amp;nbsp;"A" will try to replicate back up to having 3 healthy replicas. This is completely&amp;nbsp;orthogonal to "B".&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;So yes, a tablet maintains its operational log, but also all of the data associated with it, because it is just a chunk of a table.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Hope this helped!&lt;/P&gt;</description>
      <pubDate>Wed, 15 Nov 2017 21:12:43 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Data-replication-in-Kudu/m-p/61887#M12732</guid>
      <dc:creator>awong</dc:creator>
      <dc:date>2017-11-15T21:12:43Z</dc:date>
    </item>
    <item>
      <title>Re: Data replication in Kudu</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Data-replication-in-Kudu/m-p/61888#M12733</link>
      <description>&lt;a href="https://community.cloudera.com/t5/user/viewprofilepage/user-id/20912"&gt;@awong&lt;/a&gt;&lt;BR /&gt;&lt;BR /&gt;That was great. Thanks for the details.</description>
      <pubDate>Wed, 15 Nov 2017 21:59:07 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Data-replication-in-Kudu/m-p/61888#M12733</guid>
      <dc:creator>RajeshBodolla</dc:creator>
      <dc:date>2017-11-15T21:59:07Z</dc:date>
    </item>
    <item>
      <title>Re: Data replication in Kudu</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Data-replication-in-Kudu/m-p/63223#M12734</link>
      <description>&lt;P&gt;Hi Awong,&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;PRE&gt;Right, Kudu replicates data logically to multiple tservers based on each table's replication factor (typically 3), and in doing so, writes are only considered successful once durably written to a majority's write-ahead logs. From then on, each tserver can maintain the data via flushing and compactions, "decoupled" from the writes to the log.&lt;/PRE&gt;&lt;P&gt;&lt;BR /&gt;After flushing and compaction on the tservers, each tablet will have 2 physical replicas, and subsequent CLOSEST_REPLICA scans don't have to compact the WAL. Is this right?&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Best regards,&lt;/P&gt;&lt;P&gt;Tony&lt;/P&gt;</description>
      <pubDate>Tue, 02 Jan 2018 07:20:12 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Data-replication-in-Kudu/m-p/63223#M12734</guid>
      <dc:creator>tony12</dc:creator>
      <dc:date>2018-01-02T07:20:12Z</dc:date>
    </item>
    <item>
      <title>Re: Data replication in Kudu</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Data-replication-in-Kudu/m-p/63246#M12735</link>
      <description>&lt;P&gt;I'm not Andrew but I don't understand your question. Tablet replicas are flushed and/or compacted independently of one another, which means the physical layout of each one may be different; thus it would be incorrect to consider them "physically replicated".&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Furthermore, all scans (whether CLOSEST_REPLICA or otherwise) are read-only operations and thus don't trigger WAL garbage collection or any other kind of read-write server-side action.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Tue, 02 Jan 2018 21:56:11 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Data-replication-in-Kudu/m-p/63246#M12735</guid>
      <dc:creator>adar</dc:creator>
      <dc:date>2018-01-02T21:56:11Z</dc:date>
    </item>
    <item>
      <title>Re: Data replication in Kudu</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Data-replication-in-Kudu/m-p/63252#M12736</link>
      <description>&lt;P&gt;Hi adar,&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I think I already got the answer from your post. Maybe the following explanation can clarify my original question.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;When I insert data into Kudu, the write only goes to a majority's write-ahead logs. The internal flushing and/or compaction for each tablet then generates a set of CFiles as replicas.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;And all scans only need to read the &lt;STRONG&gt;replica&lt;/STRONG&gt; (a set of CFiles which contain the base data and the delta data) and the &lt;STRONG&gt;MemRowSet&lt;/STRONG&gt; to return the query result. Is this right?&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;But for tablet copying, is only the WAL copied, or are both the WAL and the replica copied?&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Best regards,&lt;/P&gt;&lt;P&gt;Tony&lt;/P&gt;</description>
      <pubDate>Wed, 03 Jan 2018 01:09:56 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Data-replication-in-Kudu/m-p/63252#M12736</guid>
      <dc:creator>tony12</dc:creator>
      <dc:date>2018-01-03T01:09:56Z</dc:date>
    </item>
    <item>
      <title>Re: Data replication in Kudu</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Data-replication-in-Kudu/m-p/63280#M12737</link>
      <description>&lt;P&gt;Quick note: Kudu calls the "&lt;SPAN&gt;set of CFiles which contain the base data and the delta data" a DiskRowSet.&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;But your understanding is correct: during a scan, the contents of the MemRowSet and some DiskRowSets are scanned for data. During a tablet copy, both the WAL segments and the CFiles are copied.&lt;/SPAN&gt;&lt;/P&gt;</description>
      <pubDate>Wed, 03 Jan 2018 21:07:27 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Data-replication-in-Kudu/m-p/63280#M12737</guid>
      <dc:creator>adar</dc:creator>
      <dc:date>2018-01-03T21:07:27Z</dc:date>
    </item>
    <item>
      <title>Re: Data replication in Kudu</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Data-replication-in-Kudu/m-p/63286#M12738</link>
      <description>Hi adar,&lt;BR /&gt;&lt;BR /&gt;Thanks for your quick reply.&lt;BR /&gt;&lt;BR /&gt;Best regards,&lt;BR /&gt;Tony&lt;BR /&gt;</description>
      <pubDate>Thu, 04 Jan 2018 00:57:59 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Data-replication-in-Kudu/m-p/63286#M12738</guid>
      <dc:creator>tony12</dc:creator>
      <dc:date>2018-01-04T00:57:59Z</dc:date>
    </item>
    <item>
      <title>Re: Data replication in Kudu</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Data-replication-in-Kudu/m-p/82817#M12739</link>
      <description>What are CFile and WAL files?</description>
      <pubDate>Sun, 25 Nov 2018 16:30:22 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Data-replication-in-Kudu/m-p/82817#M12739</guid>
      <dc:creator>Ym255</dc:creator>
      <dc:date>2018-11-25T16:30:22Z</dc:date>
    </item>
    <item>
      <title>Re: Data replication in Kudu</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Data-replication-in-Kudu/m-p/82908#M12740</link>
      <description>&lt;P&gt;&lt;EM&gt;CFile&lt;/EM&gt;&lt;SPAN&gt;&amp;nbsp;is an on-disk columnar storage format which holds data and associated B-Tree indexes.&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;&lt;A href="https://github.com/cloudera/kudu/blob/master/docs/design-docs/cfile.md" target="_blank"&gt;https://github.com/cloudera/kudu/blob/master/docs/design-docs/cfile.md&lt;/A&gt;&lt;/SPAN&gt;&lt;/P&gt;</description>
      <pubDate>Tue, 27 Nov 2018 10:54:29 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Data-replication-in-Kudu/m-p/82908#M12740</guid>
      <dc:creator>RajeshBodolla</dc:creator>
      <dc:date>2018-11-27T10:54:29Z</dc:date>
    </item>
    <item>
      <title>Re: Data replication in Kudu</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Data-replication-in-Kudu/m-p/82932#M12741</link>
      <description>&lt;P&gt;A WAL file is a Kudu tablet write-ahead log file. You can read an overview of how the Kudu write path works here (it's a fairly technical blog post):&amp;nbsp;&lt;A href="https://blog.cloudera.com/blog/2017/04/apache-kudu-read-write-paths/" target="_blank"&gt;https://blog.cloudera.com/blog/2017/04/apache-kudu-read-write-paths/&lt;/A&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;The WAL file location is controlled by the configuration parameter --fs_wal_dir, which you can read about at &lt;A href="https://kudu.apache.org/docs/configuration_reference.html#kudu-tserver_fs_wal_dir" target="_blank"&gt;https://kudu.apache.org/docs/configuration_reference.html#kudu-tserver_fs_wal_dir&lt;/A&gt;&lt;/P&gt;</description>
      <pubDate>Tue, 27 Nov 2018 21:03:18 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Data-replication-in-Kudu/m-p/82932#M12741</guid>
      <dc:creator>mpercy</dc:creator>
      <dc:date>2018-11-27T21:03:18Z</dc:date>
    </item>
    <item>
      <title>Re: Data replication in Kudu</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Data-replication-in-Kudu/m-p/323126#M229030</link>
      <description>&lt;P&gt;Hi adar,&lt;/P&gt;&lt;P&gt;If both the WAL segments and the CFiles are copied during a tablet copy, then the follower tablet will also flush WAL data to disk when it grows to 8 MB. In my opinion there is no difference between the master tablet and the follower tablet during reading and writing. Is that right?&lt;/P&gt;</description>
      <pubDate>Wed, 25 Aug 2021 08:07:22 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Data-replication-in-Kudu/m-p/323126#M229030</guid>
      <dc:creator>chn</dc:creator>
      <dc:date>2021-08-25T08:07:22Z</dc:date>
    </item>
  </channel>
</rss>

