Created on 11-14-2017 09:11 AM - edited 09-16-2022 05:31 AM
Hi,
How is data replicated in Kudu? My understanding is that Kudu keeps one replica with all of the data and two replicas with only operation logs. From the Apache docs I get this: "Kudu does not replicate the on-disk storage of a tablet, but rather just its operation log. The physical storage of each replica of a tablet is fully decoupled."
In case of a disk failure, if the disk contains the actual data and not the operation log, how is it recovered?
Created 11-15-2017 01:12 PM
Somewhat. Take a look here for more details about the relationship between tables and tablets: https://kudu.apache.org/docs/schema_design.html
There's an important distinction to be made: a tablet is a logical concept (it's a chunk of a table); a replica is a copy of a single tablet. There may be many replicas of a single tablet, depending on the user-specified properties of the table.
For example, say I have "Table 1" with replication factor 3. This means that every tablet belonging to "Table 1" will always try to maintain 3 replicas (copies). Say "Table 1" has two tablets, "A" and "B"; each will have three replicas. A replica of "A" could fail due to a server failure or some such, in which case "A" will try to re-replicate back up to 3 healthy replicas. This is completely orthogonal to "B".
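To make the tablet/replica distinction concrete, here is a toy Python model of the "Table 1" example. The `Tablet` class, the server names and the "pick the next live server that doesn't already host this tablet" rule are hypothetical simplifications; in a real cluster Kudu's own placement logic decides where a new replica goes.

```python
from dataclasses import dataclass, field


@dataclass
class Tablet:
    """A tablet is a logical chunk of a table; its replicas live on tablet servers."""
    name: str
    replication_factor: int
    replicas: set[str] = field(default_factory=set)  # servers hosting a replica

    def re_replicate(self, live_servers: set[str]) -> None:
        """Drop replicas on dead servers, then add new ones on other live servers."""
        self.replicas &= live_servers
        # Hypothetical rule: pick live servers that don't already host this tablet.
        candidates = sorted(live_servers - self.replicas)
        while len(self.replicas) < self.replication_factor and candidates:
            self.replicas.add(candidates.pop(0))


# "Table 1" with replication factor 3 and two tablets, "A" and "B".
a = Tablet("A", 3, {"ts1", "ts2", "ts3"})
b = Tablet("B", 3, {"ts2", "ts3", "ts4"})

live = {"ts2", "ts3", "ts4", "ts5"}  # ts1 has failed
for tablet in (a, b):
    tablet.re_replicate(live)

print(sorted(a.replicas))  # ['ts2', 'ts3', 'ts4']: A lost a replica and got a new one
print(sorted(b.replicas))  # ['ts2', 'ts3', 'ts4']: B was never affected
```

Each tablet tracks and repairs its own replica set, which is why a failure touching "A" has no effect on "B".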
So yes, each replica of a tablet maintains its operation log, but also all of the data associated with that tablet, because a tablet is just a chunk of a table.
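A toy sketch of what "replicate the operation log, not the on-disk storage" means: every replica receives the same ordered log entries and applies them to its own storage. Here each replica flushes on its own hypothetical schedule (`flush_threshold`), so the "on-disk" layout differs between replicas while the logical data is identical. The `Replica` class is illustrative only and does not mirror Kudu's real storage engine.

```python
class Replica:
    """Toy replica: receives replicated log entries and stores data its own way."""

    def __init__(self, name: str, flush_threshold: int) -> None:
        self.name = name
        self.flush_threshold = flush_threshold  # hypothetical, replica-local flush policy
        self.log: list[tuple[str, str]] = []    # replicated operation log: (key, value)
        self.memstore: dict[str, str] = {}      # unflushed in-memory data
        self.files: list[dict[str, str]] = []   # flushed "on-disk" chunks, oldest first

    def apply(self, op: tuple[str, str]) -> None:
        """Append the log entry, then apply it to this replica's own storage."""
        self.log.append(op)
        key, value = op
        self.memstore[key] = value
        if len(self.memstore) >= self.flush_threshold:
            self.files.append(self.memstore)
            self.memstore = {}

    def read_all(self) -> dict[str, str]:
        """Logical contents: newer chunks override older ones, memstore is newest."""
        merged: dict[str, str] = {}
        for chunk in self.files:
            merged.update(chunk)
        merged.update(self.memstore)
        return merged


# Only the ordered log entries are shipped to every replica.
replicas = [Replica("r1", 2), Replica("r2", 3), Replica("r3", 5)]
for op in [("k1", "a"), ("k2", "b"), ("k1", "c"), ("k3", "d")]:
    for replica in replicas:
        replica.apply(op)

for replica in replicas:
    print(replica.name, len(replica.files), replica.read_all())
# r1 2 {'k1': 'c', 'k2': 'b', 'k3': 'd'}
# r2 1 {'k1': 'c', 'k2': 'b', 'k3': 'd'}
# r3 0 {'k1': 'c', 'k2': 'b', 'k3': 'd'}
```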
Hope this helped!
Created 08-25-2021 01:07 AM
Hi Adar,
If both the WAL segments and the CFiles are copied during a tablet copy, then a follower tablet will also flush WAL data to disk when it grows to 8 MB. In my opinion there is no difference between the leader tablet and a follower tablet when reading and writing. Is that right?
Created 11-14-2017 10:48 AM
Hi,
You can have multiple replicas of the data stored in Kudu tables: Kudu lets you configure a per-table replication factor when creating a table. Replication factors of 3, 5, and 7 are available out of the box; for higher values you need to adjust the master's --max_num_replicas flag.
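A hypothetical sketch of the kind of validation implied by a per-table replication factor and the master's --max_num_replicas limit. `check_replication_factor` is not a real Kudu function; the default of 7 mirrors the "out of the box" value above, and the odd-number rule is an assumption based on the 1/3/5/7 values mentioned in this thread.

```python
DEFAULT_MAX_NUM_REPLICAS = 7  # assumed default, matching "3, 5, and 7 ... out of the box"


def check_replication_factor(n: int, max_num_replicas: int = DEFAULT_MAX_NUM_REPLICAS) -> int:
    """Hypothetical validation of a per-table replication factor; returns n if acceptable."""
    if n < 1:
        raise ValueError(f"replication factor must be >= 1, got {n}")
    if n % 2 == 0:
        # Assumption: only odd factors are accepted, since an even factor
        # tolerates no more failures than the odd factor just below it.
        raise ValueError(f"replication factor must be odd, got {n}")
    if n > max_num_replicas:
        raise ValueError(
            f"replication factor {n} exceeds max_num_replicas={max_num_replicas}; "
            "raise the master's --max_num_replicas flag"
        )
    return n


for n in (3, 5, 7, 9):
    try:
        print(n, "ok", check_replication_factor(n))
    except ValueError as err:
        print(n, "rejected:", err)

# With the limit raised, 9 becomes acceptable.
print(check_replication_factor(9, max_num_replicas=9))
```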
Under the hood, every tablet (the part of a table that corresponds to a partition) is a Raft cluster, where every transaction is considered committed only once it has been replicated and acknowledged back to the leader replica by a majority of the tablet's replicas.
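A minimal sketch of the majority rule, assuming standard Raft semantics where the leader's own log write counts toward the majority. The function names are illustrative, not Kudu APIs.

```python
def majority(num_voters: int) -> int:
    """Smallest number of replicas that forms a strict majority."""
    return num_voters // 2 + 1


def is_committed(acked_by: set[str], voters: set[str]) -> bool:
    """An operation is committed once a majority of voters (leader included) has persisted it."""
    return len(acked_by & voters) >= majority(len(voters))


def tolerated_failures(num_voters: int) -> int:
    """Replicas that can be lost while a majority can still be formed."""
    return num_voters - majority(num_voters)


voters = {"leader", "follower-1", "follower-2"}
print(is_committed({"leader"}, voters))                # False: 1 of 3
print(is_committed({"leader", "follower-1"}, voters))  # True: 2 of 3
print({n: tolerated_failures(n) for n in (1, 3, 4, 5, 7)})
# {1: 0, 3: 1, 4: 1, 5: 2, 7: 3}
```

The last line shows why odd replication factors are the natural choice: 4 replicas tolerate no more failures than 3.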
Replicas of one tablet are distributed among different tablet servers (it is not possible to run multiple replicas of one tablet on the same tablet server). Unless the replication factor is set to 1 (i.e. no replication at all) or all tablet servers run on the same machine (which is a bad idea), every tablet should still have at least one replica holding a copy of its data when a disk on one server fails.
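A toy sketch of why distinct placement matters. The round-robin `place_replicas` helper is a hypothetical stand-in for Kudu's real placement logic; the point is only that replicas of one tablet never share a server, so losing one server (or one of its disks) removes at most one replica per tablet.

```python
def place_replicas(tablets: list[str], servers: list[str], rf: int) -> dict[str, set[str]]:
    """Round-robin placement: each tablet gets rf replicas on rf distinct servers."""
    if rf > len(servers):
        raise ValueError("need at least rf distinct tablet servers")
    return {
        tablet: {servers[(i + j) % len(servers)] for j in range(rf)}
        for i, tablet in enumerate(tablets)
    }


def after_failure(placement: dict[str, set[str]], failed_server: str) -> dict[str, set[str]]:
    """Replicas that survive when one tablet server is lost."""
    return {tablet: replicas - {failed_server} for tablet, replicas in placement.items()}


servers = ["ts1", "ts2", "ts3", "ts4"]
tablets = ["A", "B", "C"]

rf3 = after_failure(place_replicas(tablets, servers, rf=3), "ts3")
print({t: sorted(r) for t, r in rf3.items()})
# {'A': ['ts1', 'ts2'], 'B': ['ts2', 'ts4'], 'C': ['ts1', 'ts4']}

rf1 = after_failure(place_replicas(tablets, servers, rf=1), "ts3")
print({t: sorted(r) for t, r in rf1.items()})
# {'A': ['ts1'], 'B': ['ts2'], 'C': []}
```

With replication factor 3 every tablet keeps two copies after the failure; with replication factor 1, tablet "C" has no surviving copy.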
You can get more details at https://kudu.apache.org/overview.html#distribution-and-fault-tolerance
and https://github.com/apache/kudu/blob/master/docs/design-docs/consensus.md
I hope this helps.
Created 11-15-2017 04:32 AM
Created 11-15-2017 09:17 AM
Created 11-15-2017 12:42 PM
Created 11-15-2017 01:59 PM