Member since
02-23-2017
34
Posts
15
Kudos Received
9
Solutions
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 6913 | 02-28-2019 10:37 AM |
| | 10379 | 02-27-2019 03:34 PM |
| | 6123 | 01-09-2019 07:39 PM |
| | 6210 | 01-08-2019 10:46 AM |
02-28-2019
03:34 PM
1 Kudo
The flag is on both the tservers and the masters because both of them need to maintain consensus, and thus track replica lag -- the masters maintain a consistent, replicated catalog in HA deployments; the tservers maintain consistent, replicated table partitions. In this case, probably only the flag on the tserver is important.

Ah, that's unfortunate, so raising that configuration is probably worth a shot, though note that it's an experimental flag and is thus not well-tested. Also note that you'll need `--unlock_experimental_flags` to change it. I think it's probably fine to have different values on different tservers, but again, it's experimental, meaning not well-tested, and it might lead to odd behavior (experimenting is certainly not discouraged, though, if you can accept that!).

I believe RYW is already guaranteed by Impala based on what it's doing (described above; also see IMPALA-3788 for the details). Kudu's RYW scan mode works similarly by internally passing around written timestamps and sending those timestamps with scan requests. Kudu can make an optimization, though: if Kudu knows that there exists a timestamp T2 > T1 that can be scanned without waiting (it internally tracks which operations have been applied and are visible locally), it will scan at T2 instead of T1, which shows a more up-to-date picture to scanners. There's an open ticket to support this mode in Impala (see IMPALA-7184), but I don't think anyone is working on it. I also don't think this mode would help in this case.
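To make the timestamp-propagation idea concrete, here is a toy model of it. All of the class and method names below are illustrative only, not the Kudu client API: the client carries its last-written timestamp T1 with each scan, and a replica that has already applied a later timestamp T2 can serve the scan at T2 without waiting.

```python
# Toy model of read-your-writes (RYW) timestamp propagation.
# Illustrative names only; this is NOT the real Kudu client API.

class TabletReplica:
    def __init__(self):
        self.applied_ts = 0  # highest timestamp applied and visible locally

    def apply(self, ts):
        self.applied_ts = max(self.applied_ts, ts)

    def scan(self, snapshot_ts):
        # RYW optimization: if a timestamp T2 > T1 is already applied
        # locally, serve the scan at T2 instead of T1 -- no waiting, and
        # the client sees a more up-to-date view.
        return max(snapshot_ts, self.applied_ts)

class Client:
    def __init__(self, replica):
        self.replica = replica
        self.last_write_ts = 0  # propagated with every scan request

    def write(self, ts):
        self.replica.apply(ts)
        self.last_write_ts = max(self.last_write_ts, ts)

    def ryw_scan(self):
        # The scan must observe at least the client's own writes (>= T1).
        return self.replica.scan(self.last_write_ts)

replica = TabletReplica()
client = Client(replica)
client.write(10)          # client's last write: T1 = 10
replica.apply(15)         # another writer's op already applied: T2 = 15
print(client.ryw_scan())  # scans at 15, not 10
```

The key point the sketch shows: RYW only requires the scan timestamp to be *at least* T1, so picking a later already-applied T2 preserves the guarantee while avoiding a wait.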
02-28-2019
01:34 PM
1 Kudo
Ah, that's interesting. It looks like that tablet was in the process of removing a _different_ replica from its Raft config, leaving it with one good replica (fe5c3538509d47d186f5784ea586f260) and the one bad one (13950e716d21404188b733521dbf4d97). From the looks of it, it probably can't make any progress because the bad replica is likely responding with the WRONG_SERVER_UUID response. In this case, I think the right tool is `unsafe_change_config` to force the Raft config to contain only the good replica, after which Raft should take the wheel and replicate back up to full replication, e.g.:

```
kudu remote_replica unsafe_change_config <tserver address of fe5c3538509d47d186f5784ea586f260> acc6adcd6b554b5cabfed268184591a9 fe5c3538509d47d186f5784ea586f260
```
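The arithmetic behind why the tablet is stuck, and why forcing the config down to one replica unblocks it, is just Raft majority math. This is an illustrative sketch, not Kudu code:

```python
# Why a 2-replica Raft config with one unresponsive member is stuck,
# and why unsafe_change_config unblocks it. Illustrative only.

def majority(config_size):
    # Raft needs floor(n/2) + 1 acknowledgements to commit.
    return config_size // 2 + 1

# Current config: 1 good replica + 1 replica answering WRONG_SERVER_UUID.
live = 1
print(live >= majority(2))  # False: 1 < 2, no quorum, no progress

# After forcing the config down to just the good replica:
print(live >= majority(1))  # True: 1 >= 1, quorum restored; Raft can
                            # then re-replicate back up to full replication
```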
02-28-2019
10:37 AM
1 Kudo
You need to remove the old UUID from any existing Raft configurations. I haven't tested this myself, but per the documentation here, you should be able to run:

```
kudu tablet change_config remove_replica <master_addresses> <tablet_id> <old_tserver_uuid>
```

In this case, I think it'd be:

```
kudu tablet change_config remove_replica <master_addresses> acc6adcd6b554b5cabfed268184591a9 13950e716d21404188b733521dbf4d97
```
02-27-2019
03:34 PM
1 Kudo
The error message you posted indicates that there is some lag in replicating operations between tablet replicas (nothing NTP-related here). Here is a post that goes into a bit of detail about what these options do. Behind the scenes, you can imagine roughly the following sequence of events:

1. We have a tablet with a healthy quorum and the leader gets a write request.
2. The leader assigns the write a timestamp T1, and once this timestamp has been persisted to a majority of replicas, the write is accepted and T1 is sent back to the client.
3. For the replicas that successfully replicate T1, the scannable range for READ_AT_SNAPSHOT scans is bumped to include T1. Note that at this point there may be a replica that has not yet replicated/seen the write at T1; the only guarantee is that a majority of replicas have seen it. This replica is "lagging".
4. A READ_AT_SNAPSHOT scan tries to enforce that it sees all rows written up to and including T1. If the scan goes to a lagging replica, it may wait up to `--safe_time_max_lag_ms` for that replica to receive the write for T1, which should eventually happen given Kudu's use of Raft replication, but it seems in this case it was too slow.

Why was the replica lagging? One possibility is that heavy consensus traffic, caused by a slow network, led to slower replication. A network partition could also cause it. Without knowing more about the cluster and the load on it, it's hard to say. There might be information in the tablet server logs if you search for one of the tablet IDs that you noticed being slow. It's also worth running `ksck` to see if there is anything wrong with the state of the tablets.

Beyond addressing the underlying lag, if you are choosing READ_AT_SNAPSHOT for its read-your-writes properties, I don't think there is an easy path forward.
If you don't need these guarantees, using the READ_LATEST read mode should get you around this. It's also worth noting that Kudu exposes an additional READ_YOUR_WRITES mode, but I don't think it's integrated with Impala at the moment.
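The sequence of events above can be sketched as a toy timeline. This is illustrative arithmetic, not Kudu code: a write commits once a majority of replicas have applied its timestamp, but a READ_AT_SNAPSHOT scan at that timestamp against the lagging minority replica has to wait.

```python
# Toy timeline: majority commit vs. a lagging replica's safe time.
# Illustrative only, not Kudu code.

T1 = 100
replicas = {"a": 100, "b": 100, "c": 40}  # replica -> highest applied ts

# The write at T1 commits once a majority (2 of 3 here) has applied it,
# even though replica "c" is lagging.
acked = sum(ts >= T1 for ts in replicas.values())
committed = acked >= len(replicas) // 2 + 1
print(committed)  # True: the client gets T1 back

def can_serve_snapshot(replica, snapshot_ts):
    # A lagging replica cannot serve the snapshot yet; the server waits
    # up to --safe_time_max_lag_ms for it to catch up before erroring.
    return replicas[replica] >= snapshot_ts

print(can_serve_snapshot("a", T1))  # True
print(can_serve_snapshot("c", T1))  # False: "c" is the lagging replica
```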
01-09-2019
07:39 PM
1 Kudo
In terms of resources, five masters wouldn't strain the cluster much. The bigger change is that the state that lives on the master (e.g. catalog metadata) would need to be replicated with a replication factor of 5 in mind (i.e. at least 3 copies must be persisted for a write to be considered "written"). While this is possible, the recommended configuration is 3 masters, as it is the most well-tested and commonly used.
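The quorum arithmetic behind those numbers is simple majority math. A quick illustrative sketch (not Kudu code):

```python
# Majority quorum sizes for master configurations. Illustrative only.

def majority(n):
    return n // 2 + 1

print(majority(3))  # 2 copies must ack each catalog write with 3 masters
print(majority(5))  # 3 copies must ack each catalog write with 5 masters

# Fault tolerance: how many masters can fail while a quorum survives.
print(3 - majority(3))  # 1 master may fail
print(5 - majority(5))  # 2 masters may fail
```

So the trade-off is: five masters tolerate two failures instead of one, at the cost of a larger quorum on every catalog write.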
01-08-2019
04:48 PM
1 Kudo
That's correct, it is a manual process to get back up to three Masters, though the remaining two-Master deployment will still be usable until then. KUDU-2181 would improve this, but no one is working on it right now, AFAIK.
01-08-2019
10:46 AM
1 Kudo
The master role generally takes few resources, and it isn't uncommon to colocate a Tablet Server and a Master. Additionally, it is strongly encouraged to have at least four Tablet Servers in a minimal deployment for higher availability in the case of failures (see the note about KUDU-1097 here). That said, with five nodes, you could have three Masters and five Tablet Servers. Even then, five Tablet Servers isn't huge, so please take a look at the Scaling Limits and Scaling Guide to ensure you sufficiently provision your cluster.

If a node with both roles fails, the following will happen:

- On the Master side, if the failed Master was the leader, a new leader will be elected from the remaining two Masters and business will continue as usual. No automatic recovery will happen to bring up a new Master, and these steps should be followed to recover the failed Master when convenient.
- On the Tablet Server side, the tablet replicas that existed on the failed Tablet Server will automatically be re-replicated to the remaining four Tablet Servers.

If you went with three Tablet Servers instead, this re-replication would not happen, since the remaining two Tablet Servers would already have replicas of every tablet, and the cluster would be stuck with every tablet having two copies instead of three. The service would still function, but another failure would render every tablet unavailable.
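The re-replication reasoning above reduces to a simple check, since each replica of a tablet must live on a distinct Tablet Server. A small illustrative sketch (not Kudu code; the function name is made up):

```python
# Can the cluster re-replicate every tablet back to full replication
# after `failed` Tablet Servers die? Illustrative only, not Kudu code.

def can_rereplicate(tablet_servers, replication_factor=3, failed=1):
    # Each replica must live on a distinct server, so recovering full
    # replication needs at least `replication_factor` survivors.
    return tablet_servers - failed >= replication_factor

print(can_rereplicate(5))  # True: four survivors can host three replicas
print(can_rereplicate(4))  # True: the KUDU-1097 minimum of four servers
print(can_rereplicate(3))  # False: the two survivors already hold the
                           # other copies; tablets are stuck at 2 replicas
```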