Member since
02-23-2017
34
Posts
15
Kudos Received
9
Solutions
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 6913 | 02-28-2019 10:37 AM |
| | 10379 | 02-27-2019 03:34 PM |
| | 6123 | 01-09-2019 07:39 PM |
| | 6210 | 01-08-2019 10:46 AM |
02-28-2019
03:34 PM
1 Kudo
The flag is on both the tservers and the masters because both of them need to maintain consensus, and thus track replica lag -- the masters maintain a consistent, replicated catalog in HA deployments; the tservers maintain consistent, replicated table partitions. In this case, probably only the flag on the tserver is important.

Ah, that's unfortunate, so raising that configuration is probably worth a shot, though note that it's an experimental flag and is thus not well-tested. Also note that you'll need `--unlock_experimental_flags` to change it. I think it's probably fine to have different values on different tservers, but again, it's experimental, meaning not well-tested, and it might lead to odd behavior (experimenting is certainly not discouraged, though, if you can accept that!).

I believe RYW is already guaranteed by Impala based on what it's doing (described above; also see IMPALA-3788 for the details). Kudu's RYW scan mode works similarly by internally passing around written timestamps and sending those timestamps with scan requests. Kudu can make an optimization, though: if Kudu knows that there exists a timestamp T2 > T1 that can be scanned without waiting (it internally tracks which operations have been applied and are visible locally), it will scan at T2 instead of T1, which shows a more up-to-date picture to scanners. There's an open ticket to support this mode in Impala (see IMPALA-7184), but I don't think anyone is working on it. I also don't think this mode would help in this case.
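To make the timestamp-propagation idea concrete, here is a toy model of it. All of the class and method names below are illustrative only, not the Kudu client API: the client carries its last-written timestamp T1 with each scan, and a replica that has already applied a later timestamp T2 can serve the scan at T2 without waiting.

```python
# Toy model of read-your-writes (RYW) timestamp propagation.
# Illustrative names only; this is NOT the real Kudu client API.

class TabletReplica:
    def __init__(self):
        self.applied_ts = 0  # highest timestamp applied and visible locally

    def apply(self, ts):
        self.applied_ts = max(self.applied_ts, ts)

    def scan(self, snapshot_ts):
        # RYW optimization: if a timestamp T2 > T1 is already applied
        # locally, serve the scan at T2 instead of T1 -- no waiting, and
        # the client sees a more up-to-date view.
        return max(snapshot_ts, self.applied_ts)

class Client:
    def __init__(self, replica):
        self.replica = replica
        self.last_write_ts = 0  # propagated with every scan request

    def write(self, ts):
        self.replica.apply(ts)
        self.last_write_ts = max(self.last_write_ts, ts)

    def ryw_scan(self):
        # The scan must observe at least the client's own writes (>= T1).
        return self.replica.scan(self.last_write_ts)

replica = TabletReplica()
client = Client(replica)
client.write(10)          # client's last write: T1 = 10
replica.apply(15)         # another writer's op already applied: T2 = 15
print(client.ryw_scan())  # scans at 15, not 10
```

The key point the sketch shows: RYW only requires the scan timestamp to be *at least* T1, so picking a later already-applied T2 preserves the guarantee while avoiding a wait.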
02-28-2019
01:34 PM
1 Kudo
Ah, that's interesting. It looks like that tablet was in the process of removing a _different_ replica from its Raft config, leaving it with one good replica (fe5c3538509d47d186f5784ea586f260) and the one bad one (13950e716d21404188b733521dbf4d97). From the looks of it, it probably can't make any progress because the bad replica is likely responding with the WRONG_SERVER_UUID response. In this case, I think the right tool is `unsafe_change_config` to force the Raft config to contain only the good replica, after which Raft should take the wheel and replicate back up to full replication, e.g.:

```
kudu remote_replica unsafe_change_config <tserver address of fe5c3538509d47d186f5784ea586f260> acc6adcd6b554b5cabfed268184591a9 fe5c3538509d47d186f5784ea586f260
```
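The arithmetic behind why the tablet is stuck, and why forcing the config down to one replica unblocks it, is just Raft majority math. This is an illustrative sketch, not Kudu code:

```python
# Why a 2-replica Raft config with one unresponsive member is stuck,
# and why unsafe_change_config unblocks it. Illustrative only.

def majority(config_size):
    # Raft needs floor(n/2) + 1 acknowledgements to commit.
    return config_size // 2 + 1

# Current config: 1 good replica + 1 replica answering WRONG_SERVER_UUID.
live = 1
print(live >= majority(2))  # False: 1 < 2, no quorum, no progress

# After forcing the config down to just the good replica:
print(live >= majority(1))  # True: 1 >= 1, quorum restored; Raft can
                            # then re-replicate back up to full replication
```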
02-28-2019
10:37 AM
1 Kudo
You need to remove the old UUID from any existing Raft configurations. I haven't tested this myself, but per the documentation here, you should be able to run:

```
kudu tablet change_config remove_replica <master_addresses> <tablet_id> <old_tserver_uuid>
```

In this case, I think it'd be:

```
kudu tablet change_config remove_replica <master_addresses> acc6adcd6b554b5cabfed268184591a9 13950e716d21404188b733521dbf4d97
```
02-27-2019
03:34 PM
1 Kudo
The error message you posted indicates that there is some lag in replicating operations between tablet replicas (nothing NTP-related here). Here is a post that goes into a bit of detail about what these options do. Behind the scenes, you can imagine roughly the following sequence of events:

1. We have a tablet with a healthy quorum and the leader gets a write request.
2. The leader assigns the write a timestamp T1, and once this timestamp has been persisted to a majority of replicas, the write is accepted and T1 is sent back to the client.
3. For the replicas that successfully replicate T1, the scannable range for READ_AT_SNAPSHOT scans is bumped to include T1. Note that at this point there may be a replica that has not yet replicated/seen the write at T1; the only guarantee is that a majority of replicas have seen it. This replica is "lagging".
4. A READ_AT_SNAPSHOT scan tries to enforce that it sees all rows written up to and including T1. If the scan goes to a lagging replica, it may wait up to `--safe_time_max_lag_ms` for that replica to receive the write for T1, which should eventually happen given Kudu's use of Raft replication, but it seems in this case it was too slow.

Why was the replica lagging? One possibility is that heavy consensus traffic, caused by a slow network, led to slower replication. A network partition could also cause it. Without knowing more about the cluster and the load on it, it's hard to say. There might be information in the tablet server logs if you search for one of the tablet IDs that you noticed being slow. It's also worth running `ksck` to see if there is anything wrong with the state of the tablets.

Beyond addressing the underlying lag, if you are choosing READ_AT_SNAPSHOT for its read-your-writes properties, I don't think there is an easy path forward.
If you don't need these guarantees, using the READ_LATEST read mode should get you around this. It's also worth noting that Kudu exposes an additional READ_YOUR_WRITES mode, but I don't think it's integrated with Impala at the moment.
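The sequence of events above can be sketched as a toy timeline. This is illustrative arithmetic, not Kudu code: a write commits once a majority of replicas have applied its timestamp, but a READ_AT_SNAPSHOT scan at that timestamp against the lagging minority replica has to wait.

```python
# Toy timeline: majority commit vs. a lagging replica's safe time.
# Illustrative only, not Kudu code.

T1 = 100
replicas = {"a": 100, "b": 100, "c": 40}  # replica -> highest applied ts

# The write at T1 commits once a majority (2 of 3 here) has applied it,
# even though replica "c" is lagging.
acked = sum(ts >= T1 for ts in replicas.values())
committed = acked >= len(replicas) // 2 + 1
print(committed)  # True: the client gets T1 back

def can_serve_snapshot(replica, snapshot_ts):
    # A lagging replica cannot serve the snapshot yet; the server waits
    # up to --safe_time_max_lag_ms for it to catch up before erroring.
    return replicas[replica] >= snapshot_ts

print(can_serve_snapshot("a", T1))  # True
print(can_serve_snapshot("c", T1))  # False: "c" is the lagging replica
```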
01-09-2019
07:39 PM
1 Kudo
In terms of resources, five masters wouldn't strain the cluster much. The bigger change is that the state that lives on the master (e.g. catalog metadata) would need to be replicated with a replication factor of 5 in mind (i.e. at least 3 copies must be persisted for a write to be considered "written"). While this is possible, the recommended configuration is 3 masters, as it is the most well-tested and commonly used.
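The quorum arithmetic behind those numbers is simple majority math. A quick illustrative sketch (not Kudu code):

```python
# Majority quorum sizes for master configurations. Illustrative only.

def majority(n):
    return n // 2 + 1

print(majority(3))  # 2 copies must ack each catalog write with 3 masters
print(majority(5))  # 3 copies must ack each catalog write with 5 masters

# Fault tolerance: how many masters can fail while a quorum survives.
print(3 - majority(3))  # 1 master may fail
print(5 - majority(5))  # 2 masters may fail
```

So the trade-off is: five masters tolerate two failures instead of one, at the cost of a larger quorum on every catalog write.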
01-08-2019
04:48 PM
1 Kudo
That's correct, it is a manual process to get back up to three Masters, though the remaining two-Master deployment will still be usable until then. KUDU-2181 would improve this, but no one is working on it right now, AFAIK.
01-08-2019
10:46 AM
1 Kudo
The master role generally takes few resources, and it isn't uncommon to colocate a Tablet Server and a Master. Additionally, it is strongly encouraged to have at least four Tablet Servers in a minimal deployment for higher availability in the case of failures (see the note about KUDU-1097 here). That said, with five nodes, you could have three Masters and five Tablet Servers. Even then, five Tablet Servers isn't huge, so please take a look at the Scaling Limits and Scaling Guide to ensure you sufficiently provision your cluster.

If a node with both roles fails, the following will happen:

- On the Master side, if the failed Master was the leader, a new leader will be elected from the remaining two Masters and business will continue as usual. No automatic recovery will happen to bring up a new Master, and these steps should be followed to recover the failed Master when convenient.
- On the Tablet Server side, the tablet replicas that existed on the failed Tablet Server will automatically be re-replicated to the remaining four Tablet Servers.

If you went with three Tablet Servers instead, this re-replication would not happen, since the remaining two Tablet Servers would already have replicas of every tablet, and the cluster would be stuck with every tablet having two copies instead of three. The service would still function, but another failure would render every tablet unavailable.
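The re-replication reasoning above reduces to a simple check, since each replica of a tablet must live on a distinct Tablet Server. A small illustrative sketch (not Kudu code; the function name is made up):

```python
# Can the cluster re-replicate every tablet back to full replication
# after `failed` Tablet Servers die? Illustrative only, not Kudu code.

def can_rereplicate(tablet_servers, replication_factor=3, failed=1):
    # Each replica must live on a distinct server, so recovering full
    # replication needs at least `replication_factor` survivors.
    return tablet_servers - failed >= replication_factor

print(can_rereplicate(5))  # True: four survivors can host three replicas
print(can_rereplicate(4))  # True: the KUDU-1097 minimum of four servers
print(can_rereplicate(3))  # False: the two survivors already hold the
                           # other copies; tablets are stuck at 2 replicas
```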