Kudu Error code: WRONG_SERVER_UUID

Explorer

 

Hello all, I seem to have run into a Kudu 1.4 bug, and I am wondering if there is any workaround other than upgrading.

 

I lost a tablet server in my dev environment and wasn't aware of it for a while (24 hours or so). I assumed I could just wipe the directories and rebuild it, thinking all of its replicas had already been re-replicated to other tablet servers, but that wasn't the case. After getting the tablet server back up, I ran a ksck and noted that two tables are under-replicated, and I am seeing a lot of alerts like the one below. I did see https://issues.apache.org/jira/browse/KUDU-1613, which says it's fixed in 1.7, but upgrading isn't an option at this time.

 

T acc6adcd6b554b5cabfed268184591a9 P fa6625fcbb204fba86ac6d4007c3f695 -> Peer 13950e716d21404188b733521dbf4d97 (1.2.3.4.domain.com:7050): Couldn't send request to peer 13950e716d21404188b733521dbf4d97 for tablet acc6adcd6b554b5cabfed268184591a9. Error code: WRONG_SERVER_UUID (16). Status: Invalid argument: UpdateConsensus: Wrong destination UUID requested. Local UUID: 71fc2ef6adb54e59b27de1ca1c8cbbf9. Requested UUID: 13950e716d21404188b733521dbf4d97. Retrying in the next heartbeat period. Already tried 354599 times.

 

1 ACCEPTED SOLUTION

Rising Star

You need to remove the old UUID from any existing Raft configurations. I haven't tested this myself, but per the documentation, you should be able to run:

kudu tablet change_config remove_replica <master_addresses> <tablet_id> <old_tserver_uuid>

In this case, I think it'd be:

 

kudu tablet change_config remove_replica <master_addresses> acc6adcd6b554b5cabfed268184591a9 13950e716d21404188b733521dbf4d97
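
If several tablets still reference the old UUID, the commands could be generated from the tablet server log rather than typed by hand. The following is only a rough sketch under a few assumptions: `print_remove_commands` is a made-up helper name, the master address you pass in is a placeholder, and the log lines are assumed to look like the WRONG_SERVER_UUID line quoted in the question. It prints commands for review and does not run anything.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Kudu): reads tablet server log lines on
# stdin and prints one remove_replica command per unique
# (tablet id, stale UUID) pair. Dry run only; nothing is executed.
print_remove_commands() {
  local masters="$1"   # placeholder: your own master address list
  # Keep only WRONG_SERVER_UUID lines, then pull out the tablet id that
  # follows "for tablet" and the UUID that follows "Requested UUID:".
  sed -n -E '/WRONG_SERVER_UUID/s/.*for tablet ([0-9a-f]{32})\..*Requested UUID: ([0-9a-f]{32})\..*/\1 \2/p' \
    | sort -u \
    | while read -r tablet uuid; do
        echo "kudu tablet change_config remove_replica ${masters} ${tablet} ${uuid}"
      done
}
```

It could be fed a log file on stdin, for example `print_remove_commands <master_addresses> < /path/to/tserver.log`, where the log path is whatever your tablet server writes to. Check each printed command against ksck output before running it.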

Explorer

 

Thanks! This seemed to work for all but one tablet. When I run the command, I get the following message.

 

Illegal state: RaftConfig change currently pending. Only one is allowed at a time.
  Committed config: opid_index: 920502 OBSOLETE_local: false peers { permanent_uuid: "13950e716d21404188b733521dbf4d97" member_type: VOTER last_known_addr { host: "node02.domain.com" port: 7050 } } peers { permanent_uuid: "fa6625fcbb204fba86ac6d4007c3f695" member_type: VOTER last_known_addr { host: "node10.domain.com" port: 7050 } } peers { permanent_uuid: "fe5c3538509d47d186f5784ea586f260" member_type: VOTER last_known_addr { host: "node12.researchnow.com" port: 7050 } }.
  Pending config: opid_index: 1318207 OBSOLETE_local: false peers { permanent_uuid: "13950e716d21404188b733521dbf4d97" member_type: VOTER last_known_addr { host: "node02.domain.com" port: 7050 } } peers { permanent_uuid: "fe5c3538509d47d186f5784ea586f260" member_type: VOTER last_known_addr { host: "node12.domain.com" port: 7050 } }

 

Rising Star

Ah, that's interesting. It looks like that tablet was in the middle of removing a _different_ replica (fa6625fcbb204fba86ac6d4007c3f695) from its Raft config, which leaves the pending config with one good replica (fe5c3538509d47d186f5784ea586f260) and the one bad one (13950e716d21404188b733521dbf4d97). From the looks of it, it probably can't make any progress on that change because the bad one is probably answering with the WRONG_SERVER_UUID response, so the two-voter pending config can never get a majority.

 

In this case, I think the right tool would be unsafe_change_config to force the Raft config to only contain the one good replica, after which Raft should take the wheel and replicate back up to full replication, e.g.

 

kudu remote_replica unsafe_change_config <tserver address of fe5c3538509d47d186f5784ea586f260> acc6adcd6b554b5cabfed268184591a9 fe5c3538509d47d186f5784ea586f260
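
To double-check which UUID to pass before forcing the config, the peers in the pending config can be pulled out of the error message itself. This is just a sketch: `pending_peer_uuids` and `surviving_peers` are hypothetical helper names, and it assumes the message has the same "Pending config:" layout as the one pasted above.

```shell
#!/usr/bin/env bash
# Hypothetical helpers (not part of Kudu) for reading the
# "RaftConfig change currently pending" error text on stdin.

# Prints every permanent_uuid listed on the "Pending config:" line.
pending_peer_uuids() {
  grep 'Pending config:' \
    | grep -o 'permanent_uuid: "[0-9a-f]*"' \
    | sed -E 's/.*"([0-9a-f]+)"/\1/'
}

# Prints the pending peers other than the stale UUID given as $1,
# i.e. the candidates to keep when forcing the config.
surviving_peers() {
  local stale="$1"
  pending_peer_uuids | grep -v -x "$stale"
}
```

With the error text saved to a file, `surviving_peers 13950e716d21404188b733521dbf4d97 < error.txt` should list only the replicas that are left once the stale UUID is excluded. It is still worth confirming with ksck that the remaining replica is healthy before running unsafe_change_config.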

Explorer

Thanks again, this seems to have fixed the issue. I am going through most of the tables, as it looks like a few of them are still trying to talk to the old UUID.