Created on 01-17-2022 02:27 AM - edited 09-16-2022 07:44 AM
Hi,
I am getting an error when I run the kudu ksck command:
Errors:
==================
Corruption: master consensus error: there are master consensus conflicts
FAILED
Runtime error: ksck discovered errors
All 3 masters and the tablet servers are reported as healthy. Any assistance or pointers on fixing this error would be appreciated.
Thanks
Created 01-17-2022 09:59 PM
Can you attach the full ksck report for review?
Is there a consensus matrix shown in the ksck report?
Created 01-17-2022 10:43 PM
I cannot attach the ksck report due to company infosec policy.
Regarding the consensus matrix, is the following what you are looking for?
All reported replicas are:
A = 865af50ae13e4cfagh5719b865d6716a
B = c8d59ba15dbb4578900f597bb48bd9e0
C = 8be74245ecaf4b5baf18b24dbc7318ea
The consensus matrix is:
 Config source |   Replicas   | Current term | Config index | Committed?
---------------+--------------+--------------+--------------+------------
 A             | A*  B   C    | 68           | -1           | Yes
 B             | A   B   C    | 68           | -1           | Yes
 C             | A*  B   C    | 68           | -1           | Yes
Created 01-17-2022 10:57 PM
Thanks. The matrix shows that both 'A' and 'C' consider 'A' to be the leader master (the '*' in their rows), but 'B' does not recognize any leader.
So you may want to check the master log on the 'B' host to find out why it has not recognized a leader in the current election cycle.
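The reading of the matrix above can also be checked mechanically. Below is a minimal Python sketch, assuming the pipe-delimited layout ksck printed in this thread, where a '*' after a replica marks the replica that a config source believes is the leader. MATRIX is the table from this thread; the function names are hypothetical helpers, not part of any Kudu tool.

```python
# Assumes the ksck consensus-matrix layout shown in this thread: five
# pipe-separated columns, '*' marking the leader a config source sees.
MATRIX = """\
 Config source |   Replicas   | Current term | Config index | Committed?
---------------+--------------+--------------+--------------+------------
 A             | A*  B   C    | 68           | -1           | Yes
 B             | A   B   C    | 68           | -1           | Yes
 C             | A*  B   C    | 68           | -1           | Yes
"""

def leaders_by_source(matrix_text):
    """Map each config source to the replica it marks as leader, or None."""
    leaders = {}
    for line in matrix_text.splitlines():
        cells = [cell.strip() for cell in line.split("|")]
        # Data rows have exactly 5 cells; the '---+---' separator has 1.
        if len(cells) != 5 or cells[0] == "Config source":
            continue
        source, replicas = cells[0], cells[1].split()
        marked = [r.rstrip("*") for r in replicas if r.endswith("*")]
        leaders[source] = marked[0] if marked else None
    return leaders

def sources_without_leader(matrix_text):
    """Config sources whose row has no '*', i.e. that do not see any leader."""
    leaders = leaders_by_source(matrix_text)
    return sorted(s for s, leader in leaders.items() if leader is None)

print(leaders_by_source(MATRIX))       # {'A': 'A', 'B': None, 'C': 'A'}
print(sources_without_leader(MATRIX))  # ['B']
```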
Created 01-18-2022 01:16 AM
@ChrisGe I appreciate your assistance. I see the following in the logs:
10:04:02.026 PM INFO cc:494 T 00000000000000000000000000000000 P c8d59ba15dbb4578900f597bb48bd9e0 [term 68 FOLLOWER]: Starting pre-election with config: opid_index: -1 OBSOLETE_local: false peers { permanent_uuid: "865af50ae13e4cfabe5719b865d6716a" member_type: VOTER last_known_addr { host: "master1.com" port: 7051 } } peers { permanent_uuid: "c8d59ba15dbb4578900f597bb48ce9e0" member_type: VOTER last_known_addr { host: "master2.com" port: 7051 } } peers { permanent_uuid: "8be74245ecaf4b5baf18b24dbc730922" member_type: VOTER last_known_addr { host: "master3.com" port: 7051 } }
10:04:02.026 PM INFO cc:296 T 00000000000000000000000000000000 P c8d59ba15dbb4578900f597bb48bd9e0 [CANDIDATE]: Term 69 pre-election: Requested pre-vote from peers 865af50ae13e4cfabe5719b865d6716a (master1.com:7051), 8be74245ecaf4b5baf18b24dbc730922 (master3.com:7051)
10:04:02.028 PM INFO cc:310 T 00000000000000000000000000000000 P c8d59ba15dbb4578900f597bb48bd9e0 [CANDIDATE]: Term 69 pre-election: Election decided. Result: candidate lost. Election summary: received 3 responses out of 3 voters: 1 yes votes; 2 no votes. yes voters: c8d59ba15dbb4578900f597bb48bd9e0; no voters: 865af50ae13e4cfagh5719b865d6716a, 8be74245ecaf4b5baf18b24dbc730922
10:04:02.028 PM INFO cc:2592 T 00000000000000000000000000000000 P c8d59ba15dbb4578900f597bb48bd9e0 [term 68 FOLLOWER]: Leader pre-election lost for term 69. Reason: could not achieve majority
Thanks
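The "could not achieve majority" line follows from Raft's voting rule: a candidate needs a strict majority of the voters, floor(n/2) + 1, which is 2 out of 3 here. The summary shows 'B' received only its own vote, so its pre-election for term 69 fails and it stays a follower in term 68. A minimal Python sketch of that arithmetic, assuming the exact wording of the election summary logged above (the regex and helper names are illustrative only):

```python
from __future__ import annotations

import re

# Election summary text copied from the log above.
SUMMARY = ("Election decided. Result: candidate lost. Election summary: "
           "received 3 responses out of 3 voters: 1 yes votes; 2 no votes.")

def majority(num_voters: int) -> int:
    """Smallest vote count that is a strict majority of num_voters."""
    return num_voters // 2 + 1

def parse_summary(text: str) -> tuple[int, int, int]:
    """Extract (voters, yes, no) from an election summary like the one above."""
    m = re.search(r"out of (\d+) voters: (\d+) yes votes?; (\d+) no votes?", text)
    if m is None:
        raise ValueError("unrecognized election summary")
    voters, yes, no = (int(g) for g in m.groups())
    return voters, yes, no

voters, yes, no = parse_summary(SUMMARY)
needed = majority(voters)
outcome = "won" if yes >= needed else "lost"
print(f"{yes} yes of {voters} voters, {needed} needed: {outcome}")
# 1 yes of 3 voters, 2 needed: lost
```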
Created 01-18-2022 04:15 PM
Apparently you have a mismatched UUID for 'master3.com':
peers { permanent_uuid: "8be74245ecaf4b5baf18b24dbc730922" member_type: VOTER last_known_addr { host: "master3.com" port: 7051 } }
since I cannot see this UUID in the consensus matrix:
A = 865af50ae13e4cfagh5719b865d6716a
B = c8d59ba15dbb4578900f597bb48bd9e0
C = 8be74245ecaf4b5baf18b24dbc7318ea
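The comparison above (the UUIDs in B's logged Raft config versus the UUIDs ksck reports) can be sketched in Python. The regex assumes the "peers { permanent_uuid: ... host: ... port: ... }" format printed in the master log in this thread, and the helper names are hypothetical. The copies of the other two UUIDs in this thread differ slightly between the matrix and the log, so the example checks only the master3 entry.

```python
import re

# Matches one 'peers { ... }' entry of a Raft config as printed in the master log.
PEER_RE = re.compile(
    r'peers \{ permanent_uuid: "([^"]+)".*?host: "([^"]+)" port: (\d+)')

def peers_in_config(config_text):
    """Return (uuid, host, port) for every peer in the logged Raft config."""
    return [(uuid, host, int(port))
            for uuid, host, port in PEER_RE.findall(config_text)]

def unknown_peers(config_text, reported_uuids):
    """Peers whose UUID is not among the UUIDs ksck reported for the masters."""
    reported = set(reported_uuids)
    return [(uuid, host)
            for uuid, host, _ in peers_in_config(config_text)
            if uuid not in reported]

# The master3 peer entry from B's log and the UUIDs listed by ksck.
CONFIG = ('peers { permanent_uuid: "8be74245ecaf4b5baf18b24dbc730922" '
          'member_type: VOTER last_known_addr { host: "master3.com" port: 7051 } }')
REPORTED = ["865af50ae13e4cfagh5719b865d6716a",
            "c8d59ba15dbb4578900f597bb48bd9e0",
            "8be74245ecaf4b5baf18b24dbc7318ea"]

print(unknown_peers(CONFIG, REPORTED))
# [('8be74245ecaf4b5baf18b24dbc730922', 'master3.com')]
```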
So what you can do now is manually modify the consensus metadata on the 'B' host. First, stop the Kudu master role on 'B'. Then locate the 'consensus-meta' directory on this host (it could be under the 'fs_wal_dir' directory or under the first directory listed in 'fs_data_dirs'). In that directory, run "kudu pbc dump 00000000000000000000000000000000" to check the UUIDs of all peers. I believe you will see the same wrong UUID, '8be74245ecaf4b5baf18b24dbc730922', in the output.
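Finding the file can be sketched as a small Python helper that checks the two locations named above in order: 'fs_wal_dir' first, then the first entry of the comma-separated 'fs_data_dirs' value. The function name and the example flag values are hypothetical; use the actual values from the stopped master's configuration.

```python
from pathlib import Path

# Name of the consensus-meta file used in the steps above (32 zeros).
MASTER_TABLET_ID = "0" * 32

def find_master_cmeta(fs_wal_dir, fs_data_dirs):
    """Return the master's consensus-meta file, checking fs_wal_dir first and
    then the first entry of the comma-separated fs_data_dirs; None if absent."""
    roots = [r.strip() for r in (fs_wal_dir, fs_data_dirs.split(",")[0]) if r.strip()]
    for root in roots:
        candidate = Path(root) / "consensus-meta" / MASTER_TABLET_ID
        if candidate.is_file():
            return candidate
    return None

if __name__ == "__main__":
    # Hypothetical flag values for illustration only.
    path = find_master_cmeta("/data/kudu/master/wal",
                             "/data1/kudu/master/data,/data2/kudu/master/data")
    print(f"kudu pbc dump {path}" if path else "consensus-meta file not found")
```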
Back up this 00000000000000000000000000000000 file, then use "kudu pbc edit 00000000000000000000000000000000" to change the wrong UUID to '8be74245ecaf4b5baf18b24dbc7318ea'. Note that "kudu pbc edit" displays each UUID as an encoded string, so you may want to check the other 2 master hosts to get the correctly encoded UUID string.
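The backup and the encoded UUID can both be prepared with a short Python sketch. It assumes that "kudu pbc edit" presents the file as JSON and that permanent_uuid is a protobuf bytes field holding the UUID's ASCII text; protobuf's JSON mapping renders bytes fields as base64, which would explain the "encoded UUID string". Treat this as an assumption and cross-check the result against the value shown on the other two masters, as suggested above. Writing the backup to a separate directory, rather than next to the original, is a precaution of this sketch, not a documented requirement; the helper names are hypothetical.

```python
import base64
import shutil
import time
from pathlib import Path

def backup(path, dest_dir):
    """Copy the file into dest_dir with a timestamp suffix and return the copy's path."""
    src = Path(path)
    dest = Path(dest_dir) / f"{src.name}.{time.strftime('%Y%m%d-%H%M%S')}.bak"
    shutil.copy2(src, dest)
    return dest

def encode_uuid(uuid_str):
    """Base64 of the UUID's ASCII text (assumed to be how the editor shows it)."""
    return base64.b64encode(uuid_str.encode("ascii")).decode("ascii")

def decode_uuid(encoded):
    """Inverse of encode_uuid, handy for checking a value shown in the editor."""
    return base64.b64decode(encoded).decode("ascii")

WRONG_UUID = "8be74245ecaf4b5baf18b24dbc730922"  # stale master3 UUID in B's config
RIGHT_UUID = "8be74245ecaf4b5baf18b24dbc7318ea"  # master3 UUID reported by ksck

print("find:   ", encode_uuid(WRONG_UUID))
print("replace:", encode_uuid(RIGHT_UUID))
```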
After that, restart this master and check whether the issue is fixed.