Kudu master consensus error

Explorer

Hi,

 

I am getting an error when I run the kudu ksck command:

 

Errors:
==================
Corruption: master consensus error: there are master consensus conflicts

FAILED
Runtime error: ksck discovered errors

 

All 3 masters and the tablet servers are reported as healthy. Any assistance or pointers for fixing this error would be appreciated.

 

Thanks 

 

1 ACCEPTED SOLUTION

Cloudera Employee

Apparently, the UUID recorded for 'master3.com' is mismatched. The config in the log from 'B' lists this peer:

peers { permanent_uuid: "8be74245ecaf4b5baf18b24dbc730922" member_type: VOTER last_known_addr { host: "master3.com" port: 7051 } }

 

I cannot see this UUID in the consensus matrix, which reports:

  A = 865af50ae13e4cfagh5719b865d6716a
  B = c8d59ba15dbb4578900f597bb48bd9e0
  C = 8be74245ecaf4b5baf18b24dbc7318ea
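
The comparison above can be sketched in a few lines of Python. This is only an illustration: the function name `unknown_peers` and the regular expression are made up for this example and assume the peers are printed in the same text form as in the log line quoted here.

```python
import re

# Matches one "peers { ... }" entry as printed in the master log
# (assumed format, taken from the log excerpt in this thread).
PEER_RE = re.compile(
    r'permanent_uuid:\s*"(?P<uuid>[^"]+)".*?host:\s*"(?P<host>[^"]+)"'
)

def unknown_peers(config_text, reported_uuids):
    """Return (host, uuid) pairs from the config that ksck did not report."""
    reported = set(reported_uuids)
    return [
        (m.group("host"), m.group("uuid"))
        for m in PEER_RE.finditer(config_text)
        if m.group("uuid") not in reported
    ]

config = ('peers { permanent_uuid: "8be74245ecaf4b5baf18b24dbc730922" '
          'member_type: VOTER last_known_addr { host: "master3.com" port: 7051 } }')
reported = [
    "865af50ae13e4cfagh5719b865d6716a",  # A
    "c8d59ba15dbb4578900f597bb48bd9e0",  # B
    "8be74245ecaf4b5baf18b24dbc7318ea",  # C
]

print(unknown_peers(config, reported))
```

Any pair it prints is a peer whose UUID in the config does not match a master that ksck actually reached.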

 

What you can do now is manually correct the consensus metadata on host 'B'. First stop the Kudu master role on host 'B', then locate the 'consensus-meta' directory on that host (it is either under the 'fs_wal_dir' directory or under the first directory listed in 'fs_data_dirs'). In that directory, run "kudu pbc dump 00000000000000000000000000000000" to check the UUIDs of all peers. I believe you will see the same wrong UUID, '8be74245ecaf4b5baf18b24dbc730922', in the output.

 

Next, back up the 00000000000000000000000000000000 file, then use "kudu pbc edit 00000000000000000000000000000000" to change the wrong UUID to '8be74245ecaf4b5baf18b24dbc7318ea'. The UUIDs displayed by "kudu pbc edit" are encoded strings, so you may want to check the other 2 master hosts to get the encoded form of the correct UUID.
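
If the encoded strings turn out to be base64 of the ASCII UUID text (an assumption here, not something stated in this thread; confirm it against what the other masters show for the same peer), a small Python sketch like the following can translate between the two forms. The helper names `encode_uuid` and `decode_uuid` are hypothetical.

```python
import base64

def encode_uuid(uuid_text):
    """Encode the 32-character UUID text as base64 (assumed editor format)."""
    return base64.b64encode(uuid_text.encode("ascii")).decode("ascii")

def decode_uuid(encoded):
    """Reverse of encode_uuid: recover the readable UUID text."""
    return base64.b64decode(encoded).decode("ascii")

wrong = "8be74245ecaf4b5baf18b24dbc730922"
right = "8be74245ecaf4b5baf18b24dbc7318ea"

# Print the form to look for in the editor and the form to replace it with.
print("find:   ", encode_uuid(wrong))
print("replace:", encode_uuid(right))
```

Decoding a string copied from one of the healthy masters with `decode_uuid` is a quick way to check that the assumption holds before changing anything.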

 

After that, restart this master and check whether the issue is fixed.

 

 

 

 


5 REPLIES 5

Cloudera Employee

Can you attach the full ksck report for review?

 

Is there a consensus matrix shown in the ksck report?

Explorer

@Tiger123 

 

I cannot attach the ksck report due to company infosec policy.

Regarding the consensus matrix, is the output below what you are looking for?

 

All reported replicas are:
  A = 865af50ae13e4cfagh5719b865d6716a
  B = c8d59ba15dbb4578900f597bb48bd9e0
  C = 8be74245ecaf4b5baf18b24dbc7318ea
The consensus matrix is:
 Config source |   Replicas   | Current term | Config index | Committed?
---------------+--------------+--------------+--------------+------------
 A             | A*  B   C    | 68           | -1           | Yes
 B             | A   B   C    | 68           | -1           | Yes
 C             | A*  B   C    | 68           | -1           | Yes

 

Cloudera Employee

Thanks. The matrix shows that both 'A' and 'C' think the leader master is 'A', but 'B' sees no leader master.

 

So you may want to check the master log on host 'B' to figure out why it failed to elect a leader in the election cycle.
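
As a rough illustration of how the matrix is being read here, the Python sketch below takes the three matrix rows and reports which replica each config source marks as leader, assuming the asterisk is the leader marker. The function name `parse_matrix` is hypothetical, and the row format is taken from the output pasted in this thread.

```python
def parse_matrix(rows):
    """Map each config source to the replica it marks as leader (or None)."""
    leaders = {}
    for row in rows:
        cells = [c.strip() for c in row.split("|")]
        source, replicas = cells[0], cells[1].split()
        # A trailing '*' on a replica name is read as "this is the leader".
        leader = next((r.rstrip("*") for r in replicas if r.endswith("*")), None)
        leaders[source] = leader
    return leaders

rows = [
    " A             | A*  B   C    | 68           | -1           | Yes",
    " B             | A   B   C    | 68           | -1           | Yes",
    " C             | A*  B   C    | 68           | -1           | Yes",
]

leaders = parse_matrix(rows)
print(leaders)  # {'A': 'A', 'B': None, 'C': 'A'}

# Sources that see no leader are the ones whose logs are worth checking first.
print([source for source, leader in leaders.items() if leader is None])  # ['B']
```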

Explorer

@Tiger123 Appreciate your assistance. I see the following in the logs:

 

10:04:02.026 PM	INFO	cc:494	T 00000000000000000000000000000000 P c8d59ba15dbb4578900f597bb48bd9e0 [term 68 FOLLOWER]: Starting pre-election with config: opid_index: -1 OBSOLETE_local: false peers { permanent_uuid: "865af50ae13e4cfabe5719b865d6716a" member_type: VOTER last_known_addr { host: "master1.com" port: 7051 } } peers { permanent_uuid: "c8d59ba15dbb4578900f597bb48ce9e0" member_type: VOTER last_known_addr { host: "master2.com" port: 7051 } } peers { permanent_uuid: "8be74245ecaf4b5baf18b24dbc730922" member_type: VOTER last_known_addr { host: "master3.com" port: 7051 } }
10:04:02.026 PM	INFO	cc:296	T 00000000000000000000000000000000 P c8d59ba15dbb4578900f597bb48bd9e0 [CANDIDATE]: Term 69 pre-election: Requested pre-vote from peers 865af50ae13e4cfabe5719b865d6716a (master1.com:7051), 8be74245ecaf4b5baf18b24dbc730922 (master3.com:7051)
10:04:02.028 PM	INFO	cc:310	T 00000000000000000000000000000000 P c8d59ba15dbb4578900f597bb48bd9e0 [CANDIDATE]: Term 69 pre-election: Election decided. Result: candidate lost. Election summary: received 3 responses out of 3 voters: 1 yes votes; 2 no votes. yes voters: c8d59ba15dbb4578900f597bb48bd9e0; no voters: 865af50ae13e4cfagh5719b865d6716a, 8be74245ecaf4b5baf18b24dbc730922
10:04:02.028 PM	INFO	cc:2592	T 00000000000000000000000000000000 P c8d59ba15dbb4578900f597bb48bd9e0 [term 68 FOLLOWER]: Leader pre-election lost for term 69. Reason: could not achieve majority

 

Thanks 
