KUDU master consensus conflicts

Hello,

I'm running a CDP 7.1.6 cluster managed by Cloudera Manager 7.3.1, with 3 masters and 3 workers.

I keep getting this error: Corruption: master consensus error: there are master consensus conflicts

This is the ksck output for the cluster:

Master Summary
UUID | Address | Status
----------------------------------+--------------------------+---------
5620e4a103894151b7bdee5e436f37d8 | master-2.local | HEALTHY
9cea3b56cc9b4be4846a02c0d89be753 | master-1.local | HEALTHY
a98a1f26d0254293b6e17e9daf8f6ef8 | master-3.local | HEALTHY
All reported replicas are:
A = 9cea3b56cc9b4be4846a02c0d89be753
B = 5620e4a103894151b7bdee5e436f37d8
C = a98a1f26d0254293b6e17e9daf8f6ef8
The consensus matrix is:
Config source | Replicas | Current term | Config index | Committed?
---------------+--------------+--------------+--------------+------------
A | A B C | 10120 | -1 | Yes
B | A B* C | 10120 | -1 | Yes
C | A B* C | 10120 | -1 | Yes
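
For anyone reading the matrix: each row is one master's view of the Raft config, and the replica marked with `*` is the leader that master sees. A minimal Python sketch that pulls the leader out of each row, assuming the matrix format exactly as pasted above (the function names are made up for illustration and are not part of Kudu):

```python
from collections import Counter

# Consensus matrix rows copied from the ksck output above.
MATRIX = """\
Config source | Replicas | Current term | Config index | Committed?
---------------+--------------+--------------+--------------+------------
A | A B C | 10120 | -1 | Yes
B | A B* C | 10120 | -1 | Yes
C | A B* C | 10120 | -1 | Yes
"""


def leaders_by_source(matrix_text):
    """Map each config source to the leader it reports (None if it sees none)."""
    leaders = {}
    for line in matrix_text.splitlines():
        cells = [cell.strip() for cell in line.split("|")]
        # Skip the separator row (it contains no "|") and the header row.
        if len(cells) != 5 or cells[0] == "Config source":
            continue
        source, replicas = cells[0], cells[1].split()
        starred = [r.rstrip("*") for r in replicas if r.endswith("*")]
        leaders[source] = starred[0] if starred else None
    return leaders


def disagreeing_sources(leaders):
    """Return the sources whose leader differs from the most common view."""
    majority_leader, _ = Counter(leaders.values()).most_common(1)[0]
    return [src for src, leader in leaders.items() if leader != majority_leader]


leaders = leaders_by_source(MATRIX)
print(leaders)                       # {'A': None, 'B': 'B', 'C': 'B'}
print(disagreeing_sources(leaders))  # ['A']
```

Here only A (9cea3b56cc9b4be4846a02c0d89be753, master-1) reports no leader, while B and C agree that B (master-2) is the leader.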

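It seems that node A (master-1) is not taking part in the consensus: its row in the matrix shows no leader, and its log shows failing vote requests to master-2: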
W1111 11:12:00.526211 18688 leader_election.cc:334] T 00000000000000000000000000000000 P 9cea3b56cc9b4be4846a02c0d89be753 [CANDIDATE]: Term 10122 pre-election: RPC error from VoteRequest() call to peer 5620e4a103894151b7bdee5e436f37d8 (master-2:7051): Network error: Client connection negotiation failed: client connection to 10.157.136.55:7051: connect: Connection refused (error 111)
W1111 11:12:22.683107 18688 leader_election.cc:334] T 00000000000000000000000000000000 P 9cea3b56cc9b4be4846a02c0d89be753 [CANDIDATE]: Term 10122 pre-election: RPC error from VoteRequest() call to peer 5620e4a103894151b7bdee5e436f37d8 (master-2:7051): Timed out: RequestConsensusVote RPC to 10.157.136.55:7051 timed out after 7.916s (SENT)

 

There is network connectivity to that port:

# nc -z -v 10.157.136.55 7051
Ncat: Version 7.50 ( https://nmap.org/ncat )
Ncat: Connected to 10.157.136.55:7051.
Ncat: 0 bytes sent, 0 bytes received in 0.01 seconds.

The masters have been restarted several times, and so has the whole cluster.

Any ideas on how to fix this? Thanks!
