Hello,
I have a CDP 7.1.6 cluster managed by Cloudera Manager 7.3.1, with 3 masters and 3 workers.
I keep getting this error: Corruption: master consensus error: there are master consensus conflicts
This is the ksck output for the cluster:
Master Summary
UUID | Address | Status
----------------------------------+--------------------------+---------
5620e4a103894151b7bdee5e436f37d8 | master-2.local | HEALTHY
9cea3b56cc9b4be4846a02c0d89be753 | master-1.local | HEALTHY
a98a1f26d0254293b6e17e9daf8f6ef8 | master-3.local | HEALTHY
All reported replicas are:
A = 9cea3b56cc9b4be4846a02c0d89be753
B = 5620e4a103894151b7bdee5e436f37d8
C = a98a1f26d0254293b6e17e9daf8f6ef8
The consensus matrix is:
Config source | Replicas | Current term | Config index | Committed?
---------------+--------------+--------------+--------------+------------
A | A B C | 10120 | -1 | Yes
B | A B* C | 10120 | -1 | Yes
C | A B* C | 10120 | -1 | Yes
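To make the conflict explicit, here is a minimal Python sketch that reads the matrix rows pasted above and reports which config source disagrees about the leader. It assumes the `*` marks the replica that a source considers the leader. The names `leaders_by_source` and `disagreeing_sources` are made up for this illustration and are not part of any Kudu tool.

```python
# Illustrative only: the rows are copied from the consensus matrix above.
MATRIX = """\
A | A B C | 10120 | -1 | Yes
B | A B* C | 10120 | -1 | Yes
C | A B* C | 10120 | -1 | Yes
"""


def leaders_by_source(matrix_text):
    """Map each config source to the replica it marks with '*' (None if no leader)."""
    result = {}
    for line in matrix_text.strip().splitlines():
        cells = [cell.strip() for cell in line.split("|")]
        source, replicas = cells[0], cells[1].split()
        # A replica suffixed with '*' is assumed to be the leader seen by this source.
        leader = next((r.rstrip("*") for r in replicas if r.endswith("*")), None)
        result[source] = leader
    return result


def disagreeing_sources(leaders):
    """Return the sources whose leader view differs from the most common view."""
    views = list(leaders.values())
    majority = max(set(views), key=views.count)
    return sorted(src for src, view in leaders.items() if view != majority)


leaders = leaders_by_source(MATRIX)
print(leaders)
print(disagreeing_sources(leaders))
```

With the rows above, B and C both mark B as leader, while A marks no leader at all, so A is the source that disagrees.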
It seems that master A (master-1.local) is not voting. This is its log output:
W1111 11:12:00.526211 18688 leader_election.cc:334] T 00000000000000000000000000000000 P 9cea3b56cc9b4be4846a02c0d89be753 [CANDIDATE]: Term 10122 pre-election: RPC error from VoteRequest() call to peer 5620e4a103894151b7bdee5e436f37d8 (master-2:7051): Network error: Client connection negotiation failed: client connection to 10.157.136.55:7051: connect: Connection refused (error 111)
W1111 11:12:22.683107 18688 leader_election.cc:334] T 00000000000000000000000000000000 P 9cea3b56cc9b4be4846a02c0d89be753 [CANDIDATE]: Term 10122 pre-election: RPC error from VoteRequest() call to peer 5620e4a103894151b7bdee5e436f37d8 (master-2:7051): Timed out: RequestConsensusVote RPC to 10.157.136.55:7051 timed out after 7.916s (SENT)
There is connectivity:
# nc -z -v 10.157.136.55 7051
Ncat: Version 7.50 ( https://nmap.org/ncat )
Ncat: Connected to 10.157.136.55:7051.
Ncat: 0 bytes sent, 0 bytes received in 0.01 seconds.
The masters have been restarted several times, and so has the whole cluster.
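The nc check above only shows that the port was reachable at one moment, while the log shows both "Connection refused" and a timeout against the same address. A minimal sketch, assuming plain TCP reachability is what matters here, that probes the port several times to see whether the failures are intermittent. The helper names `can_connect` and `probe` are hypothetical, and the script does not speak the Kudu RPC protocol.

```python
import socket
import time


def can_connect(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and unreachable hosts.
        return False


def probe(host, port, attempts=5, interval=1.0):
    """Try several connections and return the list of per-attempt results."""
    results = []
    for i in range(attempts):
        results.append(can_connect(host, port))
        if i < attempts - 1:
            time.sleep(interval)
    return results
```

For example, running `probe("10.157.136.55", 7051)` from master-1 (address and port taken from the log lines above) returns one True/False per attempt. A mix of both would suggest the master-2 process is flapping rather than permanently unreachable.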
Any ideas on how to fix this? Thanks!