Created 11-12-2022 02:25 AM
Hello,
In my 3-master cluster, one Kudu master keeps starting and stopping. This is the log detail from Cloudera Manager:
Time Log Level Source Log Message
10:14:41.417 AM WARN cc:288
Found duplicates in --master_addresses: the unique set of addresses is Master1:7051, Master2:7051, Master3:7051
10:15:11.823 AM WARN cc:254
Call kudu.consensus.ConsensusService.RequestConsensusVote from 10.157.136.55:55402 (request call id 0) took 4542 ms (4.54 s). Client timeout 1775 ms (1.78 s)
10:15:11.823 AM WARN cc:254
Call kudu.consensus.ConsensusService.RequestConsensusVote from 10.157.136.37:59796 (request call id 0) took 30215 ms (30.2 s). Client timeout 9654 ms (9.65 s)
10:15:11.823 AM WARN cc:260
Trace:
1112 10:15:07.281146 (+ 0us) service_pool.cc:169] Inserting onto call queue
1112 10:15:07.281169 (+ 23us) service_pool.cc:228] Handling call
1112 10:15:11.823245 (+4542076us) inbound_call.cc:171] Queueing success response
Metrics: {"spinlock_wait_cycles":384}
10:15:11.823 AM WARN cc:260
Trace:
1112 10:14:41.607787 (+ 0us) service_pool.cc:169] Inserting onto call queue
1112 10:14:41.607839 (+ 52us) service_pool.cc:228] Handling call
1112 10:15:11.823242 (+30215403us) inbound_call.cc:171] Queueing success response
Metrics: {}
10:15:11.823 AM WARN cc:254
Call kudu.consensus.ConsensusService.RequestConsensusVote from 10.157.136.55:55402 (request call id 1) took 4536 ms (4.54 s). Client timeout 1955 ms (1.96 s)
10:15:11.823 AM WARN cc:260
Trace:
1112 10:15:07.286988 (+ 0us) service_pool.cc:169] Inserting onto call queue
1112 10:15:07.287025 (+ 37us) service_pool.cc:228] Handling call
1112 10:15:11.823244 (+4536219us) inbound_call.cc:171] Queueing success response
Metrics: {}
What does this mean?
Why is it so inconsistent?
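If it helps, the first warning suggests --master_addresses contains a repeated entry somewhere in the master configuration. One way to dump the effective non-default flags on the flapping master and check this (Master1 stands in for the real hostname):
sudo -u kudu kudu master get_flags Master1:7051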
Created 11-23-2022 01:09 AM
Hello,
I found the fix for this case; maybe it will help anyone having the same Kudu master consensus issue as me.
Master1 is not voting:
The consensus matrix is:
 Config source | Replicas     | Current term | Config index | Committed?
---------------+--------------+--------------+--------------+------------
 Master1 A     | A B C        | 12026        | -1           | Yes
 Master2 B     | A B C*       | 12026        | -1           | Yes
 Master3 C     | A B C*       | 12026        | -1           | Yes
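For context, the matrix above is the output of kudu cluster ksck run against the three masters; something like the following should reproduce it (Master1..Master3 stand in for the real hostnames):
sudo -u kudu kudu cluster ksck Master1:7051,Master2:7051,Master3:7051
The * marks the replica each master believes is the current leader; Master1 (A) sees no leader at all, which matches the failed RequestConsensusVote calls in the log.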
The workaround is:
A) Stop the problematic master, then run the following command on it:
B) sudo -u kudu kudu local_replica delete --fs_wal_dir=/var/kudu/master --fs_data_dirs=/var/kudu/master 00000000000000000000000000000000 --clean_unsafe
C) Check which master is the current leader in the Kudu web UI; for example:
a98a1f26d0254293b6e17e9daf8f6ef8 822fcc68eff448269c9200a8c4c2ecc8 LEADER 2022-11-22 07:18:21 GMT
rpc_addresses { host: "sdzw-hpas-35" port: 7051 } http_addresses { host: "sdzw-hpas-35" port: 8051 } software_version: "kudu 1.13.0.7.1.6.0-297 (rev 9323384dbd925202032a965e955979d6d2f6acb0)" https_enabled: false
D) Copy the master data from the active leader (use the same --fs_wal_dir/--fs_data_dirs as in step B, adjusted to your deployment):
sudo -u kudu kudu local_replica copy_from_remote --fs_wal_dir=<fs_wal_dir> --fs_data_dirs=<fs_data_dirs> 00000000000000000000000000000000 <active_leader_fqdn>:7051
# sudo -u kudu /opt/cloudera/parcels/CDH-7.1.6-1.cdh7.1.6.p0.10506313/bin/../lib/kudu/bin/kudu local_replica copy_from_remote --fs_wal_dir=/var/kudu/master --fs_data_dirs=/var/kudu/master 00000000000000000000000000000000 sdzw-hpas-35.nrtsz.local:7051
E) Stop the remaining two masters.
F) Then start all three masters.
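After step F, it is worth verifying that the quorum is healthy again; assuming the same hostnames as above, something like:
sudo -u kudu kudu master list Master1:7051,Master2:7051,Master3:7051
sudo -u kudu kudu cluster ksck Master1:7051,Master2:7051,Master3:7051
ksck should now report all three masters as healthy, with a single leader and no consensus matrix conflicts.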