
Unstable Kudu Master

Contributor

Hello,

In my cluster with 3 masters, one Kudu Master keeps starting and stopping. This is the log detail from Cloudera Manager:

Time Log Level Source Log Message
10:14:41.417 AM WARN cc:288
Found duplicates in --master_addresses: the unique set of addresses is Master1:7051, Master2:7051, Master3:7051
10:15:11.823 AM WARN cc:254
Call kudu.consensus.ConsensusService.RequestConsensusVote from 10.157.136.55:55402 (request call id 0) took 4542 ms (4.54 s). Client timeout 1775 ms (1.78 s)
10:15:11.823 AM WARN cc:254
Call kudu.consensus.ConsensusService.RequestConsensusVote from 10.157.136.37:59796 (request call id 0) took 30215 ms (30.2 s). Client timeout 9654 ms (9.65 s)
10:15:11.823 AM WARN cc:260
Trace:
1112 10:15:07.281146 (+ 0us) service_pool.cc:169] Inserting onto call queue
1112 10:15:07.281169 (+ 23us) service_pool.cc:228] Handling call
1112 10:15:11.823245 (+4542076us) inbound_call.cc:171] Queueing success response
Metrics: {"spinlock_wait_cycles":384}
10:15:11.823 AM WARN cc:260
Trace:
1112 10:14:41.607787 (+ 0us) service_pool.cc:169] Inserting onto call queue
1112 10:14:41.607839 (+ 52us) service_pool.cc:228] Handling call
1112 10:15:11.823242 (+30215403us) inbound_call.cc:171] Queueing success response
Metrics: {}
10:15:11.823 AM WARN cc:254
Call kudu.consensus.ConsensusService.RequestConsensusVote from 10.157.136.55:55402 (request call id 1) took 4536 ms (4.54 s). Client timeout 1955 ms (1.96 s)
10:15:11.823 AM WARN cc:260
Trace:
1112 10:15:07.286988 (+ 0us) service_pool.cc:169] Inserting onto call queue
1112 10:15:07.287025 (+ 37us) service_pool.cc:228] Handling call
1112 10:15:11.823244 (+4536219us) inbound_call.cc:171] Queueing success response
Metrics: {}

 

What does this mean?

Why is it so inconsistent?
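
In case it helps anyone reproduce the picture: a quick way to see whether the three masters agree on a leader is ksck. This is only a sketch, using the Master1:7051, Master2:7051, Master3:7051 addresses from the duplicate-addresses warning above; substitute the real FQDNs and run it from a node that has the kudu CLI.

# Check overall cluster health, including whether the masters agree on a Raft leader.
# Master1/Master2/Master3 are placeholders for the real master hostnames.
sudo -u kudu kudu cluster ksck Master1:7051,Master2:7051,Master3:7051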


 

1 ACCEPTED SOLUTION

Contributor

Hello,

I found the fix for this case; maybe it will help anyone who has the same Kudu Master consensus issue as me.

Master1 is not voting:

The consensus matrix is:
 Config source | Replicas | Current term | Config index | Committed?
---------------+----------+--------------+--------------+------------
 Master1 (A)   | A B C    | 12026        | -1           | Yes
 Master2 (B)   | A B C*   | 12026        | -1           | Yes
 Master3 (C)   | A B C*   | 12026        | -1           | Yes
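
A matrix like this typically comes from kudu cluster ksck; the * marks the replica each master reports as the Raft leader, so Master1 (A) sees no leader at all while B and C both point at C. Before deleting anything it is worth confirming which master currently holds the LEADER role. A small sketch, assuming the web UI is reachable on port 8051 (as in the step C output below) and that Master1 stands in for any reachable master hostname:

# The /masters page on any master's web UI lists all masters and their
# current role; this is the same page step C below refers to.
curl -s http://Master1:8051/masters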

The workaround is (a consolidated sketch of steps B and D follows after the list):

A) Stop the problematic master, then run the command below on that host.
B) sudo -u kudu kudu local_replica delete --fs_wal_dir=/var/kudu/master --fs_data_dirs=/var/kudu/master 00000000000000000000000000000000 --clean_unsafe
C) Check which master is the leader in the web UI:
a98a1f26d0254293b6e17e9daf8f6ef8 822fcc68eff448269c9200a8c4c2ecc8 LEADER 2022-11-22 07:18:21 GMT
rpc_addresses { host: "sdzw-hpas-35" port: 7051 } http_addresses { host: "sdzw-hpas-35" port: 8051 } software_version: "kudu 1.13.0.7.1.6.0-297 (rev 9323384dbd925202032a965e955979d6d2f6acb0)" https_enabled: false
D) sudo -u kudu kudu local_replica copy_from_remote --fs_wal_dir=/wal/kudu/wal --fs_data_dirs=/wal/kudu/data 00000000000000000000000000000000 <active_leader_fqdn>:7051
# sudo -u kudu /opt/cloudera/parcels/CDH-7.1.6-1.cdh7.1.6.p0.10506313/bin/../lib/kudu/bin/kudu local_replica copy_from_remote --fs_wal_dir=/var/kudu/master --fs_data_dirs=/var/kudu/master 00000000000000000000000000000000 sdzw-hpas-35.nrtsz.local:7051
E) Stop the remaining two masters.
F) Then start all three masters.
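
For reference, here is a consolidated sketch of steps B and D as they were run on this cluster. The directories, parcel path and leader FQDN below are the ones from this environment; adjust them to your own fs_wal_dir, fs_data_dirs and the leader you identified in step C.

#!/bin/bash
# Sketch only: run on the stopped, problematic master (step A done first).
set -euo pipefail

WAL_DIR=/var/kudu/master                       # --fs_wal_dir of this master
DATA_DIR=/var/kudu/master                      # --fs_data_dirs of this master
LEADER=sdzw-hpas-35.nrtsz.local                # leader master found in step C
SYS_TABLET=00000000000000000000000000000000    # fixed ID of the master's system tablet

# Step B: remove the local (diverged) copy of the master's system tablet.
sudo -u kudu kudu local_replica delete \
  --fs_wal_dir="$WAL_DIR" --fs_data_dirs="$DATA_DIR" \
  "$SYS_TABLET" --clean_unsafe

# Step D: re-copy the system tablet from the current leader master.
sudo -u kudu kudu local_replica copy_from_remote \
  --fs_wal_dir="$WAL_DIR" --fs_data_dirs="$DATA_DIR" \
  "$SYS_TABLET" "$LEADER:7051"

# Steps E/F: stop the remaining two masters, then start all three from Cloudera Manager.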

 

