Created 11-17-2019 11:01 PM
Hi All,
I am getting the error below on one of my Kudu tablet servers. I have restarted the tablet server service on this host, yet when I check the logs I continue to get this error:
W1118 19:43:31.815698 33067 consensus_peers.cc:435] T 3292e490cf4843d994a45f9a4c7782c0 P cc36320dd81646d081a24203751c2a6a -> Peer 164c8bcafccc4fd0adfb6dfe7a2ff60e (MYSERVER.com:7050): Couldn't send request to peer 164c8bcafccc4fd0adfb6dfe7a2ff60e for tablet 3292e490cf4843d994a45f9a4c7782c0. Error code: TABLET_NOT_RUNNING (12). Status: Illegal state: Tablet not RUNNING: INITIALIZED. Retrying in the next heartbeat period. Already tried 389 times.
Any help is much appreciated
Regards
Amn
Created 11-17-2019 11:08 PM
That message indicates that the Kudu tserver on MYSERVER.com is still bootstrapping its tablet replicas. It hasn't reached tablet 3292e490cf4843d994a45f9a4c7782c0 yet, but it should soon. If you open MYSERVER.com:8050/tablets, you should see the current state of each tablet replica on that tablet server (INITIALIZED, BOOTSTRAPPING, RUNNING, etc.).
A slow bootstrapping process can indicate a number of things, such as a large number of tablet replicas on that particular tablet server (in which case you might want to rebalance the cluster using the rebalancer tool, `kudu cluster rebalance`), or a slow WAL disk (in which case you might want to use a faster disk for -fs_wal_dir, since that disk is shared by all tablet replicas).
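To judge whether replica placement is uneven enough to explain a slow bootstrap, one rough check is the skew between the most- and least-loaded tablet servers. This is a minimal sketch, assuming you have already collected the number of replicas hosted by each tablet server (for example, by counting the entries on each server's /tablets page). The function names and the threshold of 1 are hypothetical; this is not the rebalancer's actual algorithm.

```python
from collections.abc import Mapping


def replica_skew(replicas_per_tserver: Mapping[str, int]) -> int:
    """Difference between the largest and smallest per-tserver replica counts."""
    if not replicas_per_tserver:
        return 0
    counts = replicas_per_tserver.values()
    return max(counts) - min(counts)


def needs_rebalance(replicas_per_tserver: Mapping[str, int], max_skew: int = 1) -> bool:
    """Hypothetical rule of thumb: flag the cluster when skew exceeds max_skew."""
    return replica_skew(replicas_per_tserver) > max_skew


# Example with made-up counts: one tserver hosts far more replicas than the others.
counts = {"ts1": 1200, "ts2": 400, "ts3": 410}
print(replica_skew(counts), needs_rebalance(counts))
```

A server carrying several times the replicas of its peers has proportionally more tablets to bootstrap after a restart, which is the situation the rebalancer is meant to even out.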
Hope this helped!
Created 11-17-2019 11:29 PM
Hi Awong,
Thanks for the quick reply. Below is what I see. Based on the screenshot, I understand that the bootstrap process has completed, since it shows 100%. Also, when I click Details > toggle, the Last Status column shows either "Bootstrap complete." or "No bootstrap required, opened a new log" for the corresponding table name.
When I check the logs, I still see the same error as before. Is there anything else I can check?
Regards
Amn
Created 11-17-2019 11:32 PM
You should check your other tablet servers. Those log messages may indicate that some tablet replicas are trying to communicate with replicas on other servers that are still bootstrapping.
Or are all of your tablet servers done bootstrapping?
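To find out which remote servers the warnings point at, the "-> Peer <uuid> (<host:port>)" part and the reported state can be pulled out of each log line and tallied. This is a minimal sketch, assuming the lines follow the format of the consensus_peers.cc warning quoted at the top of this thread; the regexes and the function name are my own, not part of Kudu.

```python
import re
from collections import Counter
from typing import Iterable

# "-> Peer <uuid> (<host:port>)" as it appears in the consensus_peers.cc warning.
PEER_RE = re.compile(r"-> Peer (?P<uuid>[0-9a-f]+) \((?P<addr>[^)]+)\)")
# The state the remote replica reported, e.g. "Tablet not RUNNING: INITIALIZED".
STATE_RE = re.compile(r"Tablet not RUNNING: (?P<state>[A-Z_]+)")


def not_running_peers(lines: Iterable[str]) -> Counter:
    """Count 'Tablet not RUNNING' warnings per (remote address, reported state)."""
    counts: Counter = Counter()
    for line in lines:
        peer = PEER_RE.search(line)
        state = STATE_RE.search(line)
        if peer and state:  # skip lines that are not this kind of warning
            counts[(peer.group("addr"), state.group("state"))] += 1
    return counts
```

Feeding it the warning from the original post yields a single key, ("MYSERVER.com:7050", "INITIALIZED"); whichever address dominates the tally is the tablet server whose /tablets page is worth checking first.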
Created 11-17-2019 11:42 PM
I checked all 9 tablet servers and all are done bootstrapping. I see the same results as in my previous screenshot; the numbers are different, but everything is at 100%.
Regards
Amn
Created 11-17-2019 11:51 PM
Hm, that's pretty odd. And the messages are still coming in? These aren't old messages?
If you run `kudu cluster ksck` on your cluster, what does it say about the health of that tablet?
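One way to answer "are the messages still coming in?" is to look at the timestamp of the newest matching line. Kudu logs through glog, whose line prefix is a severity letter, then MMDD, then HH:MM:SS.microseconds. This is a minimal sketch, assuming the lines come from the current year (glog omits the year) and that `now` is a naive datetime in the same local time zone the server logs in; the function names and the 5-minute window are hypothetical choices.

```python
import re
from datetime import datetime, timedelta
from typing import Iterable, Optional

# glog prefix: <severity letter><MMDD> <HH:MM:SS.micros> ...
GLOG_PREFIX = re.compile(
    r"^[IWEF](?P<month>\d{2})(?P<day>\d{2}) "
    r"(?P<h>\d{2}):(?P<m>\d{2}):(?P<s>\d{2})\.(?P<us>\d{6})"
)


def glog_timestamp(line: str, year: int) -> Optional[datetime]:
    """Parse the glog prefix; the year must be supplied because glog omits it."""
    match = GLOG_PREFIX.match(line)
    if match is None:
        return None
    return datetime(year, int(match["month"]), int(match["day"]),
                    int(match["h"]), int(match["m"]), int(match["s"]),
                    int(match["us"]))


def still_occurring(lines: Iterable[str], needle: str, now: datetime,
                    window: timedelta = timedelta(minutes=5)) -> bool:
    """True if a line containing `needle` was logged within `window` before `now`."""
    newest: Optional[datetime] = None
    for line in lines:
        if needle in line:
            # Assumes the log is from now.year; wrong across a New Year boundary.
            ts = glog_timestamp(line, now.year)
            if ts is not None and (newest is None or ts > newest):
                newest = ts
    return newest is not None and now - newest <= window
```

For example, you could read the tablet server's log file (its location depends on the --log_dir setting) and call `still_occurring(lines, "TABLET_NOT_RUNNING", datetime.now())`; a False result suggests you are looking at old messages rather than an ongoing problem.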
Created 11-18-2019 12:51 AM
All my tablet servers show as Healthy, and the Summary by Table section also shows them as Healthy. Nothing is listed under 'Recovering / Under-Replicated / Unavailable'.