Kudu T-Server Error

Rising Star

Hi All,

I am getting the error below on one of my Kudu tablet servers. I have restarted the tablet server service on this host, but I continue to see this error:

W1118 19:43:31.815698 33067 consensus_peers.cc:435] T 3292e490cf4843d994a45f9a4c7782c0 P cc36320dd81646d081a24203751c2a6a -> Peer 164c8bcafccc4fd0adfb6dfe7a2ff60e (MYSERVER.com:7050): Couldn't send request to peer 164c8bcafccc4fd0adfb6dfe7a2ff60e for tablet 3292e490cf4843d994a45f9a4c7782c0. Error code: TABLET_NOT_RUNNING (12). Status: Illegal state: Tablet not RUNNING: INITIALIZED. Retrying in the next heartbeat period. Already tried 389 times.

Any help is much appreciated.

Regards

Amn

Rising Star

That message indicates that the Kudu tserver is still bootstrapping its tablet replicas. It hasn't gotten to tablet 3292e490cf4843d994a45f9a4c7782c0 yet, but it should soon. If you look at MYSERVER.com:8050/tablets, you should be able to see the current state of each tablet replica on that tablet server (INITIALIZED, BOOTSTRAPPING, RUNNING, etc.).

Slow bootstrapping can indicate a number of things. There may be a large number of tablet replicas on that particular tablet server, in which case you might want to rebalance the cluster using the rebalancer tool. Or the WAL disk may be slow, in which case you might want to use a faster disk for the -fs_wal_dir, since that disk is shared among all tablet replicas on the server.

 

Hope this helped!

Rising Star

Hi Awong,

Thanks for the quick reply. Below is what I see (screenshot attached: Capture.JPG). As I understand it from the screenshot, the bootstrap process has completed, since it shows 100%. Also, when I click the Details toggle, the Last Status for the corresponding table name reads either "Bootstrap complete." or "No bootstrap required, opened a new log".

When I check the logs, I still see the same error as before. Is there anything else I can check?

Regards

Amn

Rising Star

You should check your other tablet servers. Those log messages may indicate that some of the tablet replicas on this server are trying to communicate with replicas on other servers, and the replicas on those other servers are still bootstrapping.

Or are all of your tablet servers done bootstrapping?
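
To see which remote servers the warnings actually point at, and whether the messages are recent, the log lines can be parsed. The following is only a rough sketch: it assumes the warnings keep the exact wording shown in the original post, and the function names `parse_peer_warning` and `peers_to_check` are made up for illustration.

```python
import re
from collections import Counter

# Matches the consensus_peers.cc warning quoted in the original post.
# The prefix is the usual glog layout: severity, MMDD, time, thread id, file:line.
# Assumption: the message wording is the same as in the quoted line.
WARNING_RE = re.compile(
    r"^(?P<severity>[IWEF])(?P<mmdd>\d{4}) (?P<time>\d{2}:\d{2}:\d{2}\.\d{6})\s+\d+ "
    r"[\w.]+:\d+\] "
    r"T (?P<tablet>[0-9a-f]+) P (?P<local_peer>[0-9a-f]+) -> "
    r"Peer (?P<remote_peer>[0-9a-f]+) \((?P<remote_addr>[^)]+)\): "
    r".*?Error code: (?P<error_code>\w+) \(\d+\)\."
    r".*?Tablet not RUNNING: (?P<remote_state>\w+)\."
    r".*?Already tried (?P<tries>\d+) times"
)


def parse_peer_warning(line):
    """Return the fields of one warning line as a dict, or None if it does not match."""
    match = WARNING_RE.search(line)
    if match is None:
        return None
    fields = match.groupdict()
    fields["tries"] = int(fields["tries"])
    return fields


def peers_to_check(lines):
    """Count TABLET_NOT_RUNNING warnings per (remote address, tablet, reported state)."""
    counts = Counter()
    for line in lines:
        fields = parse_peer_warning(line)
        if fields and fields["error_code"] == "TABLET_NOT_RUNNING":
            key = (fields["remote_addr"], fields["tablet"], fields["remote_state"])
            counts[key] += 1
    return counts
```

Feeding the tablet server's warning log through `peers_to_check` gives the list of remote addresses and tablet IDs to look up on the other servers' /tablets pages, and the `mmdd` and `time` fields show whether the warnings are still being written.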

Rising Star

@awong

I checked all 9 tablet servers, and all of them are done bootstrapping. I see the same results as in my previous screenshot; the numbers are different, but everything is at 100%.

Regards

Amn

Rising Star

Hm, that's pretty odd. And the messages are still coming in? These aren't old messages?

 

If you run `kudu cluster ksck` on your cluster, what does it say about the health of that tablet?
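
If it helps to script the check, here is a minimal sketch that wraps `kudu cluster ksck` from Python. The helper names are hypothetical, and the master addresses in the usage note are placeholders. The `-tablets` filter is used on the assumption that your Kudu version's ksck supports it; drop that argument if it does not.

```python
import subprocess


def ksck_command(master_addresses, tablet_ids=None):
    """Build the argument list for `kudu cluster ksck`.

    master_addresses: list of "host:port" strings for the Kudu masters.
    tablet_ids: optional list of tablet IDs to restrict the check to
                (assumes the -tablets filter is available in your version).
    """
    cmd = ["kudu", "cluster", "ksck", ",".join(master_addresses)]
    if tablet_ids:
        cmd.append("-tablets=" + ",".join(tablet_ids))
    return cmd


def run_ksck(master_addresses, tablet_ids=None):
    """Run ksck and return (exit code, combined output). Requires the kudu CLI on PATH."""
    result = subprocess.run(
        ksck_command(master_addresses, tablet_ids),
        capture_output=True,
        text=True,
    )
    return result.returncode, result.stdout + result.stderr
```

For example, `run_ksck(["master1.example.com:7051"], ["3292e490cf4843d994a45f9a4c7782c0"])` would check only the tablet from the warning, where `master1.example.com:7051` stands in for your real master address.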

Rising Star

All of my tablet servers show as healthy, and the Summary by Table also shows every table as healthy. Nothing is listed under 'Recovering / Under-Replicated / Unavailable'.