Emmanuel Katto Dubai : Issue with Tablet Server Having Two UUIDs After Restart and WAL Directory Loss

Hi everyone, I am Emmanuel Katto from Dubai, United Arab Emirates (UAE). We ran into an issue on our production Kudu cluster: a tablet server failed because of a disk failure, and its WAL directory was lost. After installing a new disk and clearing the data directory as described in the Kudu documentation (Rebuilding Kudu), we restarted the failed tablet server. After the restart, the kudu ksck command showed two tablet server entries with different UUIDs for the same server, and one of them had a "WRONG_SERVER_UUID" status.

Questions:

  • What could be the cause of this error?
  • How can we avoid this issue in the future?
  • Is there a way to resolve this problem without restarting the master server?

We also found the kudu tserver unregister command, which appears to be intended for removing tablet servers with incorrect UUIDs, but we could not find it mentioned in the official documentation.

Regards

Emmanuel Katto
