We are running Cloudera Manager.
Version: Cloudera Express 5.13.0 (#55 built by jenkins on 20171002-1719 git: bd657e597e6743c458ee2c9aabe808b7c972981c)
Our Kudu cluster keeps becoming unhealthy at random, with one of the three Kudu nodes going down.
The error it throws on startup:
F0111 15:53:15.641548 98757 tablet_bootstrap.cc:884] Check failed: _s.ok() Bad status: Invalid argument: Tried to update clock beyond the max. error.
We run NTP on all our nodes and they are properly synced. The nodes run Ubuntu 14.04. Kudu becomes unhealthy at random while running, with no configuration change or anything else preceding it. A restart sometimes works but fails most of the time. We have been trying to figure this out from the logs and online knowledge bases, but with no luck so far.
I see, and I will be testing the latest version. Do you have any idea whether the following issue is related?
F0123 14:20:53.231120 2879 tablet_server_main.cc:80] Check failed: _s.ok() Bad status: Corruption: Failed to load FS layout: Could not process records in container /dw/kudu/tablet/data/data/f73861e4f85a4688bdf940c3a7420e51: Data length checksum does not match: Incorrect checksum in file /dw/kudu/tablet/data/data/f73861e4f85a4688bdf940c3a7420e51.metadata at offset 902508: Checksum does not match. Expected: 0. Actual: 1214729159
This has also been happening to us lately: the Kudu tablet server simply crashes without logging an ERROR or any other relevant message, and when we restart it, it fails with the above error.
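Since the crashes leave no ERROR entry, it may help to pull every fatal line out of the tablet server logs in one pass. Below is a minimal sketch, assuming the logs use the same glog-style prefix as the two lines quoted above (severity letter, MMDD, time, thread id, source file and line). The names `parse_glog_line` and `fatal_entries` are made up for this illustration and are not part of Kudu or Cloudera Manager.

```python
import re
from typing import Iterable, Iterator, NamedTuple, Optional

# Assumed prefix layout, taken from the lines quoted in this post:
#   <severity><MMDD> <HH:MM:SS.micros> <thread id> <file>:<line>] <message>
GLOG_LINE = re.compile(
    r"^(?P<sev>[IWEF])(?P<month>\d{2})(?P<day>\d{2}) "
    r"(?P<time>\d{2}:\d{2}:\d{2}\.\d{6})\s+(?P<tid>\d+) "
    r"(?P<file>[^:\s]+):(?P<line>\d+)\] (?P<msg>.*)$"
)


class GlogEntry(NamedTuple):
    severity: str   # one of I, W, E, F
    month: int
    day: int
    time: str       # HH:MM:SS.micros, kept as text (the prefix has no year)
    thread_id: int
    source: str     # source file that emitted the message
    line: int
    message: str


def parse_glog_line(text: str) -> Optional[GlogEntry]:
    """Parse one log line; return None if it lacks the assumed prefix."""
    m = GLOG_LINE.match(text.rstrip("\n"))
    if m is None:
        return None
    return GlogEntry(
        severity=m.group("sev"),
        month=int(m.group("month")),
        day=int(m.group("day")),
        time=m.group("time"),
        thread_id=int(m.group("tid")),
        source=m.group("file"),
        line=int(m.group("line")),
        message=m.group("msg"),
    )


def fatal_entries(lines: Iterable[str]) -> Iterator[GlogEntry]:
    """Yield only the entries logged at severity F (fatal)."""
    for text in lines:
        entry = parse_glog_line(text)
        if entry is not None and entry.severity == "F":
            yield entry
```

Passing an open log file to `fatal_entries` yields the fatal entries in order, which makes it easier to see whether the clock error and the checksum error show up on the same node or around the same times.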