Created on 06-18-2018 09:42 AM - edited 09-16-2022 06:21 AM
Created 06-25-2018 02:29 PM
Looks like this is tracked and fixed in KUDU-2260.
Created 07-02-2018 10:50 PM
CDH-5.15.0-1.cdh5.15.0.p0.21
+ exec /opt/cloudera/parcels/CDH-5.15.0-1.cdh5.15.0.p0.21/lib/kudu/sbin/kudu-tserver --tserver_master_addrs=hdoop1 --flagfile=/run/cloudera-scm-agent/process/345-kudu-KUDU_TSERVER/gflagfile
F0703 13:46:42.503458 30033 tablet_server_main.cc:80] Check failed: _s.ok() Bad status: Corruption: Failed to load FS layout: Could not process records in container /var/lib/kudu/tserver/data/data/4655ed673cfc4f4e9d5789342eeec779: Data length checksum does not match: Incorrect checksum in file /var/lib/kudu/tserver/data/data/4655ed673cfc4f4e9d5789342eeec779.metadata at offset 6757669: Checksum does not match. Expected: 0. Actual: 1214729159
*** Check failure stack trace: ***
Wrote minidump to /var/log/kudu/minidumps/kudu-tserver/576b3747-215f-b988-5810abaf-1647ead9.dmp
*** Aborted at 1530596802 (unix time) try "date -d @1530596802" if you are using GNU date ***
PC: @ 0x7fa653284c37 gsignal
*** SIGABRT (@0x3da00007551) received by PID 30033 (TID 0x7fa6555e5900) from PID 30033; stack trace: ***
    @ 0x7fa6551d4330 (unknown)
    @ 0x7fa653284c37 gsignal
    @ 0x7fa653288028 abort
    @ 0x1b6dee9 (unknown)
    @ 0x91469d google::LogMessage::Fail()
    @ 0x9166bc google::LogMessage::SendToLog()
    @ 0x9141f9 google::LogMessage::Flush()
    @ 0x91704f google::LogMessageFatal::~LogMessageFatal()
    @ 0x8b455e (unknown)
    @ 0x7fa65326ff45 __libc_start_main
    @ 0x8b3e64 (unknown)
Created 07-05-2018 10:44 AM
You will have to use "dd" to remove the last record of the container metadata file. The latest version of Kudu trunk (after 5.15) adds a --debug option to the "kudu pbc dump" tool that tells you the offset at which the file should be truncated, but you would have to compile Kudu from source to get it.
If you can't compile Kudu from source to obtain that tool, an easy option is to reformat the affected tablet server and start from scratch on that server, provided the affected tablets have additional replicas on other servers.
Another option is to use a hex editor to find the offset where a run of 0s begins at the end of the file and truncate the 0s off the file. Make sure to make a backup copy of the container metadata file first.
This will be prevented in a future release.
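For anyone who would rather script the hex-editor approach above than do it by hand, here is a minimal Python sketch. It is not part of Kudu; the function names are made up for illustration, and it assumes the only damage is a run of zero bytes at the end of the .metadata file. It finds the offset where that trailing run begins, makes a backup copy, and then truncates the file at that offset.

```python
import os
import shutil


def trailing_zero_offset(path, chunk_size=1 << 20):
    """Return the offset where the file's trailing run of zero bytes begins.

    Returns the file size if the file does not end in a zero byte,
    and 0 if the file is empty or consists only of zero bytes.
    """
    end = os.path.getsize(path)
    with open(path, "rb") as f:
        # Scan backwards one chunk at a time so large files are not
        # loaded into memory all at once.
        while end > 0:
            start = max(0, end - chunk_size)
            f.seek(start)
            chunk = f.read(end - start)
            stripped = chunk.rstrip(b"\x00")
            if stripped:
                # Last non-zero byte found; the zero run starts right after it.
                return start + len(stripped)
            end = start
    return 0


def truncate_trailing_zeros(path, backup_suffix=".bak"):
    """Back up the file, then cut off its trailing zero bytes.

    The backup suffix is an arbitrary choice for this sketch.
    Returns the offset the file was truncated to.
    """
    shutil.copy2(path, path + backup_suffix)
    offset = trailing_zero_offset(path)
    with open(path, "r+b") as f:
        f.truncate(offset)
    return offset
```

Stop the tablet server before touching the file, and run trailing_zero_offset on its own first to see what would be cut. A valid record could in principle end in zero bytes, so it is probably worth comparing the result with the offset reported in the error message before truncating, and keeping the backup until the tablet server starts cleanly.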
Created 07-08-2018 11:50 PM
:) Thank you for your solution
Created 07-09-2018 11:41 AM
You're welcome. If that worked for you, please mark my response as the answer / solution to your question.