Kudu: Data length checksum does not match

New Contributor
F0618 21:06:21.008426 28512 tablet_server_main.cc:80] Check failed: _s.ok() Bad status: Corruption: Failed to load FS layout: Could not process records in container /var/lib/kudu/tserver/data/332c0788fd874fa19c9f4fb0b862bfa7: Data length checksum does not match: Incorrect checksum in file /var/lib/kudu/tserver/data/332c0788fd874fa19c9f4fb0b862bfa7.metadata at offset 706423: Checksum does not match. Expected: 0. Actual: 1214729159

How can I fix this problem? It appeared after the server lost power.
1 ACCEPTED SOLUTION

Super Collaborator

You will have to use "dd" to remove the last record of the container file. Kudu trunk (newer than what ships in 5.15) adds a --debug option to the "kudu pbc dump" tool that reports the offset at which the file should be truncated, but you have to compile Kudu from source to get it.
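
As a rough illustration of the truncation step, here is a minimal Python sketch that does the same job as cutting the file with "dd". It assumes the tablet server is stopped and that you have already determined the byte offset at which to cut; the thread does not confirm that the offset printed in the error message is the right cut point. The function name truncate_with_backup and the ".bak" suffix are made up for this example and are not part of Kudu.

```python
import shutil
from pathlib import Path


def truncate_with_backup(path, offset):
    """Copy `path` to `<path>.bak`, then cut `path` down to `offset` bytes.

    Hypothetical helper: `offset` must come from your own inspection of the
    file; this function does not try to work it out.
    """
    src = Path(path)
    size = src.stat().st_size
    if not 0 <= offset <= size:
        raise ValueError(f"offset {offset} is outside the file (size {size})")

    backup = src.with_name(src.name + ".bak")
    if backup.exists():
        # Never clobber an earlier backup; it may be the only good copy.
        raise FileExistsError(f"refusing to overwrite existing backup {backup}")

    shutil.copy2(src, backup)  # keep an untouched copy before changing anything
    with open(src, "r+b") as f:
        f.truncate(offset)  # everything from `offset` onward is discarded
    return backup
```

For example, truncate_with_backup("/path/to/container.metadata", 12345) keeps the first 12345 bytes, where both the path and the number are placeholders. If the server still fails to start, copy the ".bak" file back and try again with a different offset.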

If you can't compile Kudu from source to obtain that tool, an easier option is to reformat the affected tablet server and start from scratch on that server, provided you have additional replicas on other servers.

Another option is to use a hex editor to find the offset where a run of 0s begins at the end of the file, and then truncate the 0s off the file. Be sure to make a backup copy of the container metadata file first.
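
If no hex editor is handy, the search for the trailing run of 0s can be sketched in a few lines of Python. This is only an illustration: the function name trailing_zero_start is invented for this example, and the offset it returns is a candidate, not a guaranteed record boundary, since the last valid record could itself end in zero bytes.

```python
def trailing_zero_start(path, chunk_size=1 << 20):
    """Return the offset at which the file's trailing run of 0x00 bytes begins.

    The result equals the file size when the file does not end in a zero
    byte, and 0 when the file is empty or consists only of zero bytes.
    """
    with open(path, "rb") as f:
        f.seek(0, 2)  # jump to the end to learn the file size
        end = f.tell()
        # Walk backwards one chunk at a time so large files are not
        # loaded into memory all at once.
        while end > 0:
            start = max(0, end - chunk_size)
            f.seek(start)
            chunk = f.read(end - start)
            stripped = chunk.rstrip(b"\x00")
            if stripped:
                # The last non-zero byte sits at start + len(stripped) - 1.
                return start + len(stripped)
            end = start
    return 0
```

Compare the returned offset with the file size and with the offset in the error message before cutting anything, and only truncate a file you have backed up.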

This will be prevented in a future release.

5 REPLIES

Rising Star

Looks like this is tracked and fixed in KUDU-2260.

New Contributor
Version: CDH-5.15.0-1.cdh5.15.0.p0.21. Tablet server startup output:

+ exec /opt/cloudera/parcels/CDH-5.15.0-1.cdh5.15.0.p0.21/lib/kudu/sbin/kudu-tserver --tserver_master_addrs=hdoop1 --flagfile=/run/cloudera-scm-agent/process/345-kudu-KUDU_TSERVER/gflagfile
F0703 13:46:42.503458 30033 tablet_server_main.cc:80] Check failed: _s.ok() Bad status: Corruption: Failed to load FS layout: Could not process records in container /var/lib/kudu/tserver/data/data/4655ed673cfc4f4e9d5789342eeec779: Data length checksum does not match: Incorrect checksum in file /var/lib/kudu/tserver/data/data/4655ed673cfc4f4e9d5789342eeec779.metadata at offset 6757669: Checksum does not match. Expected: 0. Actual: 1214729159
*** Check failure stack trace: ***
Wrote minidump to /var/log/kudu/minidumps/kudu-tserver/576b3747-215f-b988-5810abaf-1647ead9.dmp
*** Aborted at 1530596802 (unix time) try "date -d @1530596802" if you are using GNU date ***
PC: @     0x7fa653284c37 gsignal
*** SIGABRT (@0x3da00007551) received by PID 30033 (TID 0x7fa6555e5900) from PID 30033; stack trace: ***
    @     0x7fa6551d4330 (unknown)
    @     0x7fa653284c37 gsignal
    @     0x7fa653288028 abort
    @          0x1b6dee9 (unknown)
    @           0x91469d google::LogMessage::Fail()
    @           0x9166bc google::LogMessage::SendToLog()
    @           0x9141f9 google::LogMessage::Flush()
    @           0x91704f google::LogMessageFatal::~LogMessageFatal()
    @           0x8b455e (unknown)
    @     0x7fa65326ff45 __libc_start_main
    @           0x8b3e64 (unknown)

New Contributor

Thank you for your solution. :)

Super Collaborator

You're welcome. If that worked for you, please mark my response as the answer / solution to your question.