
kudu tablet server failed to start

New Contributor

Hi,

 

I have a problem with Kudu on CDH 5.14.3. I have 3 masters and 3 tablet servers.

After startup, one of the 3 tablet servers goes down after a few seconds. I checked the Kudu logs but did not find any trace that gives more information.

 

The only trace in the fatal and error logs was:

 

Log file created at: 2019/12/12 16:59:30
Running on machine: mymachine.example.com
Log line format: [IWEF]mmdd hh:mm:ss.uuuuuu threadid file:line] msg
F1212 16:59:30.142372 87271 compaction.cc:778] Check failed: pv_delete_redo != nullptr

 

Any ideas? Is it possible to resynchronize this tablet server with the other two?

 

Thanks

2 REPLIES

Re: kudu tablet server failed to start

Cloudera Employee
Hi,

It seems you hit KUDU-2233 (https://issues.apache.org/jira/browse/KUDU-2233), which is a nasty bug in Kudu 1.6 reported here: https://docs.cloudera.com/documentation/enterprise/release-notes/topics/kudu_known_issues.html#tsb-2... (please refer to the complete stack trace in the Kudu tserver stderr log).

KUDU-2233 is fixed in Kudu 1.7.0, which is available from CDH 5.15.0. You may consider upgrading to CDH 5.15.0 or later. Refer to the release notes here: https://docs.cloudera.com/documentation/enterprise/release-notes/topics/kudu_release_notes.html#reln...

Regards,
PDas
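If it helps to pull the fatal lines out of a large tserver log before comparing them with the KUDU-2233 stack trace, here is a minimal sketch. It assumes only the glog line format printed in the log header quoted in the question ([IWEF]mmdd hh:mm:ss.uuuuuu threadid file:line] msg). The names GlogEntry, parse_glog and fatal_entries are hypothetical helpers written for this illustration, not part of Kudu or any Kudu tooling.

```python
import re
from typing import Iterable, Iterator, List, NamedTuple

# Matches the glog prefix: [IWEF]mmdd hh:mm:ss.uuuuuu threadid file:line] msg
GLOG_LINE = re.compile(
    r"^(?P<severity>[IWEF])(?P<month>\d{2})(?P<day>\d{2}) "
    r"(?P<time>\d{2}:\d{2}:\d{2}\.\d{6})\s+(?P<thread>\d+) "
    r"(?P<source>[^:\s]+):(?P<line>\d+)\] (?P<msg>.*)$"
)


class GlogEntry(NamedTuple):
    """One parsed log line (hypothetical helper type)."""
    severity: str  # I, W, E or F
    month: int
    day: int
    time: str
    thread: int
    source: str    # source file that emitted the message
    line: int      # line number inside that source file
    msg: str


def parse_glog(lines: Iterable[str]) -> Iterator[GlogEntry]:
    """Yield an entry for every line that carries a glog prefix.

    Header lines and stack-trace continuation lines do not match
    the prefix and are skipped.
    """
    for raw in lines:
        m = GLOG_LINE.match(raw.rstrip("\n"))
        if m is None:
            continue
        yield GlogEntry(
            severity=m.group("severity"),
            month=int(m.group("month")),
            day=int(m.group("day")),
            time=m.group("time"),
            thread=int(m.group("thread")),
            source=m.group("source"),
            line=int(m.group("line")),
            msg=m.group("msg"),
        )


def fatal_entries(lines: Iterable[str]) -> List[GlogEntry]:
    """Return only the entries logged at FATAL severity (prefix 'F')."""
    return [entry for entry in parse_glog(lines) if entry.severity == "F"]
```

For example, passing an open log file to fatal_entries(...) returns the FATAL entries with their source file and line, which can then be checked against the "Check failed" message reported in the question.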

Re: kudu tablet server failed to start

Cloudera Employee

If a CDH upgrade is not possible immediately, setting "-log_min_segments_to_retain=2" will prevent further corruption, but that is all it does; it will not repair corruption that has already occurred.

 

To do this, go to the Kudu configuration page in Cloudera Manager and add "-log_min_segments_to_retain=2" to the Kudu Service Advanced Configuration Snippet (Safety Valve) for gflagfile.

Note: Once CDH has been upgraded to a version containing the fix, remove the -log_min_segments_to_retain setting.

 

However, if the tablet server still fails to start after setting "-log_min_segments_to_retain=2", then the corrupt tablet on the affected tablet server has to be found and deleted in order to bring the tablet server up.
