
kudu tablet server failed to start

New Contributor

Hi,

I have a problem with Kudu on CDH 5.14.3. I have 3 masters and 3 tablet servers.

One of the 3 tablet servers goes down a few seconds after it starts. I checked the Kudu logs but did not find any trace that gives more information.

The only entry in the FATAL and ERROR logs was:

Log file created at: 2019/12/12 16:59:30
Running on machine: mymachine.example.com
Log line format: [IWEF]mmdd hh:mm:ss.uuuuuu threadid file:line] msg
F1212 16:59:30.142372 87271 compaction.cc:778] Check failed: pv_delete_redo != nullptr
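
The "Log line format" header above describes the standard glog prefix, so the fatal line can be picked out of a larger tserver log mechanically. The following is a minimal Python sketch of that, not part of Kudu itself; the names `parse_glog_line`, `fatal_entries`, and `GlogEntry` are hypothetical helpers introduced only for illustration, and the sample input is the excerpt quoted above.

```python
import re
from typing import Iterable, Iterator, NamedTuple, Optional

# glog prefix as stated in the log header:
#   [IWEF]mmdd hh:mm:ss.uuuuuu threadid file:line] msg
GLOG_LINE = re.compile(
    r"^(?P<severity>[IWEF])(?P<month>\d{2})(?P<day>\d{2}) "
    r"(?P<time>\d{2}:\d{2}:\d{2}\.\d{6})\s+"
    r"(?P<thread_id>\d+) "
    r"(?P<file>[^:\s]+):(?P<line>\d+)\] "
    r"(?P<message>.*)$"
)


class GlogEntry(NamedTuple):
    """One parsed log line (hypothetical helper type)."""
    severity: str      # I, W, E or F
    month: int
    day: int
    time: str          # hh:mm:ss.uuuuuu, kept as text
    thread_id: int
    source_file: str
    source_line: int
    message: str


def parse_glog_line(text: str) -> Optional[GlogEntry]:
    """Return a GlogEntry, or None for header and continuation lines."""
    match = GLOG_LINE.match(text.rstrip("\n"))
    if match is None:
        return None
    return GlogEntry(
        severity=match["severity"],
        month=int(match["month"]),
        day=int(match["day"]),
        time=match["time"],
        thread_id=int(match["thread_id"]),
        source_file=match["file"],
        source_line=int(match["line"]),
        message=match["message"],
    )


def fatal_entries(lines: Iterable[str]) -> Iterator[GlogEntry]:
    """Yield only the entries logged at FATAL severity."""
    for text in lines:
        entry = parse_glog_line(text)
        if entry is not None and entry.severity == "F":
            yield entry


# Sample input: the excerpt from the post above.
sample = [
    "Log file created at: 2019/12/12 16:59:30",
    "Running on machine: mymachine.example.com",
    "Log line format: [IWEF]mmdd hh:mm:ss.uuuuuu threadid file:line] msg",
    "F1212 16:59:30.142372 87271 compaction.cc:778] "
    "Check failed: pv_delete_redo != nullptr",
]

for entry in fatal_entries(sample):
    print(entry.source_file, entry.source_line, entry.message)
```

To scan a real log, the same `fatal_entries` function could be given an open file object instead of `sample`; the three header lines do not match the pattern and are skipped.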

Any idea? Is it possible to resynchronize this tablet server with the other two?

Thanks

2 REPLIES

Rising Star
Hi,

It seems you hit KUDU-2233 (https://issues.apache.org/jira/browse/KUDU-2233), a nasty bug in Kudu 1.6 that is reported here: https://docs.cloudera.com/documentation/enterprise/release-notes/topics/kudu_known_issues.html#tsb-2... (please refer to the complete stack trace in the Kudu tserver stderr log).

KUDU-2233 is fixed in Kudu 1.7.0, which is available from CDH 5.15.0. You may consider upgrading to CDH 5.15.0 or later. Refer to the release notes here: https://docs.cloudera.com/documentation/enterprise/release-notes/topics/kudu_release_notes.html#reln...

Regards,
PDas

Rising Star

If a CDH upgrade is not possible immediately, setting "-log_min_segments_to_retain=2" will prevent further corruption, but it does not repair corruption that has already occurred.

To do this, go to the Kudu configuration page in Cloudera Manager and add "-log_min_segments_to_retain=2" to the Kudu Service Advanced Configuration Snippet (Safety Valve) for gflagfile.

Note: Once CDH has been upgraded to a version containing the fix, remove the -log_min_segments_to_retain setting.
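
After the change is deployed and the role restarted, it can be worth checking that the flag actually reached the gflagfile the tablet server was started with. The following is a minimal Python sketch, assuming the usual gflags flagfile layout (one "-name=value" or "--name=value" entry per line, "#" for comments). The helpers `parse_gflagfile` and `has_workaround` are hypothetical, and the sample contents, including the log directory path, are made up for illustration.

```python
from typing import Dict


def parse_gflagfile(text: str) -> Dict[str, str]:
    """Parse flagfile text into {flag_name: value}.

    Simplification: a flag given without "=value" is stored with an
    empty string rather than being interpreted as a boolean.
    """
    flags: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        # Skip blank lines, comments, and anything that is not a flag.
        if not line or line.startswith("#") or not line.startswith("-"):
            continue
        name, _, value = line.lstrip("-").partition("=")
        flags[name] = value
    return flags


def has_workaround(flags: Dict[str, str]) -> bool:
    """True if the workaround flag from this thread is set to 2."""
    return flags.get("log_min_segments_to_retain") == "2"


# Hypothetical gflagfile contents, for illustration only.
sample_gflagfile = """\
# generated flags (sample)
--log_dir=/var/log/kudu
-log_min_segments_to_retain=2
"""

flags = parse_gflagfile(sample_gflagfile)
print(has_workaround(flags))
```

The same check can be reversed after the upgrade to confirm that the setting has been removed again, as the note above recommends.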

However, if the tablet server still fails to start after setting "-log_min_segments_to_retain=2", the corrupt tablet on the affected tablet server has to be found and deleted in order to bring the tablet server up.
