I have a problem with Kudu on CDH 5.14.3. I have 3 masters and 3 tablet servers.
After startup, one of the 3 tablet servers goes down after a few seconds. I checked the Kudu logs but didn't find any trace that gives more information.
The only entry in the fatal and error logs was:
Log file created at: 2019/12/12 16:59:30
Running on machine: mymachine.example.com
Log line format: [IWEF]mmdd hh:mm:ss.uuuuuu threadid file:line] msg
F1212 16:59:30.142372 87271 compaction.cc:778] Check failed: pv_delete_redo != nullptr
Any ideas? Is it possible to resynchronize this tablet server with the other two?
If a CDH upgrade is not possible immediately, setting "-log_min_segments_to_retain=2" will prevent further corruption, but it will not repair a tablet that is already corrupt.
To do this, go to the Kudu configuration page in Cloudera Manager and add "-log_min_segments_to_retain=2" to the Kudu Service Advanced Configuration Snippet (Safety Valve) for gflagfile.
Note: Once CDH has been upgraded to a version containing the fix, remove the -log_min_segments_to_retain setting.
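For reference, the safety-valve entry is a single line in gflag syntax, entered without the surrounding quotes. This sketch assumes the snippet contains no other custom flags:

```
-log_min_segments_to_retain=2
```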
However, if the tablet server still fails to start after setting "-log_min_segments_to_retain=2", the corrupt tablet on the affected tablet server has to be found and deleted in order to bring the tablet server up.
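One possible way to narrow down which tablet is affected is to look at what the crashing thread logged just before the fatal check. The Python sketch below parses lines in the format given in the log header ([IWEF]mmdd hh:mm:ss.uuuuuu threadid file:line] msg) and returns the first fatal entry together with the preceding entries from the same thread. The function names are illustrative, and whether those earlier lines actually name the tablet is an assumption that needs to be checked against your own logs.

```python
import re
from collections import deque

# Matches the glog line format quoted in the log header:
# [IWEF]mmdd hh:mm:ss.uuuuuu threadid file:line] msg
GLOG_LINE = re.compile(
    r"^(?P<level>[IWEF])(?P<date>\d{4}) "
    r"(?P<time>\d{2}:\d{2}:\d{2}\.\d{6})\s+"
    r"(?P<thread>\d+) (?P<source>[^\]]+)\] (?P<msg>.*)$"
)


def parse_line(line):
    """Return the fields of one log line as a dict, or None if it does not match."""
    match = GLOG_LINE.match(line.rstrip("\n"))
    return match.groupdict() if match else None


def context_before_fatal(lines, keep=20):
    """Return (earlier entries from the crashing thread, first fatal entry).

    `keep` (a hypothetical tuning knob) bounds how many earlier entries
    are remembered per thread. Returns ([], None) if no fatal line is found.
    """
    recent = {}  # thread id -> last `keep` parsed entries from that thread
    for line in lines:
        entry = parse_line(line)
        if entry is None:
            continue  # header lines and continuation lines are skipped
        if entry["level"] == "F":
            return list(recent.get(entry["thread"], [])), entry
        recent.setdefault(entry["thread"], deque(maxlen=keep)).append(entry)
    return [], None
```

To use it, open the tablet server's INFO log (the path depends on the installation) and pass the file object to context_before_fatal, then print the "msg" field of each returned entry.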