Kudu - deleting data

New Contributor

Hi, I'm currently assessing Kudu to see whether it has any advantages for my organisation. The ability to delete data is of particular interest, but I need to understand the delete process, and I can't find the information in any of the documentation I've read.

My question is: does the Kudu delete process remove data as part of the delete transaction, as an RDBMS would, or does it mark the data for deletion (removing access to it), as HBase does?

 

Thanks

2 REPLIES

Mentor
Deleted rows are not erased from disk synchronously with the operation, if I understand your question right - they are 'marked' and only truly erased from disk at the next RowSet compaction.

If you haven't yet, read https://kudu.apache.org/kudu.pdf (the section of interest is (4), "Tablet storage"), and https://github.com/apache/kudu/blob/master/docs/design-docs/tablet.md#mvcc-mutations-in-memrowset.

The latter link also compares Kudu with some other DB systems that use MVCC/etc., including Postgres, which you may find useful.
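
To make the mark-then-compact idea concrete, here is a toy Python model. It is only a sketch of the general pattern, not Kudu's actual storage code: the `ToyRowSet` class and all of its method names are invented for illustration, and real rowsets, delta stores and compaction scheduling are far more involved.

```python
from dataclasses import dataclass, field


@dataclass
class ToyRowSet:
    """Hypothetical stand-in for a rowset: base rows plus a set of delete marks."""
    rows: dict = field(default_factory=dict)    # key -> value, the stored base data
    deleted: set = field(default_factory=set)   # keys that have been marked deleted

    def insert(self, key, value):
        self.rows[key] = value
        self.deleted.discard(key)

    def delete(self, key):
        # The delete only records a mark; the base row stays where it is.
        if key in self.rows:
            self.deleted.add(key)

    def scan(self):
        # Readers skip marked rows, so the data is invisible but still stored.
        return {k: v for k, v in self.rows.items() if k not in self.deleted}

    def stored_rows(self):
        # Rough proxy for space used: rows physically held, visible or not.
        return len(self.rows)

    def compact(self):
        # Compaction rewrites the data without the marked rows and drops the marks.
        self.rows = self.scan()
        self.deleted.clear()


rs = ToyRowSet()
for i in range(10):
    rs.insert(i, f"row-{i}")
for i in range(8):
    rs.delete(i)

before = (len(rs.scan()), rs.stored_rows())   # visible rows, stored rows
rs.compact()
after = (len(rs.scan()), rs.stored_rows())
print(before, after)
```

In this toy model the number of visible rows drops as soon as the deletes are applied, but the number of stored rows only drops once `compact()` runs, which is the behaviour described above.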

Explorer

Hi @Harsh J , 

I just deleted around 80% of my data with "DELETE from table_name where register <= '2018-12-31'"

My disks are pretty full (around 90%). After the deletion, no space was freed. I restarted the Cloudera services (Kudu, Impala, HDFS, etc.) and still nothing. I then added these two lines to the Kudu configuration (in "Master Advanced Configuration Snippet (Safety Valve) for gflagfile" and "Tablet Server Advanced Configuration Snippet (Safety Valve) for gflagfile"):

```
unlock_experimental_flags=true
flush_threshold_secs=120
```

After restarting Kudu, I waited the 120 seconds... still nothing.