I'm new to Apache Kudu and am noticing something that seems odd to me (though it could simply be part of Kudu's design that I'm not fully grasping).
This morning, our Data Engineers ran a "purge" job to delete a massive number of duplicate rows (not duplicate primary keys, but essentially duplicates based on our design) from many tables. They essentially put each entire table in a dataframe, dropped the actual table, removed the duplicates from the dataframe and re-inserted the unique rows. Every table on which we performed this operation actually grew (according to the report "Total Tablet Size On Disk Across Kudu Replicas") rather than shrank. The row counts are much smaller (about a third of the original), but every table has grown on disk.
Is this expected behavior? If so, can someone explain why or point me to a link? I cannot seem to find anything regarding this behavior.
Any assistance is greatly appreciated.
Also as an added FYI - we are deleting the data by using a "DELETE FROM <TABLE>" in Impala, prior to reloading the rows.
We also tested simply running a "DELETE FROM <TABLE>" without reloading any rows (and verified the row count is 0) and the table grew. This makes no sense to me. The Impala table is also a "managed" table.
Again, any assistance is greatly appreciated.
Yes, what you're observing is a core part of Kudu's design. Kudu is a full fidelity storage system that preserves all history for a while, so when you delete rows, you're actually piling on more "data" to remember the deletions. Deleted rows age out (default is 15 minutes since the deletion), but must be garbage collected in order to be removed. Garbage collection happens as part of merge compaction, though rowsets may be fully compacted and not undergo merge compaction. See https://github.com/apache/kudu/blob/master/docs/design-docs/tablet-history-gc.md#removing-old-delta-... for more details.
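To make that concrete, here is a rough toy model in Python. It is only a sketch of the idea, not Kudu's actual implementation: the `ToyRowSet` class, the byte sizes, and the `compact` method are all hypothetical names and numbers chosen for illustration. It shows why a delete adds bytes, and why the space only comes back when a compaction runs after the history window has passed.

```python
from dataclasses import dataclass, field

ROW_BYTES = 100        # assumed size of one stored row (illustrative only)
DELETE_BYTES = 10      # assumed size of one delete record (illustrative only)
HISTORY_MAX_AGE = 900  # seconds; mirrors the 15-minute default mentioned above


@dataclass
class ToyRowSet:
    """Hypothetical stand-in for a rowset: base rows plus delete records."""
    rows: set = field(default_factory=set)
    deletes: dict = field(default_factory=dict)  # key -> deletion timestamp

    def delete(self, key, now):
        # A delete does not remove the row; it records that the row was deleted.
        if key in self.rows and key not in self.deletes:
            self.deletes[key] = now

    def live_row_count(self):
        return len(self.rows) - len(self.deletes)

    def size_on_disk(self):
        return len(self.rows) * ROW_BYTES + len(self.deletes) * DELETE_BYTES

    def compact(self, now):
        # Space is reclaimed only here, and only for deletes that are
        # older than the history window.
        expired = {k for k, ts in self.deletes.items()
                   if now - ts >= HISTORY_MAX_AGE}
        self.rows -= expired
        for k in expired:
            del self.deletes[k]


rs = ToyRowSet(rows=set(range(1000)))
size_before = rs.size_on_disk()

for key in range(1000):
    rs.delete(key, now=0)
size_after_delete = rs.size_on_disk()
rows_after_delete = rs.live_row_count()

rs.compact(now=2 * 24 * 3600)  # a compaction two days later
size_after_compaction = rs.size_on_disk()

print(size_before, size_after_delete, rows_after_delete, size_after_compaction)
```

In this model the row count drops to zero right after the deletes while the size goes up, and nothing shrinks until `compact` is actually called, no matter how much time has passed.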
Apache Kudu has an outstanding JIRA to add support for truncating tables: https://issues.apache.org/jira/browse/KUDU-1458. That's not quite your use case as it drops all the data from the table, but it's similar.
Thank you very much for your response, however, this doesn't seem to be the behavior that I'm noticing.
To give you a scenario, we used a table that was 277MB as a test case. We ran a "DELETE FROM <TABLE>" and cleared all the rows (verifying in Impala after the deletion succeeded that the row count was zero).
I noticed during the deletion process that the table actually grew to 298MB (in the Cloudera Manager charts), which is explained by the documentation you cited. However, the table did not shrink after the 15-minute mark, which is when, based on my understanding of that documentation, the deleted data should have been removed. Instead, the table remains at 298MB two days later.
Am I misunderstanding your (and the documentation's) explanation, or do you think something is misconfigured? I also checked the --tablet_history_max_age_sec flag, and it is set to 900 seconds (15 minutes).
Again, your assistance is greatly appreciated.
Thanks again for your response and patience.
I have two additional questions:
1) You mention that there needs to be "overlap of keyspace". I'm not sure what that actually means. Do you have an example or a link to an explanation?
2) Is there any way to manually force a merge compaction? Maybe by manually causing "overlap of keyspace"?