kudu table delete

New Contributor

I deleted some data from 100+ tables, but I don't see any change in Kudu disk usage or Kudu soft memory usage.

Super Collaborator

Hello @sam8686

Thanks for using Cloudera Community. Based on your post, your team deleted data from 100+ tables, yet there is no change in disk usage. This is likely because compaction must run before any reduction in space usage is reflected, and there isn't any way to run compaction manually in Kudu.

See Section 4.10, "RowSet Compaction", of the Kudu paper [1], which states: "We take this opportunity to remove deleted rows".
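
To make the compaction point concrete, here is a toy Python model. It is only a sketch under simplifying assumptions, not Kudu's actual storage format: the ToyRowSet class, its methods, and the fixed 8-byte cost per delete marker are hypothetical. It illustrates the idea from the paper: a delete is recorded separately from the base data, so disk usage does not drop until compaction rewrites the data without the deleted rows.

```python
from dataclasses import dataclass, field


@dataclass
class ToyRowSet:
    # Hypothetical toy model, NOT Kudu's real on-disk layout.
    base_rows: dict = field(default_factory=dict)   # key -> payload bytes
    deleted_keys: set = field(default_factory=set)  # recorded delete markers

    def delete(self, key):
        # A delete only records a marker; the base data stays on disk.
        if key in self.base_rows:
            self.deleted_keys.add(key)

    def visible_rows(self):
        # Reads merge the base data with the delete markers.
        return {k: v for k, v in self.base_rows.items() if k not in self.deleted_keys}

    def disk_bytes(self):
        # Assumed cost model: payload bytes plus 8 bytes per delete marker.
        return sum(len(v) for v in self.base_rows.values()) + 8 * len(self.deleted_keys)

    def compact(self):
        # Compaction rewrites the base data without the deleted rows;
        # only now is their space reclaimed.
        self.base_rows = self.visible_rows()
        self.deleted_keys.clear()


rs = ToyRowSet({i: b"x" * 100 for i in range(10)})
before = rs.disk_bytes()
for key in range(5):
    rs.delete(key)
after_delete = rs.disk_bytes()
rs.compact()
after_compact = rs.disk_bytes()
print(before, after_delete, after_compact)  # 1000 1040 500
```

In this toy, deleting half the rows removes them from query results but slightly increases the reported size until compact() runs, which mirrors the symptom in the question.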

- Smarak

[1] https://kudu.apache.org/kudu.pdf

Expert Contributor

Adding to @smdas's reply:

This is one of Kudu's known limitations:

"There is no way to run compaction manually, but dropping the table will reclaim the space immediately."

You can verify the table size from the Cloudera Manager (CM) charts:

  1. Go to the Kudu service and navigate to the Charts Library tab.
  2. On the left-hand side menu, click Tables to display the list of tables currently stored in Kudu.
  3. Click on a table name to view the default dashboard for that table. The Total Tablet Size On Disk Across Kudu Replicas chart displays the total size of the table on disk using a time-series chart.
    Hovering with your mouse over the line on the chart opens a small pop-up window that displays information about that data point. Click the data stream within the chart to display a larger pop-up window that includes additional information for the table at the point in time where the mouse was clicked.

Reference: http://apache.github.io/kudu/docs/known_issues.html#_other_usage_limitations

New Contributor

I followed the steps for the table chart and didn't find a big difference. There is one table with no data, but I can still see 400 MB of space used by that table. I deleted the table data using Impala.

Expert Contributor

If you have dropped the table, its data should be deleted immediately. The metrics in CM may take some time to reflect the change, so verify from the backend whether the table was actually deleted.

Check whether the table still exists in Kudu. You can do this with the kudu cluster ksck command and its -tables flag:

kudu cluster ksck <master_addresses> -tables=<tables>

Note: if the table was created through Impala, use the name "impala::db.tablename".

If the table still shows up in the ksck output, run the following command to delete it from Kudu:

kudu table delete <master_addresses> <table_name> 
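
If you script these checks, a small Python sketch like the following can build the two commands above. The helper names and the master host names are hypothetical placeholders; the command layouts are the ones shown in this reply, with multiple master addresses joined by commas. The sketch only builds the argument lists. Running the delete command is destructive, so review it before passing it to subprocess.run.

```python
import shlex


def impala_kudu_table_name(db, table):
    # Tables created through Impala are named "impala::db.tablename" in Kudu.
    return f"impala::{db}.{table}"


def ksck_argv(master_addresses, tables):
    # kudu cluster ksck <master_addresses> -tables=<tables>
    return ["kudu", "cluster", "ksck", ",".join(master_addresses),
            "-tables=" + ",".join(tables)]


def table_delete_argv(master_addresses, table_name):
    # kudu table delete <master_addresses> <table_name>
    return ["kudu", "table", "delete", ",".join(master_addresses), table_name]


# Hypothetical master addresses (7051 is the default Kudu master RPC port).
masters = ["master1:7051", "master2:7051", "master3:7051"]
name = impala_kudu_table_name("db", "tablename")
check_cmd = ksck_argv(masters, [name])
delete_cmd = table_delete_argv(masters, name)

print(shlex.join(check_cmd))
print(shlex.join(delete_cmd))
# To actually run them (review first!):
#   import subprocess; subprocess.run(check_cmd, check=True)
```

Building argument lists instead of a single shell string avoids quoting problems with the "::" in Impala-created table names.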

New Contributor

I didn't delete the whole table; I deleted only some of the data. When I check the Kudu tablet server UI, I see a lot of tablets in the TABLET_DATA_TOMBSTONED state. In the logs there are "Processing DeleteTablet for tablet" messages.

If the data is deleted, why is it still caching the tablets?

Expert Contributor

The disk space occupied by a deleted row is only reclaimable via compaction. Given that you have deleted some data and the space has not been reclaimed, you are probably hitting this bug:
https://issues.apache.org/jira/browse/KUDU-1625

The JIRA is still unresolved. However, if the goal is to delete the data and reclaim disk space, you can drop the partition (if the table uses range partitioning) to reclaim the space.
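
For a range-partitioned table managed through Impala, dropping a partition is done with an ALTER TABLE ... DROP RANGE PARTITION statement. The Python sketch below only builds that statement; the function name, the table name db.events, and the restriction to integer bounds are assumptions for illustration. Dropping a range partition removes every row in that range, and the bounds should match an existing range partition.

```python
def drop_range_partition_sql(table, lower, upper):
    """Build an Impala statement dropping the Kudu range partition
    lower <= VALUES < upper. Hypothetical helper; integer bounds only."""
    if not (isinstance(lower, int) and isinstance(upper, int)):
        raise TypeError("this sketch only handles integer range bounds")
    if lower >= upper:
        raise ValueError("lower bound must be less than upper bound")
    return f"ALTER TABLE {table} DROP RANGE PARTITION {lower} <= VALUES < {upper};"


stmt = drop_range_partition_sql("db.events", 2019, 2020)
print(stmt)  # ALTER TABLE db.events DROP RANGE PARTITION 2019 <= VALUES < 2020;
```

Unlike row deletes, this removes the tablets backing that partition as a whole, which is why it frees space without waiting for compaction.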

Tombstoned tablets have all their data removed from disk and don't consume significant resources. These tablets are necessary for the correct operation of Kudu.

See - https://docs.cloudera.com/runtime/7.1.0/troubleshooting-kudu/topics/kudu-tombstoned-or-stopped-table...