Kudu is filling up a lot of disk space which is freed only when rebooted

Hello,

we are running the following Kudu cluster configuration:

1 master
3 tservers
Version: 1.16

The total number of live tablet replicas is 552, as inspected via tserver UI.
Tables are partitioned with both range and hash partitioning, following Kudu guidelines.
Most tablets belong to historical tables, i.e. data is continuously inserted and never modified afterwards.
On the other hand, we have some "lookup" tables that keep only the latest value from other tables, i.e. many updates are continuously performed on these tablets.

What we noticed in this scenario is that most of the disk space is occupied by Kudu metadata that does not seem to be used during normal insert/scan operations. This metadata has a high impact: it occupies around 50% of our total disk space (currently 30 GB per instance).
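One way to check where the space goes is to total file sizes per extension under the data directory. The following is a minimal sketch, not a Kudu tool: the helper name `usage_by_suffix` is hypothetical, and it assumes the log block manager's containers appear on disk as `.data` and `.metadata` files. It reports both the apparent size and the allocated size, since container data files can be sparse.

```python
import os
from collections import defaultdict


def usage_by_suffix(root):
    """Return {suffix: (apparent_bytes, allocated_bytes)} for all files under root.

    Hypothetical helper for illustration; not part of Kudu.
    """
    totals = defaultdict(lambda: [0, 0])
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                st = os.lstat(path)
            except OSError:
                continue  # the file disappeared while we were walking the tree
            suffix = os.path.splitext(name)[1] or "(none)"
            totals[suffix][0] += st.st_size
            # st_blocks is in 512-byte units on Linux; it is absent on some platforms
            totals[suffix][1] += getattr(st, "st_blocks", 0) * 512
    return {suffix: tuple(sizes) for suffix, sizes in totals.items()}
```

Calling `usage_by_suffix("/mnt/data0/data")` (the `--fs_data_dirs` value listed below) before and after a restart would show whether the freed space comes from the metadata files or from the data files.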

After rebooting one of the instances, we noticed that Kudu runs some kind of cleanup/data compression that frees around 40% of disk space, in particular after some operations performed by the "Log Block Manager" (which opened 15k log block containers). Do you have any clue what is happening here? Shouldn't there be some cleanup/compression routine that Kudu executes at runtime?

Also, it seems that, in order to perform these operations on disk data, Kudu first has to load everything into memory, which causes a very high RAM spike during startup. This is not sustainable, at least in our case. It causes a lot of trouble on our Kubernetes setup: during startup Kudu uses up to 30 GB of memory while it performs this compression via the log block manager, whereas during normal usage it requires only 1.5 GB. If that much memory is not available, the Kudu pod keeps restarting, because it has to complete these operations before it can resume processing.

At first, I thought this high disk usage could be due to MVCC snapshots continuously taken by "read_latest" scans on the tablets where we have a lot of updates, but the GBs of metadata freed during reboot make me suspect that something else is going on under the hood...
Please let me know if you need any logs and how to provide them. We have already rebooted 2 of the 3 instances, freeing up disk space, so there are probably still some logs available for the third instance.

The flags we set are the following:
--fs_wal_dir=/mnt/data0/tserver
--fs_metadata_dir=/mnt/data0/metadata
--fs_data_dirs=/mnt/data0/data
--webserver_doc_root=/opt/kudu/www
--rpc_service_queue_length=200
--maintenance_manager_num_threads=1
--webserver_enabled=true
--rpc_encryption=disabled
--rpc_authentication=disabled
--memory_limit_hard_bytes=1073741824
--block_cache_capacity_mb=256
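
To make the memory-related values easier to read, the flag list can be parsed and converted to friendlier units. This is a minimal sketch under stated assumptions: `parse_flags` is a hypothetical helper (not a Kudu or gflags API), it accepts both the `--flag=value` and `--flag value` spellings, and it treats a bare `--flag` as `true`.

```python
def parse_flags(lines):
    """Parse gflags-style lines into a {name: value} dict (illustrative helper)."""
    flags = {}
    for line in lines:
        line = line.strip()
        if not line.startswith("--"):
            continue  # skip blank lines and anything that is not a flag
        body = line[2:]
        if "=" in body:
            name, value = body.split("=", 1)
        else:
            parts = body.split(None, 1)
            name = parts[0]
            # assumption: a flag without a value is a boolean set to true
            value = parts[1] if len(parts) > 1 else "true"
        flags[name.strip()] = value.strip()
    return flags


# Memory-related flags copied from the list above.
SAMPLE = [
    "--memory_limit_hard_bytes 1073741824",
    "--block_cache_capacity_mb 256",
]

flags = parse_flags(SAMPLE)
hard_limit_gib = int(flags["memory_limit_hard_bytes"]) / 2**30
block_cache_mib = int(flags["block_cache_capacity_mb"])
```

With these values, `hard_limit_gib` is 1.0 and `block_cache_mib` is 256, i.e. the configured hard memory limit is 1 GiB.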

Thank you for your help.
Dario


@Darcol Welcome to the Cloudera Community!

To help you get the best possible solution, I have tagged our Kudu experts @Asfahan @ChrisGe who may be able to assist you further.

Please keep us updated on your post, and we hope you find a satisfactory solution to your query.


Regards,

Diana Torres,
Community Moderator


Thanks for the reply.

I saw this issue that seems to exactly describe my concerns: https://issues.apache.org/jira/plugins/servlet/mobile#issue/KUDU-3318

As I understand it, with that fix the compaction is also performed at runtime. Would upgrading Kudu to 1.17 solve the problem, or do you suggest something else?

Thanks,

Dario