For one of our use cases we have about 30 TB of data, compressed, in Parquet. We are now testing Kudu, and I'm wondering how big a tablet should be to get the best write and query performance. Is there a recommendation, e.g. 1 GB per tablet? What we see is that the bigger a tablet gets, the slower inserts seem to become. However, 1 GB is way too small, as we would need 15 servers (30,000 GB at 1 GB per tablet / 2,000 [max number of tablets per server as written in the doc] -> 15) without taking replication into account. Additionally, the doc recommends not using more than 100 servers...
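To make the arithmetic explicit, here is a rough sketch of how the server count changes with tablet size. It assumes the data takes about the same space in Kudu as it does in compressed Parquet (which may not hold), that the 2,000-per-server limit counts tablet replicas, and a replication factor of 3 for the replicated case; `servers_needed` is just an illustrative helper name.

```python
import math

def servers_needed(total_gb, tablet_gb, max_tablets_per_server=2000, replication=1):
    """Estimate the minimum number of tablet servers for a given tablet size."""
    tablets = math.ceil(total_gb / tablet_gb)   # tablets before replication
    replicas = tablets * replication            # each tablet is stored `replication` times
    return math.ceil(replicas / max_tablets_per_server)

# Compare a few candidate tablet sizes, without and with 3x replication.
for tablet_gb in (1, 10, 50):
    print(tablet_gb,
          servers_needed(30_000, tablet_gb),
          servers_needed(30_000, tablet_gb, replication=3))
```

This only counts tablets against the per-server limit; it says nothing about disk capacity or memory per server, which would have to be checked separately.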
We are working with Kudu 1.6.0.
Thanks in advance
Hi, we have no particular guidance on maximum tablet size. Ingesting in random order will hurt performance; writing in sorted primary key order will help. Otherwise Kudu will constantly be working in the background to merge and compact the rows you wrote into non-overlapping, contiguous RowSets.
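As a minimal sketch of what "write in sorted primary key order" could look like on the client side, the snippet below sorts a chunk of rows by their key columns and splits it into batches. It is plain Python and does not call any Kudu API; `sorted_batches`, the column names, and the batch size are made up for illustration, and it assumes Python's tuple ordering matches the table's primary key ordering for the column types involved.

```python
from itertools import islice

def sorted_batches(rows, key_columns, batch_size):
    """Yield lists of rows in ascending primary-key order."""
    # Sort by the key columns in the order they appear in the primary key.
    ordered = sorted(rows, key=lambda r: tuple(r[c] for c in key_columns))
    it = iter(ordered)
    while True:
        batch = list(islice(it, batch_size))
        if not batch:
            return
        yield batch

# Hypothetical rows with a composite key (host, ts).
rows = [
    {"host": "b", "ts": 2, "value": 1.0},
    {"host": "a", "ts": 5, "value": 2.0},
    {"host": "a", "ts": 1, "value": 3.0},
]
batches = list(sorted_batches(rows, key_columns=("host", "ts"), batch_size=2))
```

Sorting in memory only orders the rows within the chunk that was loaded. For a data set of this size, the sort would more likely be done upstream, for example by the engine that reads the Parquet files.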
Another thing you can do if you cannot write in PK-sorted order is to insert more slowly, giving Kudu time to "catch up" when reorganizing the data on disk. Your inserts should get faster again after some time.
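One simple way to "insert slower" is to cap the average row rate on the client. The sketch below is plain Python and not part of any Kudu client; `throttled_write`, `write_batch`, and `max_rows_per_sec` are hypothetical names, and a suitable rate would have to be found by experiment.

```python
import time

def throttled_write(batches, write_batch, max_rows_per_sec,
                    sleep=time.sleep, clock=time.monotonic):
    """Call write_batch for each batch, pausing so the average rate
    stays at or below max_rows_per_sec. Returns the number of rows sent."""
    start = clock()
    rows_sent = 0
    for batch in batches:
        write_batch(batch)  # stand-in for whatever actually sends the rows
        rows_sent += len(batch)
        # Earliest time at which this many rows may have been sent.
        earliest = start + rows_sent / max_rows_per_sec
        delay = earliest - clock()
        if delay > 0:
            sleep(delay)
    return rows_sent

# Tiny demo: collect the batches in a list instead of writing them anywhere.
written = []
total = throttled_write([[1, 2], [3]], written.append, max_rows_per_sec=1000)
```

The `sleep` and `clock` parameters exist only so the pacing can be tested without real waiting. In normal use the defaults are enough.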