What is the recommended size of a Kudu table?

Expert Contributor

Hello Cloudera community,

We checked the "total_kudu_on_disk_size_across_kudu_replicas" chart and found tables of around 500GB.

Given that, what is the recommended size for a Kudu table?

1 ACCEPTED SOLUTION

Expert Contributor

Correct, 50GB is the limit per tablet; the recommended size is under 10GB 🙂



Master Collaborator

Hi,

I could not find a recommended size for a Kudu table, but there are documented limits, such as the amount of data per tablet and the number of tablets per table. Please refer to the documentation below:

https://docs.cloudera.com/documentation/enterprise/6/6.3/topics/kudu_limitations.html#scaling_limits

 

Regards,

Chethan YM

Expert Contributor

hi @ChethanYM , 

 

I read this documentation, but my doubt is about the difference between a tablet and a table.

If I look at the chart in Cloudera and see tables above 50GB, are they outside the recommended range?

Expert Contributor

Hi @yagoaparecidoti 

The key here is not the table size but the tablet size.

A 50GB table could have 50 tablets, so each tablet holds about 1GB (that's good)

or

a 50GB table could have 2 tablets, so each tablet holds about 25GB (that's not so good: the recommended target size for tablets is under 10 GiB).
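The arithmetic in the two cases above can be sketched in a few lines of Python. This is only an illustration: the function names are made up for this example, sizes refer to a single copy of the data, and it assumes rows are spread evenly across tablets, which real partitioning may not achieve.

```python
import math


def tablet_size_gb(table_size_gb: float, num_tablets: int) -> float:
    """Average tablet size, assuming data is spread evenly across tablets."""
    if num_tablets <= 0:
        raise ValueError("num_tablets must be positive")
    return table_size_gb / num_tablets


def min_tablets_for_target(table_size_gb: float, target_gb: float = 10.0) -> int:
    """Smallest tablet count that keeps the average tablet at or under target_gb."""
    if target_gb <= 0:
        raise ValueError("target_gb must be positive")
    return max(1, math.ceil(table_size_gb / target_gb))


print(tablet_size_gb(50, 50))      # 1.0  -> well under the 10 GiB target
print(tablet_size_gb(50, 2))       # 25.0 -> above the 10 GiB target
print(min_tablets_for_target(50))  # 5
```

The 10 GB default is the recommended target mentioned above; pass a different `target_gb` if you want more headroom.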

You can take a look at your Kudu Master UI, http://Master:8051/tables, and find your tables and their partitions (tablets).

I use this query in the chart builder to see Kudu table sizes:

select total_kudu_on_disk_size_across_kudu_replicas where category=KUDU_TABLE

[Screenshot: Juanes_0-1685440308073.png]

 

Expert Contributor

hi @Juanes , 

 

great!

 

So, let's assume I have a 500GB table and that table was created with 240 tablets, would that value be within the recommended range?

 

Another point:

I'm using the following calculation as an example:

 

DATA_SIZE = (value taken from the graph "total_kudu_on_disk_size_across_kudu_replicas")

NUM_REPLICAS = RF * Total Tablets (value taken from the ksck command)

TABLET_SIZE = DATA_SIZE / NUM_REPLICAS

 

DATA_SIZE = 147G (converted to bytes, getting "157840048128")

NUM_REPLICAS = 3 * 240 = 720

 

Name | RF | State | Total Tablets | Healthy | Recovering | Underreplicated | not available
impala::DATABASE01.TABLE01 | 3 | HEALTHY | 240 | 240 | 0 | 0 | 0

 

TABLET_SIZE = 157840048128 / 720 = 219222289 (which equals about 0.204GB, roughly 209MB)

 

the end result was about 0.2GB, does that mean each tablet has about 0.2GB?
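For reference, here is a minimal Python sketch of the same calculation. It assumes, as the metric name suggests, that "total_kudu_on_disk_size_across_kudu_replicas" sums the on-disk size of every replica, so the result is an average per tablet replica rather than an exact figure for any single tablet. The function name is hypothetical, and 147G is treated as 147 * 1024^3 bytes, which matches the byte count quoted above.

```python
GIB = 1024 ** 3
MIB = 1024 ** 2


def average_replica_size_bytes(total_on_disk_bytes: int,
                               replication_factor: int,
                               total_tablets: int) -> float:
    """Average on-disk size of one tablet replica.

    total_on_disk_bytes is assumed to be the sum over all replicas.
    """
    num_replicas = replication_factor * total_tablets
    if num_replicas <= 0:
        raise ValueError("replication_factor and total_tablets must be positive")
    return total_on_disk_bytes / num_replicas


data_size = 147 * GIB  # 157840048128 bytes
avg = average_replica_size_bytes(data_size, replication_factor=3, total_tablets=240)

print(f"{avg / GIB:.3f} GiB per replica")  # 0.204 GiB per replica
print(f"{avg / MIB:.0f} MiB per replica")  # 209 MiB per replica
```

With RF 3 and 240 tablets there are 720 replicas, so 147G works out to roughly 0.2 GiB per replica on average. Individual tablets can still differ if the partitioning is skewed, so the per-tablet sizes in the tablet server UI remain the better check.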

Expert Contributor

Hi again,

You should be able to see the size of every tablet of a table in the Kudu tablet server UI:

http://KUDUTABLET1:8050/tablets

Go to "Tablets" in the top menu and type your table name into the search box.

You will see all of its tablets along with some useful information, such as:

Tablet ID, Partition, State, On-disk size and RaftConfig (Master)

From there you can check whether the tablets are of roughly similar size.

 

Expert Contributor

hi @Juanes

 

I accessed a tablet server on port 8050 and checked the tablets: I found more than 30 tablet IDs for the same table name, each using 4.4GB on disk.

According to Cloudera's documentation, a tablet can hold a maximum of 50GB, so the size of these tablets is within the recommended range, right?

Expert Contributor

Correct, 50GB is the limit per tablet; the recommended size is under 10GB 🙂
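As a quick illustration of how those two thresholds apply, here is a small Python sketch. The constant and function names are invented for this example, and the cut-offs are simply the 10GB target and 50GB limit quoted in this thread.

```python
RECOMMENDED_GB = 10.0  # recommended target: stay under this per tablet
LIMIT_GB = 50.0        # documented per-tablet limit


def classify_tablet_size(size_gb: float) -> str:
    """Place a tablet's on-disk size relative to the target and the limit."""
    if size_gb < 0:
        raise ValueError("size_gb must not be negative")
    if size_gb > LIMIT_GB:
        return "over limit"
    if size_gb >= RECOMMENDED_GB:
        return "above recommended target"
    return "within recommended target"


print(classify_tablet_size(4.4))   # within recommended target
print(classify_tablet_size(25.0))  # above recommended target
print(classify_tablet_size(60.0))  # over limit
```

So the 4.4GB tablets reported above sit comfortably inside the recommended range.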

Expert Contributor

OK! @Juanes 😉


Thanks for the clarification.