Is it possible to increase the recommended storage size in a Kudu tablet server?

New Contributor

Hi everyone

 

I'm new here and excited about Cloudera's solutions, but I have some doubts about using them in a project we are working on.

The main one is: what problems could arise from increasing the storage per tablet server beyond the recommended 8 TB?

When sizing the cluster, we ran into the maximum recommended storage per tablet server (8 TB). Because the data lake we need to work with is expected to be very large, this limit would require so many nodes that the project becomes unviable.
 
The official Cloudera documentation lists these limitations:
 
Recommended maximum number of tablet servers is 100.
Recommended maximum number of masters is 3.
Recommended maximum amount of stored data, post-replication and post-compression, per tablet server, is 8TB.
Recommended number of tablets per tablet server is 1000 (post-replication) with 2000 being the maximum number of tablets allowed per tablet server.
The maximum number of tablets per table for each tablet server is 60, post-replication, at table-creation time.
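To make the sizing concern concrete, here is a minimal Python sketch that turns the limits above into a tablet server count. It assumes data is spread evenly across tablet servers. The replication factor, compression ratio, tablet count, and the 1000 TB data-lake size used below are hypothetical inputs, not figures from this thread, and the function names are illustrative only.

```python
import math

# Limits quoted above from the Cloudera documentation.
MAX_TB_PER_TSERVER = 8                  # post-replication, post-compression
MAX_TSERVERS = 100                      # recommended maximum
RECOMMENDED_TABLETS_PER_TSERVER = 1000  # post-replication
MAX_TABLETS_PER_TSERVER = 2000          # post-replication

def min_tablet_servers(raw_tb: float, replication: int = 3,
                       compression_ratio: float = 1.0) -> int:
    """Fewest tablet servers keeping each one at or under 8 TB stored.

    Assumes perfectly even data distribution (an idealisation).
    """
    stored_tb = raw_tb * replication / compression_ratio
    return max(1, math.ceil(stored_tb / MAX_TB_PER_TSERVER))

def check_plan(raw_tb: float, replication: int = 3,
               compression_ratio: float = 1.0, total_tablets: int = 0):
    """Return (tablet server count, list of guideline warnings)."""
    servers = min_tablet_servers(raw_tb, replication, compression_ratio)
    warnings = []
    if servers > MAX_TSERVERS:
        warnings.append(f"needs {servers} tablet servers; "
                        f"recommended maximum is {MAX_TSERVERS}")
    # Average tablet replicas per server (post-replication).
    replicas_per_server = total_tablets * replication / servers
    if replicas_per_server > MAX_TABLETS_PER_TSERVER:
        warnings.append("tablets per server above the 2000 maximum")
    elif replicas_per_server > RECOMMENDED_TABLETS_PER_TSERVER:
        warnings.append("tablets per server above the recommended 1000")
    return servers, warnings

# Hypothetical plan: 1000 TB raw, replication factor 3, 2x compression.
servers, warnings = check_plan(1000, replication=3, compression_ratio=2.0)
print(servers, warnings)
```

With those hypothetical inputs, `min_tablet_servers(1000, 3, 2.0)` returns 188, already above the recommended maximum of 100 tablet servers, which is the kind of result that makes the 8 TB guideline the binding constraint.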
 
 
I also saw a comment saying that some installations store much more data per table:
 
> 3. The total amount of stored data per table, pre-replication = Amount of
> stored data per tablet * Maximum number of tablets per table for each
> tablet server pre-replication * Maximum number of tablet servers = 4 GB *
> 20 * 100 = 8TB
>
 
Per above, this is not really the case. For example, on one cluster at
Cloudera which runs an internal workload, we have one table that is 82TB
and another which is 46TB. I've seen much larger tables in some user
installations as well.
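The arithmetic in the quoted comment can be reproduced with a short Python sketch. It assumes the "20" comes from the 60 post-replication tablets per table per tablet server divided by a replication factor of 3, and it uses decimal units (1 TB = 1000 GB) so the result matches the quoted 8 TB. Note that the quoted formula estimates data per table, pre-replication, which is a different quantity from the 8 TB per tablet server recommendation. The function name is illustrative.

```python
GB_PER_TB = 1000  # decimal units, to match "4 GB * 20 * 100 = 8TB"

def per_table_estimate_tb(gb_per_tablet: float,
                          tablets_per_table_per_ts_post_repl: int,
                          replication: int,
                          max_tservers: int) -> float:
    """Reproduce the quoted per-table estimate (pre-replication), in TB."""
    # Assumption: the quoted "20" is 60 post-replication tablets / RF 3.
    tablets_pre_repl = tablets_per_table_per_ts_post_repl // replication
    return gb_per_tablet * tablets_pre_repl * max_tservers / GB_PER_TB

estimate = per_table_estimate_tb(4, 60, 3, 100)  # 4 GB * 20 * 100
print(f"estimated data per table: {estimate} TB")

# The tables mentioned in the reply are far larger than this estimate,
# so the formula is a rule of thumb built from per-tablet sizing,
# not a hard cap on table size.
for observed_tb in (82, 46):
    print(f"{observed_tb} TB table exceeds estimate: {observed_tb > estimate}")
```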
 
 
I'm a little confused by the terminology. Does it mean that a single tablet server can store more than 8 TB because tables of, for example, up to 46 TB can be hosted on the tablet servers?
Or does it mean that it is simply possible to store more than 8 TB of data on a tablet server?
 
Would performance be affected significantly?
 
Would it be problematic to store more than 2000 tablets on a tablet server, or is it simply not allowed?
 
 
Many thanks and best regards
2 Replies

Contributor

Hi DanAdan,

 

It is possible to store more than the recommended amount of data on a tablet server. The recommendations reflect what has been well tested, and the same goes for storing more than 2000 tablets on a tablet server. Possible sources of performance degradation include:

1) Server restart time gets longer as on-disk data grows.
2) As tablets accrue more data blocks, their superblocks become larger, raising the minimum amount of I/O for any operation that rewrites a superblock (such as a flush or compaction).
3) The tablet copy protocol used in re-replication tries to copy the entire superblock in one RPC message; if the superblock is too large, it will run up against the default 50 MB RPC transfer size limit (see src/kudu/rpc/transfer.cc).
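Points 2 and 3 can be illustrated with a rough Python sketch. It is only a model: it assumes superblock size grows linearly with the number of data blocks a tablet references, and the per-block size (`bytes_per_block_ref`) and fixed overhead are hypothetical parameters, not values taken from Kudu's source. It also treats the 50 MB limit mentioned above as 50 MiB; all function names are illustrative.

```python
# Default RPC transfer size limit cited in the reply (treated as MiB here).
RPC_MAX_MESSAGE_BYTES = 50 * 1024 * 1024

def estimated_superblock_bytes(num_blocks: int, bytes_per_block_ref: int,
                               fixed_overhead_bytes: int = 0) -> int:
    """Hypothetical linear model of superblock size.

    Any flush or compaction that rewrites the superblock must write at
    least this many bytes, so it is also a floor on that operation's I/O.
    """
    return fixed_overhead_bytes + num_blocks * bytes_per_block_ref

def fits_in_one_rpc(superblock_bytes: int,
                    limit: int = RPC_MAX_MESSAGE_BYTES) -> bool:
    """Per the reply, tablet copy tries to send the whole superblock in one RPC."""
    return superblock_bytes <= limit

def max_blocks_for_one_rpc(bytes_per_block_ref: int,
                           fixed_overhead_bytes: int = 0,
                           limit: int = RPC_MAX_MESSAGE_BYTES) -> int:
    """Largest block count whose modelled superblock still fits in one RPC."""
    return (limit - fixed_overhead_bytes) // bytes_per_block_ref

per_block = 100  # hypothetical bytes per block reference
print(max_blocks_for_one_rpc(per_block))
print(fits_in_one_rpc(estimated_superblock_bytes(600_000, per_block)))
```

The takeaway is the shape of the relationship rather than the numbers: more data per tablet means more blocks, a larger superblock, more I/O per rewrite, and eventually a superblock too large for a single tablet-copy RPC.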

 

 

Master Collaborator
I don't have the latest version of Kudu (1.7), but from what I can tell: TEST IT REALLY WELL before you decide to put that much data into it. I have a cluster with fewer than 1000 tablets per tablet server and only a few TB of data, and it is running "on the edge", i.e. sometimes hitting OOM errors, performance degradation, DRS (DiskRowSet) fragmentation, etc. I think you should choose the latest Kudu release, because many issues are resolved there, but there are still some very important open issues that will be solved in the future.