05-07-2018 02:57 AM
We're using CDH 5.12.4.
The Kudu user's nofile limit was initially set to 32k. The number of open file descriptors stayed normal for some time, but it has been rising gradually.
When tables are created, the file descriptor count rises significantly, reaching critical values and causing Kudu instability and failures.
We have raised the "Maximum Process File Descriptors" property twice, and for now it seems sufficient under minimal load.
However, we plan to introduce much more load later on, so we are very interested in the recommended values.
Could you share recommendations for Kudu's file descriptor limits, e.g. "magic" formulas based on load, data size, or number of nodes?
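Before choosing a limit, it helps to measure how close the tablet server actually gets to it, for example before and after creating tables. On Linux, both numbers can be read from /proc. The sketch below is a minimal illustration and not a Kudu or Cloudera Manager tool. It assumes a Linux host and that you know the kudu-tserver PID (e.g. from `pgrep`). It also assumes you run it as root or as the kudu user, because other users cannot read /proc/<pid>/fd. The helper names are hypothetical.

```python
import os
from typing import Optional, Tuple


def open_fd_count(pid: int) -> int:
    """Count the open file descriptors of a process (Linux only).

    Note: when pid is the calling process, the directory handle that
    os.listdir opens is itself included in the count.
    """
    return len(os.listdir(f"/proc/{pid}/fd"))


def nofile_limits(pid: int) -> Tuple[Optional[int], Optional[int]]:
    """Return the (soft, hard) 'Max open files' limits from /proc/<pid>/limits.

    A limit reported as 'unlimited' is returned as None.
    """
    with open(f"/proc/{pid}/limits") as f:
        for line in f:
            if line.startswith("Max open files"):
                # Format: "Max open files  <soft>  <hard>  files"
                soft, hard = line.split()[3:5]
                return (
                    None if soft == "unlimited" else int(soft),
                    None if hard == "unlimited" else int(hard),
                )
    raise ValueError(f"no 'Max open files' entry for pid {pid}")


def fd_usage_report(pid: int) -> str:
    """One-line summary of fd usage versus the soft limit."""
    used = open_fd_count(pid)
    soft, hard = nofile_limits(pid)
    pct = f"{100 * used / soft:.1f}%" if soft else "n/a"
    return f"pid {pid}: {used} open fds, soft limit {soft}, hard limit {hard} ({pct} of soft)"
```

For example, call print(fd_usage_report(pid)) with the tablet server's real PID, periodically or around a table creation, to see how the count grows relative to the limit.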
05-07-2018 06:33 AM - edited 05-07-2018 06:48 AM
7 tablet servers (we plan to double the number of nodes in the near future)
Kudu comes with CDH: kudu 1.6.0-cdh5.14.2
05-07-2018 07:20 AM - edited 05-07-2018 07:26 AM
This property is left at its default, i.e.
All other properties are left at their defaults, except for these 11:
- 4 related to directories (fs_wal_dirs and fs_data_dirs for the master and tablet servers)
- Kudu Tablet Server Hard Memory Limit = 20G
- Kudu Tablet Server Block Cache Capacity = 5G
- Automatically Restart Process
- Process Swap Memory Thresholds
- Maximum Process File Descriptors = 65k
- Cgroup CPU Shares = 5000
- Cgroup I/O Weight = 512
05-07-2018 06:07 PM - edited 05-07-2018 06:08 PM
Please make sure you are not exceeding the scale limits for the version you are running. For Kudu 1.4.0, they are listed at https://kudu.apache.org/releases/1.4.0/docs/known_issues.html
05-09-2018 07:50 AM - edited 05-09-2018 07:52 AM
Sorry for asking so many questions; I am trying to understand the situation.
Ideally, -1 should suffice, because it uses 40% of the resource limit.
What do your Kudu tables look like?
Do they use hash and range partitioning? The ENCODING attribute?
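The "-1 uses 40%" remark above comes down to simple arithmetic. This sketch assumes the property in question is Kudu's block_manager_max_open_files flag, which the post does not actually name. It also assumes that flag caps only the descriptors used for data block files, so WALs, network sockets and other files still need room under the overall nofile limit. Treating 0 as "no limit" is an additional assumption. The function is a hypothetical illustration, not Kudu's code, and Kudu's exact rounding may differ.

```python
from typing import Optional

# -1 maps to 40% of the soft nofile limit, per the remark above.
DEFAULT_FD_PERCENT = 40


def block_cache_fd_budget(setting: int, nofile_soft_limit: int) -> Optional[int]:
    """Hypothetical helper: descriptors a block-manager-style setting allows for data blocks.

    setting == -1 -> 40% of the soft nofile limit (integer arithmetic)
    setting == 0  -> assumed to mean "no limit" (returns None)
    setting > 0   -> used as-is
    """
    if setting == -1:
        return nofile_soft_limit * DEFAULT_FD_PERCENT // 100
    if setting == 0:
        return None
    if setting > 0:
        return setting
    raise ValueError(f"unsupported setting: {setting}")


# "32k" and "65k" from the thread, taken here as 32768 and 65536.
for limit in (32_768, 65_536):
    budget = block_cache_fd_budget(-1, limit)
    # Whatever is left over serves WALs, sockets and all other files.
    print(f"nofile={limit}: data-block budget={budget}, remainder={limit - budget}")
```

Under this reading, raising the nofile limit also raises the data-block budget proportionally. The remaining 60% is shared by everything else, and that share grows with the number of tablets.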
05-11-2018 01:39 AM - edited 05-11-2018 01:39 AM
"-1" is an option, of course, but it would be good to understand better whether this is normal behaviour, why the count grows, and how large it might get. Having some limit in place (in case of sudden growth) would also help contain the effect on other cluster services.
Hash partitions: 50 for most tables, 128 for a couple of tables.
Range partitioning: unbounded.
Encoding attribute: not specified at table creation (auto_encoding).
Block size: not specified at table creation (the default is used).
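To compare a schema like this with the tablets-per-server scale limit, a rough count is enough. Each table has (hash buckets × range partitions) tablets, each tablet is replicated replication-factor times, and the replicas are spread over the tablet servers. The sketch assumes Kudu's default replication factor of 3. It treats "unbounded" range partitioning as a single range partition and assumes an even spread across servers. The table counts in the usage lines are hypothetical, not taken from this thread.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class TableLayout:
    hash_buckets: int             # e.g. 50 or 128, as described above
    range_partitions: int = 1     # unbounded range partitioning -> a single range
    replication_factor: int = 3   # Kudu's default; assumed for these tables


def tablets(t: TableLayout) -> int:
    """Tablets in one table: hash buckets times range partitions."""
    return t.hash_buckets * t.range_partitions


def replicas_per_server(tables: List[TableLayout], tablet_servers: int) -> float:
    """Average tablet replicas hosted per tablet server, assuming an even spread."""
    total = sum(tablets(t) * t.replication_factor for t in tables)
    return total / tablet_servers


# Hypothetical cluster: 20 tables with 50 buckets and 2 with 128 buckets.
cluster = [TableLayout(50) for _ in range(20)] + [TableLayout(128) for _ in range(2)]
for servers in (7, 14):  # today's 7 tablet servers, and the planned doubling
    print(f"{servers} servers: ~{replicas_per_server(cluster, servers):.1f} replicas per server")
```

Doubling the servers halves the average, while every new table or range partition multiplies it. The resulting figure can be checked against the tablets-per-server limit on the known-issues page for your Kudu version.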
05-11-2018 03:05 AM
mpercy, thanks for the link.
Currently, the number of tablets on some tablet servers exceeds the recommended limit by up to 10 times.
The other criteria are fine.
Does that mean the default file handle limit should be increased 10 times, e.g. from 32k to 320k?
05-14-2018 12:35 PM - edited 05-14-2018 12:35 PM
Hi Andreyeff, running past the scale limits is not recommended because it is not tested. You will likely run into scalability problems due to excessive numbers of threads and context switching, as well as other issues we have not explored yet.