Posts: 11
Registered: ‎02-27-2019
Accepted Solution

Kudu tablet servers - who eats memory?

[ Edited ]

I am trying to figure out why all my 3 tablet servers run out of memory, but it's hard to do.

Configuration: 3 tablet servers, each has memory_limit_hard_bytes set to 8GB. All Kudu operations are performed via Impala JDBC.

Symptoms: INSERT operations fail with errors like:

Service unavailable: Soft memory limit exceeded (at 101.27% of capacity 

Tablet web interface (http://<kudu-tablet-server>:8050/mem-trackers) shows that all the memory is consumed:

Total consumption 8.08G
Memory limit 8.00G
Percentage consumed 100.98%

Another table on this web page displays details about memory consumption:

Id Parent Limit Current Consumption Peak consumption
root none none 1.00G 1.01G
log_cache root 1.00G 835.7K 2.59M
... ... ... ... ...


However, when I try to sum up all entries in the "Current Consumption" column, I get much lower value: 2.6 GB. My question: who ate the remaining (8-2.6) = 5.4 GB?


How I sum up all the entries?

- parse the HTML code of the web page and extract the table with detailed memory consumption

- save this table a CSV file

- read it in spark-shell

- convert all memory consumption values to bytes (from GB, MB, KB)

- finally, calculate the sum for the "Current Consumption" and "Peak consumption" columns


Cloudera Employee
Posts: 17
Registered: ‎04-25-2017

Re: Kudu tablet servers - who eats memory?

The memory is being used somewhere other than in places where the usage is tracked by the memtrackers. The first possibility that comes to mind is that it is being used by the block manager. Try running


kudu fs check -fs_wal_dir=<wal dir> -fs_data_dirs=<data dirs>


and posting the output here. You may need to run the command as the kudu user (usually 'kudu').


Another helpful thing would be to gather heap samples. See If you attach the resulting SVG file it would be very helpful in locating the memory usage.

Posts: 11
Registered: ‎02-27-2019

Re: Kudu tablet servers - who eats memory?

[ Edited ]



Now the tablet server is out of memory:


Total consumption8.11G
Memory limit8.00G
Percentage consumed101.44%


Here is the heap sample from the failing node:

Here is the output of the "kudu fs check" command:


$ kudu fs check -fs_wal_dir=/data/kudu -fs_data_dirs=/data/kudu
I0319 10:23:40.103991 157327] Metadata directory not provided
I0319 10:23:40.104055 157327] Using existing metadata directory in first data directory
W0319 10:23:40.104472 157327] IO error: Could not lock /data/kudu/data/block_manager_instance: Could not lock /data/kudu/data/block_manager_instance: lock /data/kudu/data/block_manager_instance: Resource temporarily unavailable (error 11)
W0319 10:23:40.104492 157327] Proceeding without lock
I0319 10:23:40.109122 157327] Time spent opening directory manager: real 0.005s       user 0.000s     sys 0.000s
I0319 10:23:40.109161 157327] Not raising this process' open files per process limit of 65535; it is already as high as it can go
I0319 10:23:40.109180 157327] Constructed file cache lbm with capacity 26214
I0319 10:23:50.317314 157332] Opened 584 log block containers in /data/kudu/data
I0319 10:24:00.467881 157332] Opened 700 log block containers in /data/kudu/data
I0319 10:24:10.991781 157332] Opened 766 log block containers in /data/kudu/data
I0319 10:24:20.999274 157332] Opened 882 log block containers in /data/kudu/data
I0319 10:24:25.138837 157332] Read-only block manager, skipping repair
I0319 10:24:25.146647 157327] Time spent opening block manager: real 45.037s  user 0.000s     sys 0.000s
I0319 10:24:25.146875 157327] Opened local filesystem: /data/kudu
uuid: "d8b982ee9d6348a29153e4540c6c425a"
format_stamp: "Formatted at 2017-12-20 17:21:05 on ossvert4"
Block manager report
1 data directories: /data/kudu/data
Total live blocks: 1933969
Total live bytes: 61130157258
Total live bytes (after alignment): 66793267200
Total number of LBM containers: 906 (354 full)
Total missing blocks: 0
Total orphaned blocks: 514 (0 repaired)
Total orphaned block bytes: 609251411 (0 repaired)
Total full LBM containers with extra space: 37 (0 repaired)
Total full LBM container extra space in bytes: 2317418496 (0 repaired)
Total incomplete LBM containers: 0 (0 repaired)
Total LBM partial records: 0 (0 repaired)

The heap sample contains a surprising entry: "kudu HdrHistogram Init 4502.0 (54.4%)". Maybe I miss something, but it looks like an utility to collect some usage metrics. Why it consumed more than 50% of the memory allocated to the tablet server?

Cloudera Employee
Posts: 37
Registered: ‎11-12-2013

Re: Kudu tablet servers - who eats memory?

That is interesting. How many tablets are hosted by this tablet server?
Posts: 11
Registered: ‎02-27-2019

Re: Kudu tablet servers - who eats memory?

[ Edited ]

16812 tablets.

I see in the Known Issues, that the recommended value is 1000 tablets per tablet server. Maybe this is the cause of the problem.

Cloudera Employee
Posts: 37
Registered: ‎11-12-2013

Re: Kudu tablet servers - who eats memory?

Indeed, there's going to be a significant amount of memory consumed just as overhead to support that number of tablets.

So you should either reduce the number of tablets per tserver, or increase the amount of RAM available to Kudu on those heavily-loaded machines.