Support Questions

Find answers, ask questions, and share your expertise

Kudu tablet servers - who eats memory?

avatar
Explorer

I am trying to figure out why all my 3 tablet servers run out of memory, but it's hard to do.

Configuration: 3 tablet servers, each has memory_limit_hard_bytes set to 8GB. All Kudu operations are performed via Impala JDBC.

Symptoms: INSERT operations fail with errors like:

Service unavailable: Soft memory limit exceeded (at 101.27% of capacity 

Tablet web interface (http://<kudu-tablet-server>:8050/mem-trackers) shows that all the memory is consumed:

Total consumption 8.08G
Memory limit 8.00G
Percentage consumed 100.98%

Another table on this web page displays details about memory consumption:

Id Parent Limit Current Consumption Peak consumption
root none none 1.00G 1.01G
log_cache root 1.00G 835.7K 2.59M
... ... ... ... ...

 

However, when I try to sum up all entries in the "Current Consumption" column, I get much lower value: 2.6 GB. My question: who ate the remaining (8-2.6) = 5.4 GB?

 

How I sum up all the entries?

- parse the HTML code of the web page and extract the table with detailed memory consumption

- save this table a CSV file

- read it in spark-shell

- convert all memory consumption values to bytes (from GB, MB, KB)

- finally, calculate the sum for the "Current Consumption" and "Peak consumption" columns

Uh...

1 ACCEPTED SOLUTION

avatar
Expert Contributor
Indeed, there's going to be a significant amount of memory consumed just as overhead to support that number of tablets.

So you should either reduce the number of tablets per tserver, or increase the amount of RAM available to Kudu on those heavily-loaded machines.

View solution in original post

5 REPLIES 5

avatar
Contributor

The memory is being used somewhere other than in places where the usage is tracked by the memtrackers. The first possibility that comes to mind is that it is being used by the block manager. Try running

 

kudu fs check -fs_wal_dir=<wal dir> -fs_data_dirs=<data dirs>

 

and posting the output here. You may need to run the command as the kudu user (usually 'kudu').

 

Another helpful thing would be to gather heap samples. See https://kudu.apache.org/docs/troubleshooting.html#heap_sampling. If you attach the resulting SVG file it would be very helpful in locating the memory usage.

avatar
Explorer

Hi,

 

Now the tablet server is out of memory:

 

Total consumption8.11G
Memory limit8.00G
Percentage consumed101.44%

 

Here is the heap sample from the failing node: https://sendeyo.com/up/d/d26504e2ef

Here is the output of the "kudu fs check" command:

 

$ kudu fs check -fs_wal_dir=/data/kudu -fs_data_dirs=/data/kudu
I0319 10:23:40.103991 157327 fs_manager.cc:260] Metadata directory not provided
I0319 10:23:40.104055 157327 fs_manager.cc:263] Using existing metadata directory in first data directory
W0319 10:23:40.104472 157327 data_dirs.cc:615] IO error: Could not lock /data/kudu/data/block_manager_instance: Could not lock /data/kudu/data/block_manager_instance: lock /data/kudu/data/block_manager_instance: Resource temporarily unavailable (error 11)
W0319 10:23:40.104492 157327 data_dirs.cc:616] Proceeding without lock
I0319 10:23:40.109122 157327 fs_manager.cc:397] Time spent opening directory manager: real 0.005s       user 0.000s     sys 0.000s
I0319 10:23:40.109161 157327 env_posix.cc:1634] Not raising this process' open files per process limit of 65535; it is already as high as it can go
I0319 10:23:40.109180 157327 file_cache.cc:470] Constructed file cache lbm with capacity 26214
I0319 10:23:50.317314 157332 log_block_manager.cc:2367] Opened 584 log block containers in /data/kudu/data
I0319 10:24:00.467881 157332 log_block_manager.cc:2367] Opened 700 log block containers in /data/kudu/data
I0319 10:24:10.991781 157332 log_block_manager.cc:2367] Opened 766 log block containers in /data/kudu/data
I0319 10:24:20.999274 157332 log_block_manager.cc:2367] Opened 882 log block containers in /data/kudu/data
I0319 10:24:25.138837 157332 log_block_manager.cc:2437] Read-only block manager, skipping repair
I0319 10:24:25.146647 157327 fs_manager.cc:417] Time spent opening block manager: real 45.037s  user 0.000s     sys 0.000s
I0319 10:24:25.146875 157327 fs_manager.cc:428] Opened local filesystem: /data/kudu
uuid: "d8b982ee9d6348a29153e4540c6c425a"
format_stamp: "Formatted at 2017-12-20 17:21:05 on ossvert4"
Block manager report
--------------------
1 data directories: /data/kudu/data
Total live blocks: 1933969
Total live bytes: 61130157258
Total live bytes (after alignment): 66793267200
Total number of LBM containers: 906 (354 full)
Total missing blocks: 0
Total orphaned blocks: 514 (0 repaired)
Total orphaned block bytes: 609251411 (0 repaired)
Total full LBM containers with extra space: 37 (0 repaired)
Total full LBM container extra space in bytes: 2317418496 (0 repaired)
Total incomplete LBM containers: 0 (0 repaired)
Total LBM partial records: 0 (0 repaired)

The heap sample contains a surprising entry: "kudu HdrHistogram Init 4502.0 (54.4%)". Maybe I miss something, but it looks like an utility to collect some usage metrics. Why it consumed more than 50% of the memory allocated to the tablet server?

avatar
Expert Contributor
That is interesting. How many tablets are hosted by this tablet server?

avatar
Explorer

16812 tablets.

I see in the Known Issues, that the recommended value is 1000 tablets per tablet server. Maybe this is the cause of the problem.

avatar
Expert Contributor
Indeed, there's going to be a significant amount of memory consumed just as overhead to support that number of tablets.

So you should either reduce the number of tablets per tserver, or increase the amount of RAM available to Kudu on those heavily-loaded machines.