Hello,
I'm running a Kudu 1.17 cluster with one master and one tablet server, configured with the following flags:
--block_cache_capacity_mb=256
--use_hybrid_clock=false
--fs_data_dirs=/mnt/data0/data
--fs_metadata_dir=/mnt/data0/metadata
--fs_wal_dir=/mnt/data0/tserver
--log_container_metadata_runtime_compact=true
--rpc_dump_all_traces=true
--rpc_service_queue_length=200
--rpc_authentication=disabled
--rpc_encryption=disabled
--webserver_doc_root=/opt/kudu/www
--tablet_history_max_age_sec=300
--tserver_master_addrs=kudu-masters-0.kudu-masters-db-services-headless:7051
--heap_profile_path=/tmp/kudu.1
--unlock_experimental_flags=true
--unlock_unsafe_flags=true
--memory_limit_hard_bytes=2073741824
--stderrthreshold=0
I'm currently running 878 tablets on that single tserver.
For my workload, I'm executing around 10 concurrent scans via the Python Kudu client against different tables, all hash-partitioned on the primary key. In the background, some write operations are performed from time to time, each inserting only a small number of rows.
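To make the workload more concrete, the scans are issued roughly like this (a simplified sketch: the table names, the predicate-free full scans, and the exact thread count are placeholders for the real job):

import time
from concurrent.futures import ThreadPoolExecutor

import kudu

# Placeholder table names; the real tables are hash-partitioned on the primary key.
TABLES = ['table_a', 'table_b', 'table_c', 'table_d', 'table_e',
          'table_f', 'table_g', 'table_h', 'table_i', 'table_j']

client = kudu.connect(host='kudu-masters-0.kudu-masters-db-services-headless', port=7051)

def scan_table(name):
    # Each thread opens its own scanner and reads the whole (small) table.
    table = client.table(name)
    scanner = table.scanner()
    scanner.open()
    start = time.monotonic()
    rows = scanner.read_all_tuples()
    return name, len(rows), time.monotonic() - start

# Roughly 10 scans in flight at a time; the small background writes happen elsewhere.
with ThreadPoolExecutor(max_workers=10) as pool:
    for name, count, elapsed in pool.map(scan_table, TABLES):
        print('%s: %d rows in %.3fs' % (name, count, elapsed))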
I frequently run into slow scans on one of the tables, which holds only a small number of rows, around 10 per partition.
Checking the logs and RPC traces, it seems that a lot of the time is spent in the operation tracked by the lbm_read_time_us metric. Here is one in-flight Scan call and the corresponding trace from the tserver log:
{
  "remote_ip": "10.42.1.32:44266",
  "state": "OPEN",
  "remote_user_credentials": "{username='root'}",
  "calls_in_flight": [
    {
      "header": {
        "call_id": 2,
        "remote_method": {
          "service_name": "kudu.tserver.TabletServerService",
          "method_name": "Scan"
        },
        "timeout_millis": 29999,
        "required_feature_flags": [
          1
        ]
      },
      "micros_elapsed": 14871632
    }
  ],
  "socket_stats": {
    "rtt": 1379,
    "rttvar": 1442,
    "snd_cwnd": 10,
    "total_retrans": 0,
    "pacing_rate": 20273724,
    "max_pacing_rate": 18446744073709551615,
    "bytes_acked": 760,
    "bytes_received": 3507,
    "segs_out": 12,
    "segs_in": 16,
    "send_queue_bytes": 0,
    "receive_queue_bytes": 0,
    "send_bytes_per_sec": 10137781
  },
  "transport_details": {
    "tcp": {
      "max_segment_size": 1398
    }
  }
}
I20231125 14:56:01.833339 109 rpcz_store.cc:269] Call kudu.tserver.TabletServerService.Scan from 10.42.1.32:44266 (request call id 2) took 16772ms. Trace:
I20231125 14:56:01.833528 109 rpcz_store.cc:270] 1125 14:55:45.061275 (+ 0us) service_pool.cc:169] Inserting onto call queue
1125 14:55:45.061332 (+ 57us) service_pool.cc:228] Handling call
1125 14:55:45.061432 (+ 100us) tablet_service.cc:2763] Created scanner 547f841e3efe42ff9c49896ceb28945a for tablet 4aa9943027b9479cb38ea1bfa543aa75
1125 14:55:45.061668 (+ 236us) tablet_service.cc:2903] Creating iterator
1125 14:55:45.061687 (+ 19us) tablet_service.cc:2928] Iterator created
1125 14:55:45.084323 (+ 22636us) tablet_service.cc:2950] Iterator init: OK
1125 14:55:45.084333 (+ 10us) tablet_service.cc:2993] has_more: true
1125 14:55:45.084346 (+ 13us) tablet_service.cc:3010] Continuing scan request
1125 14:55:45.084394 (+ 48us) tablet_service.cc:3073] Found scanner 547f841e3efe42ff9c49896ceb28945a for tablet 4aa9943027b9479cb38ea1bfa543aa75
1125 14:56:01.831663 (+16747269us) tablet_service.cc:3147] Deadline expired - responding early
1125 14:56:01.833323 (+ 1660us) inbound_call.cc:173] Queueing success response
Metrics: {"cfile_cache_hit":2,"cfile_cache_hit_bytes":78,"cfile_cache_miss":49,"cfile_cache_miss_bytes":1393,"cfile_init":3827,"delta_iterators_relevant":10572,"lbm_read_time_us":16386006,"lbm_reads_1-10_ms":2121,"lbm_reads_10-100_ms":322,"lbm_reads_gt_100_ms":3,"lbm_reads_lt_1ms":12911,"rowset_iterators":19,"scanner_bytes_read":665}
This is not consistent: repeating the same workload, I see query times swing from milliseconds to seconds without a clear reason. It looks like Kudu is spending a lot of time reading data through the log block manager, but I don't know how to debug that, nor where to look to understand what the problem is.
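In case it helps, between runs I can also pull the block-manager metrics straight from the tserver's web endpoint, roughly like this (port 8050 and the ?metrics= substring filter reflect my deployment and my reading of the metrics docs, so treat them as assumptions):

import json
import urllib.request

# 8050 is the tserver webserver port in my setup; adjust as needed.
URL = ('http://localhost:8050/metrics'
       '?metrics=lbm,block_manager,block_cache&include_raw_histograms=1')

with urllib.request.urlopen(URL) as resp:
    entities = json.load(resp)

# One entity per server/tablet; print whatever matched the filter so runs can be diffed.
for entity in entities:
    for metric in entity.get('metrics', []):
        print(entity['type'], entity['id'], metric['name'],
              metric.get('value', metric.get('total_count')))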
Do you have any suggestions on why this is happening?
Thanks,
Dario