Member since: 11-02-2023
Posts: 3
Kudos Received: 1
Solutions: 0
11-25-2023
07:19 AM
1 Kudo
Hello,

I'm running a Kudu 1.17 cluster with 1 master and 1 tserver, configured with the following flags:

--block_cache_capacity_mb=256
--use_hybrid_clock=false
--fs_data_dirs=/mnt/data0/data
--fs_metadata_dir=/mnt/data0/metadata
--fs_wal_dir=/mnt/data0/tserver
--log_container_metadata_runtime_compact=true
--rpc_dump_all_traces=true
--rpc_service_queue_length=200
--rpc_authentication=disabled
--rpc_encryption=disabled
--webserver_doc_root=/opt/kudu/www
--tablet_history_max_age_sec=300
--tserver_master_addrs=kudu-masters-0.kudu-masters-db-services-headless:7051
--heap_profile_path=/tmp/kudu.1
--unlock_experimental_flags=true
--unlock_unsafe_flags=true
--memory_limit_hard_bytes=2073741824
--stderrthreshold=0

I'm currently running 878 tablets on that single tserver. For my workload, I'm executing around 10 concurrent scans via the Python kudu client on different tables, all hash-partitioned on the primary key. In the background, some write operations are performed from time to time, inserting just a small number of rows each time.

Frequently I run into slow scans on one of the tables, which has only a small number of rows, around 10 per partition. By checking logs and RPC traces, it seems that a lot of time is spent in the operation tracked by the metric lbm_read_time_us:

{
"remote_ip": "10.42.1.32:44266",
"state": "OPEN",
"remote_user_credentials": "{username='root'}",
"calls_in_flight": [
{
"header": {
"call_id": 2,
"remote_method": {
"service_name": "kudu.tserver.TabletServerService",
"method_name": "Scan"
},
"timeout_millis": 29999,
"required_feature_flags": [
1
]
},
"micros_elapsed": 14871632
}
],
"socket_stats": {
"rtt": 1379,
"rttvar": 1442,
"snd_cwnd": 10,
"total_retrans": 0,
"pacing_rate": 20273724,
"max_pacing_rate": 18446744073709551615,
"bytes_acked": 760,
"bytes_received": 3507,
"segs_out": 12,
"segs_in": 16,
"send_queue_bytes": 0,
"receive_queue_bytes": 0,
"send_bytes_per_sec": 10137781
},
"transport_details": {
"tcp": {
"max_segment_size": 1398
}
}
}

I20231125 14:56:01.833339 109 rpcz_store.cc:269] Call kudu.tserver.TabletServerService.Scan from 10.42.1.32:44266 (request call id 2) took 16772ms. Trace:
I20231125 14:56:01.833528 109 rpcz_store.cc:270] 1125 14:55:45.061275 (+ 0us) service_pool.cc:169] Inserting onto call queue
1125 14:55:45.061332 (+ 57us) service_pool.cc:228] Handling call
1125 14:55:45.061432 (+ 100us) tablet_service.cc:2763] Created scanner 547f841e3efe42ff9c49896ceb28945a for tablet 4aa9943027b9479cb38ea1bfa543aa75
1125 14:55:45.061668 (+ 236us) tablet_service.cc:2903] Creating iterator
1125 14:55:45.061687 (+ 19us) tablet_service.cc:2928] Iterator created
1125 14:55:45.084323 (+ 22636us) tablet_service.cc:2950] Iterator init: OK
1125 14:55:45.084333 (+ 10us) tablet_service.cc:2993] has_more: true
1125 14:55:45.084346 (+ 13us) tablet_service.cc:3010] Continuing scan request
1125 14:55:45.084394 (+ 48us) tablet_service.cc:3073] Found scanner 547f841e3efe42ff9c49896ceb28945a for tablet 4aa9943027b9479cb38ea1bfa543aa75
1125 14:56:01.831663 (+16747269us) tablet_service.cc:3147] Deadline expired - responding early
1125 14:56:01.833323 (+ 1660us) inbound_call.cc:173] Queueing success response
Metrics: {"cfile_cache_hit":2,"cfile_cache_hit_bytes":78,"cfile_cache_miss":49,"cfile_cache_miss_bytes":1393,"cfile_init":3827,"delta_iterators_relevant":10572,"lbm_read_time_us":16386006,"lbm_reads_1-10_ms":2121,"lbm_reads_10-100_ms":322,"lbm_reads_gt_100_ms":3,"lbm_reads_lt_1ms":12911,"rowset_iterators":19,"scanner_bytes_read":665} This is not consistent, by repeating the workload I see querying going from ms to s without a clear reason. It seems that Kudu is spending a lot of time trying to read data via the log block manager. But I don't know how to debug that, nor I know where to look to understand what is the problem. Do you have any suggestions on why this is happening? Thanks, Dario
Labels: Apache Kudu
11-02-2023
04:40 PM
Thanks for the reply. I saw this issue, which seems to describe exactly my concern: https://issues.apache.org/jira/plugins/servlet/mobile#issue/KUDU-3318

From what I understood, with that fix the compaction is also performed at runtime. Would upgrading Kudu to 1.17 solve the problem? Do you suggest anything else?

Thanks,
Dario
11-02-2023
08:37 AM
Hello,

we are running the following Kudu cluster configuration: 1 master, 3 tservers, version 1.16. The total number of live tablet replicas is 552, as inspected via the tserver UI. Tables are partitioned with both range and hash partitioning, following the Kudu guidelines. Most of the tablets belong to historical tables, i.e. data is continuously inserted without being further modified. On the other hand, we have some "lookup" tables keeping only the last value of other tables, i.e. a lot of updates are continuously performed on those tablets.

What we noticed in this scenario is that most of the disk space is occupied by Kudu metadata, which seems to be unused during normal insert/scan operations. This metadata has a high impact, occupying around 50% of our total disk space (which is 30GB per instance right now). After rebooting one of the instances, we noticed that Kudu executes a sort of cleanup/compaction that frees around 40% of disk space, in particular after some operations performed by the "Log Block Manager" (it opened 15k log block containers).

Do you have any clues on what's happening here? Shouldn't there be some cleanup/compaction routine that Kudu executes at runtime? Also, it seems that, in order to perform these operations on the on-disk data, Kudu first has to load everything into memory, causing a very high RAM spike during startup, which is not sustainable in our case. This is causing many troubles in our Kubernetes setup, since during startup Kudu fills up to 30GB of memory when it needs to perform this compaction via the log block manager, while during normal usage it requires only 1.5GB. If that memory is not available, we are doomed to a Kudu pod restart loop, because it needs to complete these operations before being able to resume processing.

At first, I thought this high disk usage could be due to MVCC snapshots continuously taken by "read_latest" scans on the tablets where we have lots of updates, but all the GBs of metadata freed during reboot make me suspect something else is going on under the hood... Please let me know if you need some logs and how to provide them. We have already rebooted 2 of the 3 instances, freeing up disk space, so there should still be some logs available for the third instance.

The flags we set are the following:

--fs_wal_dir=/mnt/data0/tserver
--fs_metadata_dir=/mnt/data0/metadata
--fs_data_dirs=/mnt/data0/data
--webserver_doc_root=/opt/kudu/www
--rpc_service_queue_length=200
--maintenance_manager_num_threads=1
--webserver_enabled=true
--rpc_encryption disabled
--rpc_authentication disabled
--memory_limit_hard_bytes 1073741824
--block_cache_capacity_mb 256

Thank you for your help.
Dario
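If it helps with debugging, this is roughly how we could track the block-manager metrics over time from the tserver's /metrics endpoint (a minimal sketch; the hostname is a placeholder, the default webserver port 8050 is assumed, and exact metric names vary by Kudu version):

```python
# Minimal sketch: pull the tserver's JSON metrics and print everything whose
# name mentions the block manager, to watch container counts/bytes over time.
# Hostname is a placeholder; 8050 is the default tserver webserver port.
import json
import urllib.request

TSERVER_METRICS_URL = "http://kudu-tserver-0:8050/metrics"

with urllib.request.urlopen(TSERVER_METRICS_URL) as resp:
    entities = json.load(resp)

for entity in entities:
    # The "server" entity carries server-wide metrics, including the
    # log block manager ones.
    if entity.get("type") != "server":
        continue
    for metric in entity.get("metrics", []):
        if "block_manager" in metric.get("name", ""):
            print(metric["name"], metric.get("value"))
```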
Labels: Apache Kudu