Member since 12-26-2018 · 15 Posts · 0 Kudos Received · 0 Solutions
03-08-2019
08:12 PM
Yes. If a table has many operations (insert/update) within 24 hours, query performance goes down significantly. The MRS (MemRowSet) is not columnar storage, I guess.
03-08-2019
04:57 PM
Waiting for Kudu 1.9 in the coming CDH 6.2, hoping it will solve the problem. The flush_threshold_secs flag hurts when set long: query performance on records still in the MRS (not yet flushed) is poor.
03-06-2019
06:00 PM
I know it will be a nested loop join, but the performance is far too low; it looks like the LIKE operator has a bug in this situation. If I use the locate() function instead of LIKE, it is much faster.

--- LIKE ---
select count(*) from test_like t1
left join test_like t2 on t1.full_id like t2.full_id

Operator             #Hosts  Avg Time   Max Time   #Rows  Est. #Rows  Peak Mem   Est. Peak Mem  Detail
--------------------------------------------------------------------------------------------------------------------------
06:AGGREGATE         1       0.000ns    0.000ns    1      1           16.00 KB   10.00 MB       FINALIZE
05:EXCHANGE          1       0.000ns    0.000ns    1      1           16.00 KB   16.00 KB       UNPARTITIONED
03:AGGREGATE         1       16.000ms   16.000ms   1      1           135.00 KB  10.00 MB
02:NESTED LOOP JOIN  1       13s708ms   13s708ms   1.00M  -1          147.00 KB  2.00 GB        LEFT OUTER JOIN, BROADCAST
|--04:EXCHANGE       1       0.000ns    0.000ns    1.00K  -1          72.00 KB   56.99 KB       BROADCAST
|  01:SCAN KUDU      1       4.000ms    4.000ms    1.00K  -1          59.00 KB   384.00 KB      default.test_like t2
00:SCAN KUDU         1       36.001ms   36.001ms   1.00K  -1          59.00 KB   384.00 KB      default.test_like t1

--- LOCATE ---
select count(*) from test_like t1
left join test_like t2 on locate(t1.full_id, t2.full_id) > 0

Operator             #Hosts  Avg Time    Max Time    #Rows  Est. #Rows  Peak Mem   Est. Peak Mem  Detail
----------------------------------------------------------------------------------------------------------------------------
06:AGGREGATE         1       0.000ns     0.000ns     1      1           16.00 KB   10.00 MB       FINALIZE
05:EXCHANGE          1       0.000ns     0.000ns     1      1           16.00 KB   16.00 KB       UNPARTITIONED
03:AGGREGATE         1       68.001ms    68.001ms    1      1           135.00 KB  10.00 MB
02:NESTED LOOP JOIN  1       460.009ms   460.009ms   1.00M  -1          147.00 KB  2.00 GB        LEFT OUTER JOIN, BROADCAST
|--04:EXCHANGE       1       0.000ns     0.000ns     1.00K  -1          72.00 KB   56.99 KB       BROADCAST
|  01:SCAN KUDU      1       4.000ms     4.000ms     1.00K  -1          59.00 KB   384.00 KB      default.test_like t2
00:SCAN KUDU         1       0.000ns     0.000ns     1.00K  -1          59.00 KB   384.00 KB      default.test_like t1
02-20-2019
10:02 PM
CDH 6.1
Impala 3.1
Kudu 1.8
1 master node + 5 data nodes
All nodes: 8 cores, 64 GB RAM, SSD
create table test_like(id bigint, full_id string, primary key(id)) stored as kudu
-- insert about 1k records, whatever they are ......
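-- (Hypothetical seed rows for illustration only; the original test inserted ~1k arbitrary records.)
INSERT INTO test_like VALUES (1, '1'), (2, '1.2'), (3, '1.2.3'), (4, '1.2.3.4');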
select count(*) from test_like t1
left join test_like t2 on t1.full_id like t2.full_id
It takes 13 seconds to execute:
[data-60:21000] gslq4dev_iquantity> select count(*) from test_like t1
left join test_like t2 on t1.full_id like t2.full_id
;
Query: select count(*) from test_like t1
left join test_like t2 on t1.full_id like t2.full_id
Query submitted at: 2019-02-21 13:44:55 (Coordinator: http://data-60:25000)
Query progress can be monitored at: http://data-60:25000/query_plan?query_id=594efea080891f13:9e932ad900000000
+----------+
| count(*) |
+----------+
| 1023 |
+----------+
Fetched 1 row(s) in 13.54s
[data-60:21000] gslq4dev_iquantity> summary;
+---------------------+--------+----------+----------+-------+------------+-----------+---------------+---------------------------------+
| Operator | #Hosts | Avg Time | Max Time | #Rows | Est. #Rows | Peak Mem | Est. Peak Mem | Detail |
+---------------------+--------+----------+----------+-------+------------+-----------+---------------+---------------------------------+
| 06:AGGREGATE | 1 | 0ns | 0ns | 1 | 1 | 16.00 KB | 10.00 MB | FINALIZE |
| 05:EXCHANGE | 1 | 0ns | 0ns | 1 | 1 | 16.00 KB | 16.00 KB | UNPARTITIONED |
| 03:AGGREGATE | 1 | 0ns | 0ns | 1 | 1 | 267.00 KB | 10.00 MB | |
| 02:NESTED LOOP JOIN | 1 | 13.42s | 13.42s | 1.02K | -1 | 279.00 KB | 2.00 GB | LEFT OUTER JOIN, BROADCAST |
| |--04:EXCHANGE | 1 | 0ns | 0ns | 1.02K | -1 | 136.00 KB | 56.99 KB | BROADCAST |
| | 01:SCAN KUDU | 1 | 4.00ms | 4.00ms | 1.02K | -1 | 127.00 KB | 384.00 KB | gslq4dev_iquantity.test_like t2 |
| 00:SCAN KUDU | 1 | 0ns | 0ns | 1.02K | -1 | 127.00 KB | 384.00 KB | gslq4dev_iquantity.test_like t1 |
+---------------------+--------+----------+----------+-------+------------+-----------+---------------+---------------------------------+
All the time is spent on the nested loop join.
This is a demo SQL; in the real scenario I want to fetch some rows and all of their children using the full_id field.
full_id is in the form "1.2.3.4", and I have to use the LIKE operator to get the children (multi-level).
The record count in the real table is far more than 1k, and there the query never finishes.
Why is it so slow, and how can I improve it? (A sketch of the intended lookup follows.)
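For reference, a hedged sketch of the kind of multi-level children lookup described above, assuming full_id encodes the ancestry path ("1.2.3" is a child of "1.2"); the concat-built pattern, the predicate shape, and the filter value are illustrative, not the production query:

-- Hypothetical lookup: a parent row plus all of its (multi-level) children.
-- Assumes full_id encodes the path, e.g. parent '1.2' has descendants '1.2.3', '1.2.3.4', ...
SELECT child.*
FROM test_like parent
LEFT JOIN test_like child
  ON child.full_id = parent.full_id                   -- the row itself
  OR child.full_id LIKE concat(parent.full_id, '.%')  -- all descendants
WHERE parent.id = 1;                                  -- placeholder id

Because the pattern is built per row, Impala still plans this as a nested loop join; the concat form only makes the multi-level intent explicit.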
Labels: Apache Impala, Apache Kudu
01-25-2019
06:45 PM
It's a small dict table with fewer than 6k records.
The execution time of the SQL doesn't change much when the LIMIT amount varies, even with no LIMIT at all.
The execution time also doesn't change much across repeated executions.
I copied the table into a new one and ran the same SQL against the copy; there the execution time and scan time were very low, about 5 ms (a sketch of that test is below).
By the way, on your colleague's advice I changed some Kudu config to work around a bug: Kudu compaction did not run. Is that the reason?
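A minimal sketch of the copy test mentioned above, assuming an Impala CTAS into a new Kudu table; the copy's name and the primary key column are assumptions:

-- Hypothetical reproduction of the copy test (table name and key column assumed).
CREATE TABLE project_spot_copy
PRIMARY KEY (id)        -- assumed primary key column
STORED AS KUDU
AS SELECT * FROM project_spot;

-- The same query against the copy ran in about 5 ms.
SELECT user_id, team_id, name, pile_no_prefix, version
FROM project_spot_copy
ORDER BY version
LIMIT 7;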
01-25-2019
12:50 AM
Sorry for the late reply, and thanks a lot for your attention. Here is the SQL:

SELECT user_id, team_id, name, pile_no_prefix, version FROM project_spot ORDER BY version LIMIT 7

Here is the trace in CM: (screenshot)
Here is the picture from the Kudu web UI scan page: (screenshot)
Here is the trace from the Kudu rpcz page:
{
"header": {
"call_id": 23,
"remote_method": {
"service_name": "kudu.tserver.TabletServerService",
"method_name": "Scan"
},
"timeout_millis": 9999
},
"trace": "0125 16:44:14.197738 (+ 0us) service_pool.cc:163] Inserting onto call queue\n0125 16:44:14.197757 (+ 19us) service_pool.cc:222] Handling call\n0125 16:44:14.197852 (+ 95us) tablet_service.cc:1745] Creating iterator\n0125 16:44:14.198093 (+ 241us) tablet_service.cc:1789] Iterator init: OK\n0125 16:44:14.198096 (+ 3us) tablet_service.cc:1838] has_more: true\n0125 16:44:14.198097 (+ 1us) tablet_service.cc:1853] Continuing scan request\n0125 16:44:14.198101 (+ 4us) tablet_service.cc:1901] Found scanner a7c9ec89e70c4a65aed5da01e9c846cb\n0125 16:44:14.708182 (+510081us) tablet_service.cc:1960] Deadline expired - responding early\n0125 16:44:14.708296 (+ 114us) inbound_call.cc:157] Queueing success response\n",
"duration_ms": 510,
"metrics": [
{
"key": "delta_iterators_relevant",
"value": 1
},
{
"key": "cfile_cache_hit",
"value": 14
},
{
"key": "cfile_cache_hit_bytes",
"value": 289477
},
{
"key": "threads_started",
"value": 1
},
{
"key": "thread_start_us",
"value": 29
},
{
"key": "compiler_manager_pool.queue_time_us",
"value": 62
},
{
"key": "compiler_manager_pool.run_cpu_time_us",
"value": 110612
},
{
"key": "compiler_manager_pool.run_wall_time_us",
"value": 110623
}
]
}
01-22-2019
08:36 PM
Kudu 1.7.0 in CDH 5.15
3 master nodes, 4c/32g, Ubuntu 16.04
3 data nodes, 8c/64g, 1.8T SSD, Ubuntu 16.04

It is a small table: project_spot, 5.93K records, 38 columns. I found some slow SQL on it (more than 400 ms), with most of the time spent in MaterializeTupleTime. So I checked the scan page on the Kudu tablet server web UI and found this: (screenshot) It is a very small scan, but it took 423 ms. Why?

More information: I checked the IO performance on all data nodes using fio and found no problem:

read : io=6324.4MB, bw=647551KB/s, iops=161887, runt= 10001msec

There are 3 data nodes, and only data02 has the problem. I rebooted the tablet server on data02, but that did not help.
Labels: Apache Kudu
12-29-2018
11:48 PM
Sorry, I made a mistake. The flush_threshold_secs flag applies to both the master and the tablet server. I had not set it on the tablet server; after setting it there, it's OK now.
12-28-2018
08:17 PM
Problem not solved. After a few hours I checked the rowsets of that table, and there are many small rowsets again! Running this command, I see 137 rowsets:

kudu fs list -fs_data_dirs=/data/1/kudu/data -fs_wal_dir=/data/1/kudu/wal -tablet_id=b110f90092b647e1bd5cd3de05b40aff

Here is my config in CM (screenshot); I restarted the Kudu services after changing it. Why did this happen? What can I do?
12-27-2018
06:34 PM
Thanks for the reply, but that is a sad answer 😞 I read the doc; indeed I am triggering the 'trickling inserts' case. Small CRUD operations happen on the table all the time, with low pressure. According to the doc's suggestion, I should take the actions below (see the sketch after this list):
1. Set --flush_threshold_secs long enough, e.g. one day.
2. Create a new table and copy all the data into it.
3. Drop the old table.
4. Rename the new table to the old table's name.
Is that right? Also, I see CDH 6.1 was just released; when will 6.2 be released?
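A minimal Impala sketch of steps 2-4 above; the table name and primary key column are assumptions carried over from the earlier posts:

-- Step 2: copy all data into a fresh Kudu table (assumed primary key: id)
CREATE TABLE project_construction_record_new
PRIMARY KEY (id)
STORED AS KUDU
AS SELECT * FROM project_construction_record;

-- Step 3: drop the old table
DROP TABLE project_construction_record;

-- Step 4: rename the new table to the old name
ALTER TABLE project_construction_record_new RENAME TO project_construction_record;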
Tags: kudu
12-26-2018
11:52 PM
I found this for that tablet in the tablet server metrics:
{
"name": "compact_rs_duration",
"total_count": 0,
"min": 0,
"mean": 0,
"percentile_75": 0,
"percentile_95": 0,
"percentile_99": 0,
"percentile_99_9": 0,
"percentile_99_99": 0,
"max": 0,
"total_sum": 0
},

Compaction has never run!
12-26-2018
11:48 PM
Kudu 1.7.0 in CDH 5.15
3 master nodes, 4c/32g, Ubuntu 16.04
3 data nodes, 8c/64g, 1.8T SSD, Ubuntu 16.04

Here is a table: project_construction_record, 62 columns, 170k records, no partition. The table has many CRUD operations every day. I ran a simple SQL on it (using Impala):

SELECT * FROM project_construction_record ORDER BY id LIMIT 1

It takes 7 seconds! By checking the profile, I found this:

KUDU_SCAN_NODE (id=0) (6.06s)
  BytesRead: 0 byte
  CollectionItemsRead: 0
  InactiveTotalTime: 0 ns
  KuduRemoteScanTokens: 0
  NumScannerThreadsStarted: 1
  PeakMemoryUsage: 3.4 MB
  RowsRead: 177,007
  RowsReturned: 177,007
  RowsReturnedRate: 29188/s
  ScanRangesComplete: 1
  ScannerThreadsInvoluntaryContextSwitches: 0
  ScannerThreadsTotalWallClockTime: 6.09s
    MaterializeTupleTime: 6.06s
    ScannerThreadsSysTime: 48ms
    ScannerThreadsUserTime: 172ms

So I checked the scan of this SQL and found this:

column           cells read  bytes read  blocks read
id               176.92k     1.91M       19.96k
org_id           176.92k     1.91M       19.96k
work_date        176.92k     2.03M       19.96k
description      176.92k     1.21M       19.96k
user_name        176.92k     775.9K      19.96k
spot_name        176.92k     825.8K      19.96k
spot_start_pile  176.92k     778.7K      19.96k
spot_end_pile    176.92k     780.4K      19.96k
......           ......      ......      ......

There are so many blocks read. Then I ran the kudu fs list command and got a 70 MB report; here is the bottom of it:

0b6ac30b449043a68905e02b797144fc | 25024 | 40310988 | column
0b6ac30b449043a68905e02b797144fc | 25024 | 40310989 | column
0b6ac30b449043a68905e02b797144fc | 25024 | 40310990 | column
0b6ac30b449043a68905e02b797144fc | 25024 | 40310991 | column
0b6ac30b449043a68905e02b797144fc | 25024 | 40310992 | column
0b6ac30b449043a68905e02b797144fc | 25024 | 40310993 | column
0b6ac30b449043a68905e02b797144fc | 25024 | 40310996 | undo
0b6ac30b449043a68905e02b797144fc | 25024 | 40310994 | bloom
0b6ac30b449043a68905e02b797144fc | 25024 | 40310995 | adhoc-index

There are 25024 rowsets and more than 1M blocks in the tablet. I left the maintenance and compaction flags at their defaults, only changing tablet_history_max_age_sec to one day:

--maintenance_manager_history_size=8
--maintenance_manager_num_threads=1
--maintenance_manager_polling_interval_ms=250
--budgeted_compaction_target_rowset_size=33554432
--compaction_approximation_ratio=1.0499999523162842
--compaction_minimum_improvement=0.0099999997764825821
--deltafile_default_block_size=32768
--deltafile_default_compression_codec=lz4
--default_composite_key_index_block_size_bytes=4096
--tablet_delta_store_major_compact_min_ratio=0.10000000149011612
--tablet_delta_store_minor_compact_max=1000
--mrs_use_codegen=true
--compaction_policy_dump_svgs_pattern=
--enable_undo_delta_block_gc=true
--fault_crash_before_flush_tablet_meta_after_compaction=0
--fault_crash_before_flush_tablet_meta_after_flush_mrs=0
--max_cell_size_bytes=65536
--max_encoded_key_size_bytes=16384
--tablet_bloom_block_size=4096
--tablet_bloom_target_fp_rate=9.9999997473787516e-05
--tablet_compaction_budget_mb=128
--tablet_history_max_age_sec=86400

It is a production environment, and many other tables have the same issue; performance is getting slower and slower. So my questions are: why does compaction not run? Is it a bug? And can I do a compaction manually?
Labels: Apache Kudu