Member since 12-26-2018 · 15 Posts · 0 Kudos Received · 0 Solutions
03-08-2019
08:12 PM
Yes. If a table has many operations (insert/update) within 24 hours, query performance goes down significantly. The MRS (MemRowSet) is not columnar storage, I guess.
03-08-2019
04:57 PM
Waiting for Kudu 1.9 in the coming CDH 6.2, hoping it will solve the problem. The flush_threshold_secs flag hurts when set long: query performance on records still in the MRS (not yet flushed) is poor.
03-06-2019
06:00 PM
I know it will be a nested loop join, but the performance is far too low; it looks like the LIKE operator has a bug in this situation. If I use the locate() function instead of LIKE, it is much faster.

--- LIKE ---
select count(*) from test_like t1
left join test_like t2 on t1.full_id like t2.full_id

Operator             #Hosts  Avg Time   Max Time   #Rows  Est. #Rows  Peak Mem   Est. Peak Mem  Detail
--------------------------------------------------------------------------------------------------------------------------
06:AGGREGATE         1       0.000ns    0.000ns    1      1           16.00 KB   10.00 MB       FINALIZE
05:EXCHANGE          1       0.000ns    0.000ns    1      1           16.00 KB   16.00 KB       UNPARTITIONED
03:AGGREGATE         1       16.000ms   16.000ms   1      1           135.00 KB  10.00 MB
02:NESTED LOOP JOIN  1       13s708ms   13s708ms   1.00M  -1          147.00 KB  2.00 GB        LEFT OUTER JOIN, BROADCAST
|--04:EXCHANGE       1       0.000ns    0.000ns    1.00K  -1          72.00 KB   56.99 KB       BROADCAST
|  01:SCAN KUDU      1       4.000ms    4.000ms    1.00K  -1          59.00 KB   384.00 KB      default.test_like t2
00:SCAN KUDU         1       36.001ms   36.001ms   1.00K  -1          59.00 KB   384.00 KB      default.test_like t1

--- LOCATE ---
select count(*) from test_like t1
left join test_like t2 on locate(t1.full_id, t2.full_id) > 0

Operator             #Hosts  Avg Time    Max Time    #Rows  Est. #Rows  Peak Mem   Est. Peak Mem  Detail
----------------------------------------------------------------------------------------------------------------------------
06:AGGREGATE         1       0.000ns     0.000ns     1      1           16.00 KB   10.00 MB       FINALIZE
05:EXCHANGE          1       0.000ns     0.000ns     1      1           16.00 KB   16.00 KB       UNPARTITIONED
03:AGGREGATE         1       68.001ms    68.001ms    1      1           135.00 KB  10.00 MB
02:NESTED LOOP JOIN  1       460.009ms   460.009ms   1.00M  -1          147.00 KB  2.00 GB        LEFT OUTER JOIN, BROADCAST
|--04:EXCHANGE       1       0.000ns     0.000ns     1.00K  -1          72.00 KB   56.99 KB       BROADCAST
|  01:SCAN KUDU      1       4.000ms     4.000ms     1.00K  -1          59.00 KB   384.00 KB      default.test_like t2
00:SCAN KUDU         1       0.000ns     0.000ns     1.00K  -1          59.00 KB   384.00 KB      default.test_like t1
02-20-2019
10:02 PM
CDH 6.1
Impala 3.1
Kudu 1.8
1 master node + 5 data nodes
All nodes: 8 cores, 64 GB RAM, SSD
create table test_like(id bigint, full_id string, primary key(id)) stored as kudu
-- insert about 1k records, whatever they are ......
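-- (Hypothetical seed rows for illustration only; the original test inserted ~1k arbitrary records.)
INSERT INTO test_like VALUES (1, '1'), (2, '1.2'), (3, '1.2.3'), (4, '1.2.3.4');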
select count(*) from test_like t1
left join test_like t2 on t1.full_id like t2.full_id
It takes 13 seconds to execute:
[data-60:21000] gslq4dev_iquantity> select count(*) from test_like t1
left join test_like t2 on t1.full_id like t2.full_id
;
Query: select count(*) from test_like t1
left join test_like t2 on t1.full_id like t2.full_id
Query submitted at: 2019-02-21 13:44:55 (Coordinator: http://data-60:25000)
Query progress can be monitored at: http://data-60:25000/query_plan?query_id=594efea080891f13:9e932ad900000000
+----------+
| count(*) |
+----------+
| 1023 |
+----------+
Fetched 1 row(s) in 13.54s
[data-60:21000] gslq4dev_iquantity> summary;
+---------------------+--------+----------+----------+-------+------------+-----------+---------------+---------------------------------+
| Operator | #Hosts | Avg Time | Max Time | #Rows | Est. #Rows | Peak Mem | Est. Peak Mem | Detail |
+---------------------+--------+----------+----------+-------+------------+-----------+---------------+---------------------------------+
| 06:AGGREGATE | 1 | 0ns | 0ns | 1 | 1 | 16.00 KB | 10.00 MB | FINALIZE |
| 05:EXCHANGE | 1 | 0ns | 0ns | 1 | 1 | 16.00 KB | 16.00 KB | UNPARTITIONED |
| 03:AGGREGATE | 1 | 0ns | 0ns | 1 | 1 | 267.00 KB | 10.00 MB | |
| 02:NESTED LOOP JOIN | 1 | 13.42s | 13.42s | 1.02K | -1 | 279.00 KB | 2.00 GB | LEFT OUTER JOIN, BROADCAST |
| |--04:EXCHANGE | 1 | 0ns | 0ns | 1.02K | -1 | 136.00 KB | 56.99 KB | BROADCAST |
| | 01:SCAN KUDU | 1 | 4.00ms | 4.00ms | 1.02K | -1 | 127.00 KB | 384.00 KB | gslq4dev_iquantity.test_like t2 |
| 00:SCAN KUDU | 1 | 0ns | 0ns | 1.02K | -1 | 127.00 KB | 384.00 KB | gslq4dev_iquantity.test_like t1 |
+---------------------+--------+----------+----------+-------+------------+-----------+---------------+---------------------------------+
All the time is spent on the nested loop join.
This is a demo SQL; in the real scenario I want to fetch some rows and all of their children using the full_id field.
full_id is in the form "1.2.3.4", and I have to use the LIKE operator to get the children (multi-level).
The record count in the real table is far more than 1k, and there the query never finishes.
Why is it so slow, and how can I improve it? (A sketch of the intended lookup follows.)
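For reference, a hedged sketch of the kind of multi-level children lookup described above, assuming full_id encodes the ancestry path ("1.2.3" is a child of "1.2"); the concat-built pattern, the predicate shape, and the filter value are illustrative, not the production query:

-- Hypothetical lookup: a parent row plus all of its (multi-level) children.
-- Assumes full_id encodes the path, e.g. parent '1.2' has descendants '1.2.3', '1.2.3.4', ...
SELECT child.*
FROM test_like parent
LEFT JOIN test_like child
  ON child.full_id = parent.full_id                   -- the row itself
  OR child.full_id LIKE concat(parent.full_id, '.%')  -- all descendants
WHERE parent.id = 1;                                  -- placeholder id

Because the pattern is built per row, Impala still plans this as a nested loop join; the concat form only makes the multi-level intent explicit.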
Labels: Apache Impala, Apache Kudu
01-25-2019
06:45 PM
It's a small dict table with fewer than 6k records.
The execution time of the SQL doesn't change much when the LIMIT amount varies, even with no LIMIT at all.
The execution time also doesn't change much across repeated executions.
I copied the table into a new one and ran the same SQL against the copy; there the execution time and scan time were very low, about 5 ms (a sketch of that test is below).
By the way, on your colleague's advice I changed some Kudu config to work around a bug: Kudu compaction did not run. Is that the reason?
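A minimal sketch of the copy test mentioned above, assuming an Impala CTAS into a new Kudu table; the copy's name and the primary key column are assumptions:

-- Hypothetical reproduction of the copy test (table name and key column assumed).
CREATE TABLE project_spot_copy
PRIMARY KEY (id)        -- assumed primary key column
STORED AS KUDU
AS SELECT * FROM project_spot;

-- The same query against the copy ran in about 5 ms.
SELECT user_id, team_id, name, pile_no_prefix, version
FROM project_spot_copy
ORDER BY version
LIMIT 7;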
01-25-2019
12:50 AM
Sorry for the late reply, and thanks a lot for your attention. Here is the SQL:

SELECT user_id, team_id, name, pile_no_prefix, version FROM project_spot ORDER BY version LIMIT 7

Here is the trace in CM: (screenshot)
Here is the picture from the Kudu web UI scan page: (screenshot)
Here is the trace from the Kudu rpcz page:
{
"header": {
"call_id": 23,
"remote_method": {
"service_name": "kudu.tserver.TabletServerService",
"method_name": "Scan"
},
"timeout_millis": 9999
},
"trace": "0125 16:44:14.197738 (+ 0us) service_pool.cc:163] Inserting onto call queue\n0125 16:44:14.197757 (+ 19us) service_pool.cc:222] Handling call\n0125 16:44:14.197852 (+ 95us) tablet_service.cc:1745] Creating iterator\n0125 16:44:14.198093 (+ 241us) tablet_service.cc:1789] Iterator init: OK\n0125 16:44:14.198096 (+ 3us) tablet_service.cc:1838] has_more: true\n0125 16:44:14.198097 (+ 1us) tablet_service.cc:1853] Continuing scan request\n0125 16:44:14.198101 (+ 4us) tablet_service.cc:1901] Found scanner a7c9ec89e70c4a65aed5da01e9c846cb\n0125 16:44:14.708182 (+510081us) tablet_service.cc:1960] Deadline expired - responding early\n0125 16:44:14.708296 (+ 114us) inbound_call.cc:157] Queueing success response\n",
"duration_ms": 510,
"metrics": [
{
"key": "delta_iterators_relevant",
"value": 1
},
{
"key": "cfile_cache_hit",
"value": 14
},
{
"key": "cfile_cache_hit_bytes",
"value": 289477
},
{
"key": "threads_started",
"value": 1
},
{
"key": "thread_start_us",
"value": 29
},
{
"key": "compiler_manager_pool.queue_time_us",
"value": 62
},
{
"key": "compiler_manager_pool.run_cpu_time_us",
"value": 110612
},
{
"key": "compiler_manager_pool.run_wall_time_us",
"value": 110623
}
]
}
01-22-2019
08:36 PM
Kudu 1.7.0 in CDH 5.15
3 master nodes, 4c/32g, Ubuntu 16.04
3 data nodes, 8c/64g, 1.8T SSD, Ubuntu 16.04

It is a small table: project_spot, 5.93K records, 38 columns. I found some slow SQL on it (more than 400 ms), with most of the time spent in MaterializeTupleTime. So I checked the scan page on the Kudu tablet server web UI and found this: (screenshot) It is a very small scan, but it took 423 ms. Why?

More information: I checked the IO performance on all data nodes using fio and found no problem:

read : io=6324.4MB, bw=647551KB/s, iops=161887, runt= 10001msec

There are 3 data nodes, and only data02 has the problem. I rebooted the tablet server on data02, but that did not help.
Labels: Apache Kudu
12-29-2018
11:48 PM
Sorry, I made a mistake. The flush_threshold_secs flag applies to both the master and the tablet server. I had not set it on the tablet server; after setting it there, it's OK now.
12-28-2018
08:17 PM
Problem not solved. After a few hours I checked the rowsets of that table, and there are many small rowsets again! Running this command, I see 137 rowsets:

kudu fs list -fs_data_dirs=/data/1/kudu/data -fs_wal_dir=/data/1/kudu/wal -tablet_id=b110f90092b647e1bd5cd3de05b40aff

Here is my config in CM (screenshot); I restarted the Kudu services after changing it. Why did this happen? What can I do?
12-27-2018
06:34 PM
Thanks for the reply, but that is a sad answer 😞 I read the doc; indeed I am triggering the 'trickling inserts' case. Small CRUD operations happen on the table all the time, with low pressure. According to the doc's suggestion, I should take the actions below (see the sketch after this list):
1. Set --flush_threshold_secs long enough, e.g. one day.
2. Create a new table and copy all the data into it.
3. Drop the old table.
4. Rename the new table to the old table's name.
Is that right? Also, I see CDH 6.1 was just released; when will 6.2 be released?
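A minimal Impala sketch of steps 2-4 above; the table name and primary key column are assumptions carried over from the earlier posts:

-- Step 2: copy all data into a fresh Kudu table (assumed primary key: id)
CREATE TABLE project_construction_record_new
PRIMARY KEY (id)
STORED AS KUDU
AS SELECT * FROM project_construction_record;

-- Step 3: drop the old table
DROP TABLE project_construction_record;

-- Step 4: rename the new table to the old name
ALTER TABLE project_construction_record_new RENAME TO project_construction_record;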
Tags: kudu
12-26-2018
11:52 PM
I found this for that tablet in the tablet server metrics:
{
"name": "compact_rs_duration",
"total_count": 0,
"min": 0,
"mean": 0,
"percentile_75": 0,
"percentile_95": 0,
"percentile_99": 0,
"percentile_99_9": 0,
"percentile_99_99": 0,
"max": 0,
"total_sum": 0
},

Compaction has never run!
12-26-2018
11:48 PM
Kudu 1.7.0 in CDH 5.15
3 master nodes, 4c/32g, Ubuntu 16.04
3 data nodes, 8c/64g, 1.8T SSD, Ubuntu 16.04

Here is a table: project_construction_record, 62 columns, 170k records, no partition. The table has many CRUD operations every day. I ran a simple SQL on it (using Impala):

SELECT * FROM project_construction_record ORDER BY id LIMIT 1

It takes 7 seconds! By checking the profile, I found this:

KUDU_SCAN_NODE (id=0) (6.06s)
  BytesRead: 0 byte
  CollectionItemsRead: 0
  InactiveTotalTime: 0 ns
  KuduRemoteScanTokens: 0
  NumScannerThreadsStarted: 1
  PeakMemoryUsage: 3.4 MB
  RowsRead: 177,007
  RowsReturned: 177,007
  RowsReturnedRate: 29188/s
  ScanRangesComplete: 1
  ScannerThreadsInvoluntaryContextSwitches: 0
  ScannerThreadsTotalWallClockTime: 6.09s
    MaterializeTupleTime: 6.06s
    ScannerThreadsSysTime: 48ms
    ScannerThreadsUserTime: 172ms

So I checked the scan of this SQL and found this:

column           cells read  bytes read  blocks read
id               176.92k     1.91M       19.96k
org_id           176.92k     1.91M       19.96k
work_date        176.92k     2.03M       19.96k
description      176.92k     1.21M       19.96k
user_name        176.92k     775.9K      19.96k
spot_name        176.92k     825.8K      19.96k
spot_start_pile  176.92k     778.7K      19.96k
spot_end_pile    176.92k     780.4K      19.96k
......           ......      ......      ......

There are so many blocks read. Then I ran the kudu fs list command and got a 70 MB report; here is the bottom of it:

0b6ac30b449043a68905e02b797144fc | 25024 | 40310988 | column
0b6ac30b449043a68905e02b797144fc | 25024 | 40310989 | column
0b6ac30b449043a68905e02b797144fc | 25024 | 40310990 | column
0b6ac30b449043a68905e02b797144fc | 25024 | 40310991 | column
0b6ac30b449043a68905e02b797144fc | 25024 | 40310992 | column
0b6ac30b449043a68905e02b797144fc | 25024 | 40310993 | column
0b6ac30b449043a68905e02b797144fc | 25024 | 40310996 | undo
0b6ac30b449043a68905e02b797144fc | 25024 | 40310994 | bloom
0b6ac30b449043a68905e02b797144fc | 25024 | 40310995 | adhoc-index

There are 25024 rowsets and more than 1M blocks in the tablet. I left the maintenance and compaction flags at their defaults, only changing tablet_history_max_age_sec to one day:

--maintenance_manager_history_size=8
--maintenance_manager_num_threads=1
--maintenance_manager_polling_interval_ms=250
--budgeted_compaction_target_rowset_size=33554432
--compaction_approximation_ratio=1.0499999523162842
--compaction_minimum_improvement=0.0099999997764825821
--deltafile_default_block_size=32768
--deltafile_default_compression_codec=lz4
--default_composite_key_index_block_size_bytes=4096
--tablet_delta_store_major_compact_min_ratio=0.10000000149011612
--tablet_delta_store_minor_compact_max=1000
--mrs_use_codegen=true
--compaction_policy_dump_svgs_pattern=
--enable_undo_delta_block_gc=true
--fault_crash_before_flush_tablet_meta_after_compaction=0
--fault_crash_before_flush_tablet_meta_after_flush_mrs=0
--max_cell_size_bytes=65536
--max_encoded_key_size_bytes=16384
--tablet_bloom_block_size=4096
--tablet_bloom_target_fp_rate=9.9999997473787516e-05
--tablet_compaction_budget_mb=128
--tablet_history_max_age_sec=86400

It is a production environment, and many other tables have the same issue; performance is getting slower and slower. So my questions are: why does compaction not run? Is it a bug? And can I do a compaction manually?
Labels: Apache Kudu