Member since 06-22-2018
3 Posts · 0 Kudos Received · 1 Solution

My Accepted Solutions

| Title | Views | Posted |
|---|---|---|
|  | 7212 | 07-11-2018 09:32 AM |
Posted 07-11-2018 09:32 AM
Hi, well... my fault... The problem wasn't Kudu but the Azure ELB in front of the whole platform: traffic was blocked due to a DDoS policy, and everything else became unstable once traffic started to be blocked.
						
					
Posted 06-22-2018 07:19 AM
I recently tried to build a real-time PoC using Kafka, Spark2, and Kudu, and I'm facing a severe performance impact with Kudu. To sum up the case: some devices send short messages to a Kafka topic, which are collected by a Spark2 Streaming job (5-second batches) and persisted to Kudu through upsert operations. Volume is up to 600,000 messages per 5 seconds, at about 40 bytes per message. My Kudu cluster has 3 Masters and 6 Tablet Servers (hard memory limit = 20 GB per Tablet Server), each TS has 3 SSD disks, and the table is defined with a hash partition of 100 buckets. Kudu is deployed with Cloudera CDH Express Edition 5.14 in Azure for the PoC, and all the nodes are in the same vnet.

When I start the Spark2 job (execution time remains between 1 and 3 seconds), everything seems to go fine for the first 24 hours. Then a kind of bottleneck appears when upserting rows into Kudu, and the delay varies between 5 and more than 600 seconds. Analyzing the Kudu TS logs, I can see warning and error messages I can't interpret.

For instance:

W0622 08:21:50.793898 2051 negotiation.cc:313] Failed RPC negotiation. Trace:
0622 08:21:50.793636 (+ 0us) reactor.cc:579] Submitting negotiation task for server connection from 10.164.3.101:45662
0622 08:21:50.793817 (+ 181us) server_negotiation.cc:176] Beginning negotiation
0622 08:21:50.793819 (+ 2us) server_negotiation.cc:365] Waiting for connection header
0622 08:21:50.793862 (+ 43us) negotiation.cc:304] Negotiation complete: Network error: Server connection negotiation failed: server connection from xxx.xxx.xxx.xxx:45662: BlockingRecv error: Recv() got EOF from remote (error 108)
Metrics: {"server-negotiator.queue_time_us":151,"thread_start_us":86,"threads_started":1}

This warning appears several times on several TS, but I didn't notice any trouble with our network.

E0622 11:24:43.982810 28825 hybrid_clock.cc:395] Unable to read clock for last 0.774s: Service unavailable: Error reading clock. Clock considered unsynchronized

This error began after 5 days of using Kudu day and night.

W0622 11:24:47.979626 66209 connection.cc:657] server connection from xxx.xxx.xxx.xxx:50238 send error: Network error: failed to write to TLS socket: Connection reset by peer
W0622 11:24:47.979672 66209 connection.cc:422] Connection torn down before Call kudu.tserver.TabletServerService.Write from xxx.xxx.xxx.xxx:50238 (ReqId={client: afb1a862dcca4b3494e27a011fe31d63, seq_no=4637933, attempt_no=1}) could send its response
W0622 11:24:47.979729 66209 connection.cc:188] Error closing socket: Network error: TlsSocket::Close: Success
W0622 11:24:47.981634 66209 connection.cc:511] server connection from xxx.xxx.xxx.xxx:33178 recv error: Network error: failed to read from TLS socket: Connection reset by peer (error 104)
W0622 11:24:47.981720 66209 connection.cc:188] Error closing socket: Network error: TlsSocket::Close: Broken pipe
W0622 11:24:48.057577 66210 connection.cc:657] server connection from xxx.xxx.xxx.xxx:41744 send error: Network error: failed to write to TLS socket: Broken pipe
W0622 11:24:48.057612 66210 connection.cc:422] Connection torn down before Call kudu.tserver.TabletServerService.Write from xxx.xxx.xxx.xxx:41744 (ReqId={client: 47ee414afec74fcb86f86a19b62d8f7d, seq_no=4631450, attempt_no=1}) could send its response
W0622 11:24:48.057677 66210 connection.cc:188] Error closing socket: Network error: TlsSocket::Close: Success
W0622 11:24:48.057819 66210 connection.cc:422] Connection torn down before Call kudu.tserver.TabletServerService.Write from xxx.xxx.xxx.xxx:41744 (ReqId={client: 47ee414afec74fcb86f86a19b62d8f7d, seq_no=4631470, attempt_no=1}) could send its response

In this case, I notice this warning on several TS, but I can't identify any trouble with my network (an independent vnet in Azure).

W0621 14:34:19.201867 125277 env_posix.cc:862] Time spent sync call for /data/kudu/0/data/data/008809b1a384414b93b9d95b6275217a.metadata: real 2.234s user 0.000s sys 0.000s
W0621 14:34:32.088975 125277 env_posix.cc:862] Time spent sync call for /data/kudu/0/data/data/56153c373b084500b3b1fd9310c30c53.data: real 1.786s user 0.000s sys 0.000s
W0621 14:37:23.982235 125277 env_posix.cc:862] Time spent sync call for /data/kudu/0/data/data/df02bb85c15c4652ae653d68e7b8d0ae.metadata: real 7.751s user 0.000s sys 0.000s
W0621 14:37:45.769942 125275 kernel_stack_watchdog.cc:191] Thread 118579 stuck at /data/jenkins/workspace/generic-package-centos64-7-0-impala/topdir/BUILD/kudu-1.6.0-cdh5.14.0/src/kudu/consensus/log.cc:664 for 557ms:
Kernel stack:
[<ffffffffc03eecc5>] do_get_write_access+0x285/0x4c0 [jbd2]
[<ffffffffc03eef27>] jbd2_journal_get_write_access+0x27/0x40 [jbd2]
[<ffffffffc0442d4b>] __ext4_journal_get_write_access+0x3b/0x80 [ext4]
[<ffffffffc04144a0>] ext4_reserve_inode_write+0x70/0xa0 [ext4]
[<ffffffffc0414523>] ext4_mark_inode_dirty+0x53/0x210 [ext4]
[<ffffffffc0417af0>] ext4_dirty_inode+0x40/0x60 [ext4]
[<ffffffff8122ffba>] __mark_inode_dirty+0x16a/0x270
[<ffffffff8121ee01>] update_time+0x81/0xd0
[<ffffffff8121eef0>] file_update_time+0xa0/0xf0
[<ffffffff811865a8>] __generic_file_aio_write+0x198/0x400
[<ffffffff81186869>] generic_file_aio_write+0x59/0xa0
[<ffffffffc040aeeb>] ext4_file_write+0xdb/0x470 [ext4]
[<ffffffff812020e9>] do_sync_readv_writev+0x79/0xd0
[<ffffffff81203cce>] do_readv_writev+0xce/0x260
[<ffffffff81203ef5>] vfs_writev+0x35/0x60
[<ffffffff812042f2>] SyS_pwritev+0xc2/0xf0
[<ffffffff816b89fd>] system_call_fastpath+0x16/0x1b
[<ffffffffffffffff>] 0xffffffffffffffff
User stack:
(thread did not respond: maybe it is blocking signals)

In this case, I just can't understand what the message means.

I'm sure something is wrong with my global configuration (Kudu or network), but I can't understand what. Any help would be great! Thanks.
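For anyone reproducing this setup, here is a minimal sketch of the pipeline the post describes (Kafka → Spark2 Streaming with 5-second batches → Kudu upserts via the kudu-spark KuduContext). The broker and master addresses, topic, table name, and message schema are assumptions for illustration, not details from the original PoC:

```scala
import org.apache.kafka.common.serialization.StringDeserializer
import org.apache.kudu.spark.kudu.KuduContext
import org.apache.spark.sql.SparkSession
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka010.ConsumerStrategies.Subscribe
import org.apache.spark.streaming.kafka010.KafkaUtils
import org.apache.spark.streaming.kafka010.LocationStrategies.PreferConsistent

object KafkaToKuduUpsert {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("kafka-to-kudu-upsert").getOrCreate()
    import spark.implicits._

    // 5-second micro-batches, as described in the post.
    val ssc = new StreamingContext(spark.sparkContext, Seconds(5))

    val kafkaParams = Map[String, Object](
      "bootstrap.servers" -> "broker1:9092",                // assumed broker address
      "key.deserializer" -> classOf[StringDeserializer],
      "value.deserializer" -> classOf[StringDeserializer],
      "group.id" -> "kudu-upsert-poc",
      "auto.offset.reset" -> "latest"
    )

    // Assumed topic name; the devices publish short (~40 byte) messages here.
    val stream = KafkaUtils.createDirectStream[String, String](
      ssc, PreferConsistent, Subscribe[String, String](Seq("telemetry"), kafkaParams))

    // Assumed master addresses; the PoC runs 3 Kudu masters.
    val kuduContext =
      new KuduContext("master1:7051,master2:7051,master3:7051", spark.sparkContext)

    stream.foreachRDD { rdd =>
      // Assumed message format "device_id,ts,value"; parse each record into a row
      // and upsert the whole micro-batch into the hash-partitioned Kudu table.
      val batch = rdd.map(_.value().split(","))
        .map(f => (f(0), f(1).toLong, f(2).toDouble))
        .toDF("device_id", "ts", "value")
      kuduContext.upsertRows(batch, "impala::default.device_events")
    }

    ssc.start()
    ssc.awaitTermination()
  }
}
```

The write path mirrors the post's description: each 5-second micro-batch is handed to KuduContext.upsertRows in one call, so sustained ingest depends on how evenly the 100 hash buckets spread those upserts across the 6 tablet servers.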
						
					
- Labels:
  - Apache Kudu