Member since 06-22-2018
3 Posts · 0 Kudos Received · 1 Solution

My Accepted Solutions

| Title | Views | Posted |
|---|---|---|
|  | 7212 | 07-11-2018 09:32 AM |
Posted 07-11-2018 09:32 AM
Hi, well... my fault... The problem wasn't Kudu but the Azure ELB in front of the whole platform: traffic was blocked due to a DDoS policy, and everything else became unstable once traffic started to be blocked.
						
					
Posted 06-22-2018 07:19 AM
I recently tried to build a real-time PoC using Kafka, Spark2, and Kudu, and I'm facing a severe performance impact with Kudu. To sum up the case: some devices send short messages to a Kafka topic, which are collected by a Spark2 Streaming job (5-second batches) and persisted to Kudu through upsert operations. Volume is up to 600,000 messages per 5 seconds, at about 40 bytes per message. My Kudu cluster has 3 Masters and 6 Tablet Servers (hard memory limit = 20 GB per Tablet Server), each TS has 3 SSD disks, and the table is defined with a hash partition of 100 buckets. Kudu is deployed with Cloudera CDH Express Edition 5.14 in Azure for the PoC, and all the nodes are in the same vnet.

When I start the Spark2 job (execution time remains between 1 and 3 seconds), everything seems to go fine for the first 24 hours. Then a kind of bottleneck appears when upserting rows into Kudu, and the delay varies between 5 and more than 600 seconds. Analyzing the Kudu TS logs, I can see warning and error messages I can't interpret.

For instance:

W0622 08:21:50.793898 2051 negotiation.cc:313] Failed RPC negotiation. Trace:
0622 08:21:50.793636 (+ 0us) reactor.cc:579] Submitting negotiation task for server connection from 10.164.3.101:45662
0622 08:21:50.793817 (+ 181us) server_negotiation.cc:176] Beginning negotiation
0622 08:21:50.793819 (+ 2us) server_negotiation.cc:365] Waiting for connection header
0622 08:21:50.793862 (+ 43us) negotiation.cc:304] Negotiation complete: Network error: Server connection negotiation failed: server connection from xxx.xxx.xxx.xxx:45662: BlockingRecv error: Recv() got EOF from remote (error 108)
Metrics: {"server-negotiator.queue_time_us":151,"thread_start_us":86,"threads_started":1}

This warning appears several times on several TS, but I didn't notice any trouble with our network.

E0622 11:24:43.982810 28825 hybrid_clock.cc:395] Unable to read clock for last 0.774s: Service unavailable: Error reading clock. Clock considered unsynchronized

This error began after 5 days of using Kudu day and night.

W0622 11:24:47.979626 66209 connection.cc:657] server connection from xxx.xxx.xxx.xxx:50238 send error: Network error: failed to write to TLS socket: Connection reset by peer
W0622 11:24:47.979672 66209 connection.cc:422] Connection torn down before Call kudu.tserver.TabletServerService.Write from xxx.xxx.xxx.xxx:50238 (ReqId={client: afb1a862dcca4b3494e27a011fe31d63, seq_no=4637933, attempt_no=1}) could send its response
W0622 11:24:47.979729 66209 connection.cc:188] Error closing socket: Network error: TlsSocket::Close: Success
W0622 11:24:47.981634 66209 connection.cc:511] server connection from xxx.xxx.xxx.xxx:33178 recv error: Network error: failed to read from TLS socket: Connection reset by peer (error 104)
W0622 11:24:47.981720 66209 connection.cc:188] Error closing socket: Network error: TlsSocket::Close: Broken pipe
W0622 11:24:48.057577 66210 connection.cc:657] server connection from xxx.xxx.xxx.xxx:41744 send error: Network error: failed to write to TLS socket: Broken pipe
W0622 11:24:48.057612 66210 connection.cc:422] Connection torn down before Call kudu.tserver.TabletServerService.Write from xxx.xxx.xxx.xxx:41744 (ReqId={client: 47ee414afec74fcb86f86a19b62d8f7d, seq_no=4631450, attempt_no=1}) could send its response
W0622 11:24:48.057677 66210 connection.cc:188] Error closing socket: Network error: TlsSocket::Close: Success
W0622 11:24:48.057819 66210 connection.cc:422] Connection torn down before Call kudu.tserver.TabletServerService.Write from xxx.xxx.xxx.xxx:41744 (ReqId={client: 47ee414afec74fcb86f86a19b62d8f7d, seq_no=4631470, attempt_no=1}) could send its response

In this case, I notice this warning on several TS, but I can't identify any trouble with my network (an independent vnet in Azure).

W0621 14:34:19.201867 125277 env_posix.cc:862] Time spent sync call for /data/kudu/0/data/data/008809b1a384414b93b9d95b6275217a.metadata: real 2.234s user 0.000s sys 0.000s
W0621 14:34:32.088975 125277 env_posix.cc:862] Time spent sync call for /data/kudu/0/data/data/56153c373b084500b3b1fd9310c30c53.data: real 1.786s user 0.000s sys 0.000s
W0621 14:37:23.982235 125277 env_posix.cc:862] Time spent sync call for /data/kudu/0/data/data/df02bb85c15c4652ae653d68e7b8d0ae.metadata: real 7.751s user 0.000s sys 0.000s
W0621 14:37:45.769942 125275 kernel_stack_watchdog.cc:191] Thread 118579 stuck at /data/jenkins/workspace/generic-package-centos64-7-0-impala/topdir/BUILD/kudu-1.6.0-cdh5.14.0/src/kudu/consensus/log.cc:664 for 557ms:
Kernel stack:
[<ffffffffc03eecc5>] do_get_write_access+0x285/0x4c0 [jbd2]
[<ffffffffc03eef27>] jbd2_journal_get_write_access+0x27/0x40 [jbd2]
[<ffffffffc0442d4b>] __ext4_journal_get_write_access+0x3b/0x80 [ext4]
[<ffffffffc04144a0>] ext4_reserve_inode_write+0x70/0xa0 [ext4]
[<ffffffffc0414523>] ext4_mark_inode_dirty+0x53/0x210 [ext4]
[<ffffffffc0417af0>] ext4_dirty_inode+0x40/0x60 [ext4]
[<ffffffff8122ffba>] __mark_inode_dirty+0x16a/0x270
[<ffffffff8121ee01>] update_time+0x81/0xd0
[<ffffffff8121eef0>] file_update_time+0xa0/0xf0
[<ffffffff811865a8>] __generic_file_aio_write+0x198/0x400
[<ffffffff81186869>] generic_file_aio_write+0x59/0xa0
[<ffffffffc040aeeb>] ext4_file_write+0xdb/0x470 [ext4]
[<ffffffff812020e9>] do_sync_readv_writev+0x79/0xd0
[<ffffffff81203cce>] do_readv_writev+0xce/0x260
[<ffffffff81203ef5>] vfs_writev+0x35/0x60
[<ffffffff812042f2>] SyS_pwritev+0xc2/0xf0
[<ffffffff816b89fd>] system_call_fastpath+0x16/0x1b
[<ffffffffffffffff>] 0xffffffffffffffff
User stack:
(thread did not respond: maybe it is blocking signals)

In this case, I just can't understand what the message means.

I'm sure something is wrong with my global configuration (Kudu or network), but I can't understand what. Any help would be great! Thanks.
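For anyone reproducing this setup, here is a minimal sketch of the pipeline the post describes (Kafka → Spark2 Streaming with 5-second batches → Kudu upserts via the kudu-spark KuduContext). The broker and master addresses, topic, table name, and message schema are assumptions for illustration, not details from the original PoC:

```scala
import org.apache.kafka.common.serialization.StringDeserializer
import org.apache.kudu.spark.kudu.KuduContext
import org.apache.spark.sql.SparkSession
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka010.ConsumerStrategies.Subscribe
import org.apache.spark.streaming.kafka010.KafkaUtils
import org.apache.spark.streaming.kafka010.LocationStrategies.PreferConsistent

object KafkaToKuduUpsert {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("kafka-to-kudu-upsert").getOrCreate()
    import spark.implicits._

    // 5-second micro-batches, as described in the post.
    val ssc = new StreamingContext(spark.sparkContext, Seconds(5))

    val kafkaParams = Map[String, Object](
      "bootstrap.servers" -> "broker1:9092",                // assumed broker address
      "key.deserializer" -> classOf[StringDeserializer],
      "value.deserializer" -> classOf[StringDeserializer],
      "group.id" -> "kudu-upsert-poc",
      "auto.offset.reset" -> "latest"
    )

    // Assumed topic name; the devices publish short (~40 byte) messages here.
    val stream = KafkaUtils.createDirectStream[String, String](
      ssc, PreferConsistent, Subscribe[String, String](Seq("telemetry"), kafkaParams))

    // Assumed master addresses; the PoC runs 3 Kudu masters.
    val kuduContext =
      new KuduContext("master1:7051,master2:7051,master3:7051", spark.sparkContext)

    stream.foreachRDD { rdd =>
      // Assumed message format "device_id,ts,value"; parse each record into a row
      // and upsert the whole micro-batch into the hash-partitioned Kudu table.
      val batch = rdd.map(_.value().split(","))
        .map(f => (f(0), f(1).toLong, f(2).toDouble))
        .toDF("device_id", "ts", "value")
      kuduContext.upsertRows(batch, "impala::default.device_events")
    }

    ssc.start()
    ssc.awaitTermination()
  }
}
```

The write path mirrors the post's description: each 5-second micro-batch is handed to KuduContext.upsertRows in one call, so sustained ingest depends on how evenly the 100 hash buckets spread those upserts across the 6 tablet servers.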
						
					
- Labels:
  - Apache Kudu