Explorer
Posts: 11
Registered: ‎01-26-2016
Accepted Solution

total_network_send_timer and thrift_transmit_timer


Hi

For broadcast transmission, is it correct to say that total_network_send_timer - thrift_transmit_timer is the time spent on locking and waiting?

The following snippet was obtained from our Impala cluster.

The data stream sender broadcasts data to 38 Impalad nodes.

The TransmitDataRPCTime (the aggregated network transmission time across all sender threads) is only 15s677ms, so my understanding is that the rest of the time, i.e., 9m4s - 15s677ms, is spent on waiting, locking, and updating the network send time counter.

Fragment F23:
      Instance e040e6c84d494ed6:3255ca5dbe354ce4 (host=cdh-datanode-104.lufax.storage:22000):(Total: 9m12s, non-child: 8m52s, % non-child: 96.43%)
        Hdfs split stats (<volume id>:<# splits>/<split lengths>): 7:4/140.02 MB 4:4/340.69 MB 0:10/972.22 MB 1:7/776.15 MB 6:5/265.73 MB 9:9/964.64 MB 5:6/431.62 MB 8:3/328.13 MB 3:4/445.24 MB 2:9/742.39 MB 
         - AverageThreadTokens: 47.69 
         - BloomFilterBytes: 0
         - PeakMemoryUsage: 320.12 MB (335674080)
         - PerHostPeakMemUsage: 14.78 GB (15868058032)
         - PrepareTime: 176.046us
         - RowsProduced: 32.75M (32749654)
         - TotalCpuTime: 7h10m
         - TotalNetworkReceiveTime: 0.000ns
         - TotalNetworkSendTime: 9m4s
         - TotalStorageWaitTime: 1s531ms
        DataStreamSender (dst_id=55):(Total: 19s452ms, non-child: 19s452ms, % non-child: 100.00%)
           - BytesSent: 18.81 GB (20198277370)
           - NetworkThroughput(*): 1.20 GB/sec
           - OverallThroughput: 990.22 MB/sec
           - PeakMemoryUsage: 202.47 KB (207328)
           - RowsReturned: 32.75M (32749654)
           - SerializeBatchTime: 3s768ms
           - TransmitDataRPCTime: 15s677ms
           - UncompressedRowBatchSize: 32.46 GB (34850497072)
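
To make the subtraction concrete, here is a small Python sketch that parses the duration strings as they appear in the profile and computes TotalNetworkSendTime minus TransmitDataRPCTime. It is not part of Impala; `parse_duration` and `format_duration` are hypothetical helper names, and the unit suffixes are assumed from the values shown in this profile only.

```python
import re

# Seconds per unit suffix, assumed from the values seen in this profile
# (e.g. "7h10m", "9m4s", "15s677ms", "176.046us", "0.000ns").
_UNIT_SECONDS = {"h": 3600.0, "m": 60.0, "s": 1.0, "ms": 1e-3, "us": 1e-6, "ns": 1e-9}

# "ms" is listed before "m" so that "677ms" is not read as minutes.
_UNITS = r"(?:ms|us|ns|h|m|s)"
_TOKEN = re.compile(r"(\d+(?:\.\d+)?)(" + _UNITS[3:-1] + r")")
_WHOLE = re.compile(r"(?:\d+(?:\.\d+)?" + _UNITS + r")+")


def parse_duration(text):
    """Convert a profile duration such as '15s677ms' into seconds."""
    if not _WHOLE.fullmatch(text):
        raise ValueError("unrecognized duration: %r" % text)
    return sum(float(num) * _UNIT_SECONDS[unit] for num, unit in _TOKEN.findall(text))


def format_duration(seconds):
    """Render seconds as '<m>m<s>s<ms>ms', rounded to the nearest millisecond."""
    total_ms = round(seconds * 1000)
    minutes, rem_ms = divmod(total_ms, 60_000)
    secs, ms = divmod(rem_ms, 1000)
    return "%dm%ds%dms" % (minutes, secs, ms)


total_send = parse_duration("9m4s")        # TotalNetworkSendTime
rpc_time = parse_duration("15s677ms")      # TransmitDataRPCTime
remainder = total_send - rpc_time          # time not spent inside the RPC itself

print("remainder:", format_duration(remainder))
print("share of send time: %.1f%%" % (100.0 * remainder / total_send))
```

With the counters above, the remainder comes to 8m48s323ms, i.e. almost all of the 9m4s is spent outside the RPC calls themselves.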

Cloudera Employee
Posts: 397
Registered: ‎07-29-2015

Re: total_network_send_timer and thrift_transmit_timer

I'm not the most knowledgeable person about this part of the code, but what you're saying is correct. One likely cause of long wait times is a receiver that consumes data more slowly than the sender sends it.
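
As a rough illustration of that effect, the following Python sketch models a sender and a receiver connected by a bounded buffer and adds up how long the sender sits blocked. This is a toy model under stated assumptions, not how Impala's data stream code is implemented: `sender_wait_time`, its parameters, and the fixed per-batch costs are all hypothetical.

```python
def sender_wait_time(num_batches, send_cost, recv_cost, buffer_slots):
    """Total time the sender is blocked waiting for a free buffer slot.

    Toy model (assumption, not Impala's design):
      - the sender needs `send_cost` time units to prepare each batch,
      - the receiver needs `recv_cost` time units to consume each batch,
      - a batch occupies one of `buffer_slots` slots until fully consumed.
    """
    consumed_at = []      # time at which each batch finishes being consumed
    sender_free_at = 0    # time at which the sender handed off its last batch
    waited = 0

    for i in range(num_batches):
        ready = sender_free_at + send_cost
        # Batch i needs the slot released by batch i - buffer_slots.
        slot_free = consumed_at[i - buffer_slots] if i >= buffer_slots else 0
        enqueued = max(ready, slot_free)
        waited += enqueued - ready

        # The receiver handles batches one at a time, in order.
        previous_done = consumed_at[-1] if consumed_at else 0
        consumed_at.append(max(enqueued, previous_done) + recv_cost)
        sender_free_at = enqueued

    return waited


# Receiver faster than sender: the sender never blocks.
print(sender_wait_time(num_batches=5, send_cost=3, recv_cost=1, buffer_slots=2))
# Receiver slower than sender: the sender accumulates wait time.
print(sender_wait_time(num_batches=5, send_cost=1, recv_cost=3, buffer_slots=2))
```

In this model the wait time grows with the number of batches once the buffer fills, which matches the pattern in the profile where the send time is far larger than the time spent in the RPC itself.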
