total_network_send_timer and thrift_transmit_timer

Hi

For broadcast transmission, is it correct to say that total_network_send_timer - thrift_transmit_timer is the time spent on locking and waiting?

The following snippet was obtained from our Impala cluster.

The data stream sender broadcasts data to 38 Impalad nodes.

Since TransmitDataRPCTime (the aggregated network transmission time for all sender threads) is only 15s677ms, my understanding is that the rest of the time, i.e., 9m4s - 15s677ms, is spent on waiting, locking, and updating the network send time counter.

Fragment F23:
      Instance e040e6c84d494ed6:3255ca5dbe354ce4 (host=cdh-datanode-104.lufax.storage:22000):(Total: 9m12s, non-child: 8m52s, % non-child: 96.43%)
        Hdfs split stats (<volume id>:<# splits>/<split lengths>): 7:4/140.02 MB 4:4/340.69 MB 0:10/972.22 MB 1:7/776.15 MB 6:5/265.73 MB 9:9/964.64 MB 5:6/431.62 MB 8:3/328.13 MB 3:4/445.24 MB 2:9/742.39 MB 
         - AverageThreadTokens: 47.69 
         - BloomFilterBytes: 0
         - PeakMemoryUsage: 320.12 MB (335674080)
         - PerHostPeakMemUsage: 14.78 GB (15868058032)
         - PrepareTime: 176.046us
         - RowsProduced: 32.75M (32749654)
         - TotalCpuTime: 7h10m
         - TotalNetworkReceiveTime: 0.000ns
         - TotalNetworkSendTime: 9m4s
         - TotalStorageWaitTime: 1s531ms
        DataStreamSender (dst_id=55):(Total: 19s452ms, non-child: 19s452ms, % non-child: 100.00%)
           - BytesSent: 18.81 GB (20198277370)
           - NetworkThroughput(*): 1.20 GB/sec
           - OverallThroughput: 990.22 MB/sec
           - PeakMemoryUsage: 202.47 KB (207328)
           - RowsReturned: 32.75M (32749654)
           - SerializeBatchTime: 3s768ms
           - TransmitDataRPCTime: 15s677ms
           - UncompressedRowBatchSize: 32.46 GB (34850497072)
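
To make the subtraction in the question concrete, here is a minimal Python sketch that parses the two durations from the profile above and computes the remainder. The helper `parse_duration` and its unit table are hypothetical and written only for this illustration; they are not part of Impala. The profile prints rounded values (TotalNetworkSendTime is shown only to the second), so the result is an approximation, and reading the remainder as wait/lock time is the interpretation discussed in this thread, not something the profile reports directly.

```python
import re

# Seconds per unit for the duration strings printed in the profile
# (e.g. "9m4s", "15s677ms", "176.046us"). Hypothetical helper table.
_UNIT_SECONDS = {"h": 3600.0, "m": 60.0, "s": 1.0,
                 "ms": 1e-3, "us": 1e-6, "ns": 1e-9}

# "ms" must be tried before "m" so that "677ms" is not read as minutes.
_TOKEN = re.compile(r"(\d+(?:\.\d+)?)(h|ms|us|ns|m|s)")


def parse_duration(text):
    """Convert a profile duration string into seconds (float)."""
    parts = _TOKEN.findall(text)
    # Reject input that is not made up entirely of number+unit tokens.
    if not parts or "".join(num + unit for num, unit in parts) != text:
        raise ValueError(f"unrecognized duration: {text!r}")
    return sum(float(num) * _UNIT_SECONDS[unit] for num, unit in parts)


# Values copied from the profile above.
total_network_send = parse_duration("9m4s")      # TotalNetworkSendTime
transmit_data_rpc = parse_duration("15s677ms")   # TransmitDataRPCTime

# Time in the send path that is not accounted for by the RPC itself.
remainder = total_network_send - transmit_data_rpc
share = remainder / total_network_send

print(f"remainder: {remainder:.3f} s ({share:.1%} of TotalNetworkSendTime)")
```

With these inputs the remainder comes to roughly 528 seconds, so almost all of TotalNetworkSendTime falls outside the RPC time itself.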

1 ACCEPTED SOLUTION

I'm not the most knowledgeable person about this part of the code, but what you're saying is correct. One likely cause of long wait times is a receiver that consumes data more slowly than the sender sends it.
