Explorer
Posts: 16
Registered: ‎11-25-2015

slow reported throughput in DataStreamSender

We have one specific query, "insert overwrite select * from a partitions", that usually takes < 2 min.

Occasionally, it takes over 5 min for an unknown reason.

Looking at the query profile, we see the following.

Throughput of less than 600 KiB/s is unexpected since we have a 10GB network...

So what could explain this? 

Is it related to ThriftTransmitTime?

Thanks

 

DataStreamSender (dst_id=1) (5.0m)

  • AsyncTotalTime: 0ns
  • BytesSent: 169.9 MiB
  • InactiveTotalTime: 0ns
  • NetworkThroughput(*): 595.0 KiB/s
  • OverallThroughput: 585.7 KiB/s
  • PeakMemoryUsage: 72.0 KiB
  • SerializeBatchTime: 2.77s
  • ThriftTransmitTime(*): 4.9m
  • TotalTime: 5.0m
  • UncompressedRowBatchSize: 547.1 MiB
Cloudera Employee
Posts: 433
Registered: ‎07-29-2015

Re: slow reported throughput in DataStreamSender

Impala executes queries in a pipelined manner, which means that if an operator further up the tree is slow, it will create back-pressure and slow down all other operators in the pipeline. ThriftTransmitTime includes time spent waiting for upstream operators to process their queued input.


So it's possible there's some network issue (we would expect to get much higher throughput than that if the network is healthy), but it's probably the upstream insert that is slow.
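To see why the reported throughput says little about the network itself, here is a rough sketch in Python that recomputes the two throughput figures from the counters in the profile above. It assumes (this is an assumption, not something confirmed in this thread) that NetworkThroughput is roughly BytesSent divided by ThriftTransmitTime and OverallThroughput is roughly BytesSent divided by TotalTime. The function name is made up for illustration, and the timer values are the rounded ones shown in the profile, so the results only approximately match.

```python
# Rough sanity check of the DataStreamSender throughput counters.
# Assumption: throughput ~= bytes sent / elapsed time for the given timer.

KIB_PER_MIB = 1024


def throughput_kib_per_s(mib_sent: float, seconds: float) -> float:
    """Average throughput in KiB/s for mib_sent MiB moved in `seconds`."""
    return mib_sent * KIB_PER_MIB / seconds


bytes_sent_mib = 169.9         # BytesSent
thrift_transmit_s = 4.9 * 60   # ThriftTransmitTime(*), rounded in the profile
total_s = 5.0 * 60             # TotalTime, rounded in the profile

network = throughput_kib_per_s(bytes_sent_mib, thrift_transmit_s)
overall = throughput_kib_per_s(bytes_sent_mib, total_s)

print(f"NetworkThroughput estimate: {network:.1f} KiB/s")
print(f"OverallThroughput estimate: {overall:.1f} KiB/s")
```

Both estimates land within a few percent of the reported 595.0 KiB/s and 585.7 KiB/s. If the assumption holds, the low number simply reflects 169.9 MiB spread over nearly five minutes of ThriftTransmitTime, most of which may be time spent blocked on the slow receiver rather than time on the wire.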

Explorer
Posts: 16
Registered: ‎11-25-2015

Re: slow reported throughput in DataStreamSender

Thanks for the quick help.

One last question. This snippet is from the same query as my initial question, and I am wondering if this is the cause. EncodeTimer of 4 min... Does this represent the time taken to encode to Parquet?

And could that be where the slowdown is?

 

HdfsTableSink (4.4m)

  • AsyncTotalTime: 0ns
  • BytesWritten: 875.0 MiB
  • CompressTimer: 9.18s
  • EncodeTimer: 4.1m
  • FilesCreated: 4
  • FinalizePartitionFileTimer: 4.02s
  • HdfsWriteTimer: 3.93s
  • InactiveTotalTime: 0ns
  • PartitionsCreated: 1
  • PeakMemoryUsage: 319.9 MiB
  • RowsInserted: 3,720,826
  • TmpFileCreateTimer: 55ms
  • TotalTime: 4.4m
Cloudera Employee
Posts: 433
Registered: ‎07-29-2015

Re: slow reported throughput in DataStreamSender

That's exactly right. It looks like encoding the Parquet file is taking all the time.
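As a quick illustration, the timers from the HdfsTableSink profile above can be ranked against TotalTime with a few lines of Python. This is only a sketch: the values are copied from the profile (minutes converted to seconds), the variable names are made up, and it assumes the timers do not overlap much, which may not be strictly true.

```python
# Rank the HdfsTableSink timers by their share of TotalTime.
# Values are the rounded figures from the profile, converted to seconds.

timers_s = {
    "EncodeTimer": 4.1 * 60,
    "CompressTimer": 9.18,
    "FinalizePartitionFileTimer": 4.02,
    "HdfsWriteTimer": 3.93,
    "TmpFileCreateTimer": 0.055,
}
total_s = 4.4 * 60          # TotalTime
bytes_written_mib = 875.0   # BytesWritten

# Print timers from largest to smallest with their fraction of TotalTime.
for name, seconds in sorted(timers_s.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:<28} {seconds:8.2f}s {seconds / total_s:7.1%}")

encode_share = timers_s["EncodeTimer"] / total_s
encode_rate_mib_s = bytes_written_mib / timers_s["EncodeTimer"]
print(f"Output written per second of encode time: {encode_rate_mib_s:.2f} MiB/s")
```

With these numbers EncodeTimer is roughly 93% of TotalTime, while compression, HDFS writes, and finalization together add up to well under 20 seconds. That works out to only about 3.6 MiB of output per second of encoding.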

Explorer
Posts: 16
Registered: ‎11-25-2015

Re: slow reported throughput in DataStreamSender

Thanks for the help.

4 min for Parquet encoding (800 MB) seems high. Luckily it's infrequent.

We are on Impala 2.2. Is this something that got improved in more recent versions?