We have one specific query, "insert overwrite select * from a partitions", that usually takes < 2 min.
Occasionally, it takes over 5 min for no apparent reason.
Looking at the query profile, we see the following.
Throughput of less than 600 KB/sec is unexpected since we have a 10GB network.
So what could explain this?
Is it related to ThriftTransmitTime?
DataStreamSender (dst_id=1) (5.0m)
Impala executes queries in a pipelined manner, which means that if an operator further up the tree is slow, it will create back-pressure and slow down all the other operators in the pipeline. ThriftTransmitTime includes time spent waiting for upstream operators to process their queued input.
So it's possible there's some network issue (we would expect much higher throughput than that on a healthy network), but it's probably the upstream insert that is slow.
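To illustrate the back-pressure effect described above, here is a minimal sketch of a sender feeding a bounded queue that is drained by a single consumer. It is a toy model with made-up costs, not Impala's actual implementation: `simulate_pipeline`, its parameters, and the numbers are all hypothetical. The point is only that the time a sender spends blocked grows when the consumer is slow, even though the "network" in the model is instantaneous.

```python
def simulate_pipeline(num_batches, send_cost, consume_cost, queue_capacity):
    """Toy deterministic model of a sender and a slow or fast consumer.

    Assumptions (hypothetical, not Impala internals):
      - the sender needs `send_cost` time units to prepare each batch;
      - the consumer needs `consume_cost` time units to process each batch;
      - at most `queue_capacity` batches may be outstanding (sent but not
        yet fully processed); otherwise the sender blocks.

    Returns (total_time, sender_blocked_time).
    """
    consumer_done = []   # time at which the consumer finishes each batch
    sender_clock = 0.0   # current time on the sender side
    blocked = 0.0        # accumulated time the sender spent waiting

    for i in range(num_batches):
        sender_clock += send_cost  # prepare batch i

        # Batch i needs the slot held by batch i - queue_capacity.
        if i >= queue_capacity:
            slot_free_at = consumer_done[i - queue_capacity]
            if slot_free_at > sender_clock:
                blocked += slot_free_at - sender_clock
                sender_clock = slot_free_at

        # The consumer starts batch i once it has arrived and the
        # previous batch has been finished.
        previous_done = consumer_done[-1] if consumer_done else 0.0
        start = max(sender_clock, previous_done)
        consumer_done.append(start + consume_cost)

    return consumer_done[-1], blocked


if __name__ == "__main__":
    for label, consume_cost in (("fast consumer", 0.5), ("slow consumer", 3.0)):
        total, blocked = simulate_pipeline(100, 1.0, consume_cost, 4)
        print(f"{label}: total={total}, sender blocked={blocked}")
```

In this model the sender's own work is identical in both runs; only the consumer's speed changes, yet the sender's wall-clock time is dominated by waiting in the slow case. That is the sense in which a long transmit-side timer can point at the receiving operator rather than at the network.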
Thanks for the quick help.
One last question. This snippet is from the same query as my initial question, and I am wondering if this is the cause. EncodeTimer of 4 min ... Does this represent the time taken to encode to Parquet?
And could it be where the slowdown is?
Thanks for the help.
4 min for Parquet encoding (800 MB) seems high. Luckily it's infrequent.
We are on Impala 2.2. Is this something that was improved in more recent versions?
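As a rough sanity check on the figures mentioned above, the implied encoding rate can be worked out directly. This sketch assumes the 800 MB is the amount of data encoded during the 4 min EncodeTimer, which the thread does not confirm; the helper name `rate_mb_per_sec` is made up for illustration.

```python
def rate_mb_per_sec(size_mb: float, minutes: float) -> float:
    """Average rate in MB/s for `size_mb` megabytes handled in `minutes`."""
    return size_mb / (minutes * 60.0)


if __name__ == "__main__":
    # Figures taken from the posts: 800 MB, EncodeTimer of about 4 min.
    slow_rate = rate_mb_per_sec(800, 4)
    print(f"implied encode rate: {slow_rate:.2f} MB/s")
```

Under that assumption the rate comes out at roughly 3.3 MB/s.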