For a UDA (user-defined aggregate function), I understand that the Impala execution units need to update() data within their own respective threads after calling init() for a locally persisting variable. I also understand that the accumulated data are merged between threads and/or nodes before being serialized and finalized on their way to the client. In more complicated cases, multi-variable structures seem to be housed within StringVal to ensure Impala knows about them.
It would be great to be able to use an object that can grows itself (map, set, etc.) within an aggregate function's execution thread, particularly within the update() and merge(). However, if the serialization component requires translating the contents to a string before the merge, it may or may not be worth the trouble.
Your intuition is right for the common case where query execution is entirely in memory - it happens after Merge() before the data needs to be sent over the network. However, if the data is being spilled to disk, the sequence of calls changes - Serialize() may be called before merge in order to write the intermediate value to disk to free up memory.
We don't have a flow chart or similar for this, but I think it's a good idea. I filed https://issues.apache.org/jira/browse/IMPALA-8405