Hive is picking up blocks from these 4 DNs. Files on 1 DN are combined into 1 task. If a maxSplitSize is specified, then blocks on the same node are combined to form a single split. Blocks that are left over are then combined with other blocks in the same rack. If maxSplitSize is not specified, then blocks from the same rack are combined in a single split; no attempt is made to create node-local splits. If the maxSplitSize is equal to the block size, then this class is similar to the default splitting behavior in Hadoop: each block is a locally processed split.
The reason it has picked the first block location for each blocks while combining is any Hadoop Client will use the first block location and will consider the next only if reading the first fails. Usually NameNode will return the block locations of a block sorted based upon the distance between the client and location. NameNode will give all block locations but CombineHiveInputFormat / Hadoop Client / MapReduce Program uses the first block location.