We are using CDH4.5.8 Impala 2.2. We are confused by some spill to disk activities and would like to understand why this is happening.
Here's a snippet of the profile, we had 128 GB memory available for Impala (YARN and Admission Controller are disabled) and stats are available for all tables.
Scratch directories are set to one per disk (23 HDD in total). No mem_limit is set with query option either.
The 4-node cluster has 3 Impalad running on 3 of the node and data nodes are co-located with the impalad.
BlockMgr:
- BlockWritesOutstanding: 0 (0)
- BlocksCreated: 279 (279)
- BlocksRecycled: 113 (113)
- BufferedPins: 2 (2)
- BytesWritten: 140.00 MB (146798191)
- MaxBlockSize: 8.00 MB (8388608)
- MemoryLimit: 128.93 GB (138436362240)
- PeakMemoryUsage: 654.59 MB (686391904)
- TotalBufferWaitTime: 0ns
- TotalEncryptionTime: 0ns
- TotalIntegrityCheckTime: 0ns
- TotalReadBlockTime: 122.868ms
We are just wondering why Impala spill data to disk when PeakMemoryUsage is only 654.59???
We are looking for any potential explanations for this scenario.
Thanks
link to profile:
passwod: 123456