Support Questions

Find answers, ask questions, and share your expertise

Unexpected Spill to Disk Activity

avatar
Explorer

Hi 

We are using CDH4.5.8 Impala 2.2. We are confused by some spill to disk activities and would like to understand why this is happening.
 
Here's a snippet of the profile, we had 128 GB memory available for Impala (YARN and Admission Controller are disabled) and stats are available for all tables.
Scratch directories are set to one per disk (23 HDD in total). No mem_limit is set with query option either.
The 4-node cluster has 3 Impalad running on 3 of the node and data nodes are co-located with the impalad.
 
BlockMgr:
         - BlockWritesOutstanding: 0 (0)
         - BlocksCreated: 279 (279)
         - BlocksRecycled: 113 (113)
         - BufferedPins: 2 (2)
         - BytesWritten: 140.00 MB (146798191)
         - MaxBlockSize: 8.00 MB (8388608)
         - MemoryLimit: 128.93 GB (138436362240)
         - PeakMemoryUsage: 654.59 MB (686391904)
         - TotalBufferWaitTime: 0ns
         - TotalEncryptionTime: 0ns
         - TotalIntegrityCheckTime: 0ns
         - TotalReadBlockTime: 122.868ms
 
We are just wondering why Impala spill data to disk when PeakMemoryUsage is only 654.59???
We are looking for any potential explanations for this scenario.
Thanks
 
link to profile:
passwod: 123456
 
1 ACCEPTED SOLUTION

avatar

You are most likely running into this bug with the aggregation: https://issues.cloudera.org/browse/IMPALA-2352

 

We fixed it in CDH5.5/Impala 2.3 but the change wasn't backported because it was deemed too risky for a maintenance release.

 

View solution in original post

1 REPLY 1

avatar

You are most likely running into this bug with the aggregation: https://issues.cloudera.org/browse/IMPALA-2352

 

We fixed it in CDH5.5/Impala 2.3 but the change wasn't backported because it was deemed too risky for a maintenance release.