Not Able to Trigger "Spill to Disk" On Impala, Documentation Extremely Unclear

New Contributor

I am trying to trigger spilling to disk on Impala (Impala Shell v3.2.0-cdh6.3.4).
The Cloudera documentation describes the steps here: https://docs.cloudera.com/documentation/enterprise/6/6.3/topics/impala_scalability.html under the section "Testing performance implications of spilling to disk".

 

I am extremely confused by two seemingly contradictory statements there: "Set the MEM_LIMIT query option to a value that is smaller than the peak memory usage reported in the profile output. Do not specify a memory limit lower than reported in the profile output."
So what should the value of MEM_LIMIT be?
When I set it lower than the peak memory usage, I get the following error: "Rejected query from pool root.centos: minimum memory reservation is greater than memory available to the query for buffer reservations. Memory reservation needed given the current...."

When I set MEM_LIMIT higher, I don't see any "SpilledPartitions" or "SpilledRuns" counters; that is, no spilling to disk is triggered.

Since the documentation is unclear, I want to know how I can trigger the spill-to-disk functionality without the queries failing.

1 REPLY

Master Collaborator

Hi @PratCloudDev ,

 

"Spill to disk" happens when there is not enough memory available for a running query. Below is an example.

 

Suppose you are running a query that is using 10 GB of memory (per-node peak memory), and the query needs 12 GB. In this situation, spill to disk happens in the configured scratch directories.

You can find these directories by searching for the "Impala Daemon Scratch Directories" property in the Impala configuration.

If you do not want the query to fail, make sure the configured scratch directories (disks) have enough space to store the spilled data, which can potentially be large.
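
As a rough illustration of the free-space check, here is a minimal Python sketch that reports how much space is available under each scratch directory. The function name `scratch_free_gib` is hypothetical and not part of Impala; the directory list has to be filled in from the "Impala Daemon Scratch Directories" property on each daemon host.

```python
import shutil


def scratch_free_gib(scratch_dirs):
    """Return {directory: free space in GiB} for each scratch directory.

    scratch_dirs must be the paths configured on this host; they are not
    discovered automatically.
    """
    return {d: shutil.disk_usage(d).free / 2**30 for d in scratch_dirs}


# Example usage (replace the placeholder with your configured directories):
#   for path, free in scratch_free_gib(["<scratch dir>"]).items():
#       print(f"{path}: {free:.1f} GiB free")
```

This only shows free space at the moment it runs; how much a given query will actually spill depends on the query and the data.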

 

Check the query profile for "per node peak memory"; it is the actual memory used by that query on each daemon. For example, if it is 15 GB, set MEM_LIMIT to 10 GB or 12 GB to see the spill-to-disk functionality.
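
The procedure above can be sketched in Python under a few assumptions. `mem_limit_candidates` builds `SET MEM_LIMIT=<bytes>;` statements at fractions of the observed peak (the fractions are illustrative only), and `spill_counters` sums the "SpilledPartitions" and "SpilledRuns" counters in a text profile. Both function names are hypothetical, and the parser assumes counter lines of the form `SpilledPartitions: 2 (2)`, which may differ between versions. The rejection error quoted in the question suggests the limit also has to stay above the query's minimum memory reservation, so the lowest candidates may still be rejected.

```python
import re

# Assumed counter line format in the text profile: "SpilledPartitions: 2 (2)"
_COUNTER_RE = re.compile(r"(SpilledPartitions|SpilledRuns):\s*\S+\s*\((\d+)\)")


def mem_limit_candidates(peak_bytes, fractions=(0.8, 0.65, 0.5)):
    """Return SET statements with MEM_LIMIT values below the observed peak."""
    return [f"SET MEM_LIMIT={int(peak_bytes * f)};" for f in fractions]


def spill_counters(profile_text):
    """Sum each spill counter over all operators in a text profile."""
    totals = {"SpilledPartitions": 0, "SpilledRuns": 0}
    for name, raw_value in _COUNTER_RE.findall(profile_text):
        totals[name] += int(raw_value)
    return totals
```

A possible workflow: run the query once without a limit, read the per-node peak memory from the profile, issue one candidate `SET` statement in impala-shell, rerun the query, then pass the new profile text to `spill_counters`. A non-zero total indicates that the query spilled.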

To understand why you are seeing the error [1], I need a few details from you.

 

1. A screenshot of the Impala admission control pool settings.

2. What MEM_LIMIT value are you setting when you see the error [1]?

3. Which pool are you using to run the query?

4. If possible, please provide the query profile.

 

Regards,

Chethan YM

 

[1]. Rejected query from pool root.centos: minimum memory reservation is greater than memory available to the query for buffer reservations.