
Monitoring Spill-to-disk from Cloudera Manager


Hello Team,


As per the Impala 2.5 release notes:


Spill-to-disk feature now always recommended. In earlier releases, the spill-to-disk feature could be turned off using a pair of configuration settings,
enable_partitioned_aggregation=false and enable_partitioned_hash_join=false.

The latest improvements in the spill-to-disk mechanism, and related features that interact with it, make this feature robust enough that disabling it is now no longer needed or supported. In particular, some new features in Impala 2.5 and higher do not work when the spill-to-disk feature is disabled.


With spill-to-disk enabled, is there a way to monitor spill-to-disk occurrences so that I can identify the queries causing them?



@Hrishi1 Did you consider setting a default SCRATCH_LIMIT at the resource pool level so that queries fail if they spill too much data? I know a lot of cluster admins do this to prevent runaway queries, and also so that users who want to run big queries come to the admins, instead of the admins having to track down the users.

I understand that it's not exactly what you're looking for but I've seen people have success with it.


Hello Tim,


Thank you for your help in this thread.


Yes. For now, I have set the scratch limit for the specific resource pool. I set it to zero to prevent spill-to-disk and created a trigger to test whether it is working: [IF (SELECT queries_spilled_memory_rate WHERE serviceName=$SERVICENAME AND max(queries_spilled_memory_rate) > 1) DO health:concerning]. Ideally, the trigger should never fire, since the scratch limit is set to zero.
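To make the expectation concrete, here is a rough Python sketch of the condition the trigger is meant to express. It is not how Cloudera Manager evaluates triggers internally; the function name, the "good" label and the sample values are assumptions for illustration only. It flags the service as concerning when the peak of the sampled queries_spilled_memory_rate values exceeds the threshold of 1 used in the trigger.

```python
from typing import Iterable

# Threshold taken from the "> 1" comparison in the trigger expression.
SPILL_RATE_THRESHOLD = 1.0


def trigger_health(samples: Iterable[float],
                   threshold: float = SPILL_RATE_THRESHOLD) -> str:
    """Hypothetical helper: return 'concerning' if the peak sampled
    queries_spilled_memory_rate exceeds the threshold, else 'good'."""
    # default=0.0 treats "no samples yet" as "nothing spilled".
    peak = max(samples, default=0.0)
    return "concerning" if peak > threshold else "good"


# Made-up sample values, not real cluster data.
# With the scratch limit at zero, every sample is expected to be 0.
print(trigger_health([0.0, 0.0, 0.0]))   # prints "good"
# A non-zero burst above the threshold would flip the state.
print(trigger_health([0.0, 2.5, 0.0]))   # prints "concerning"
```

If the trigger fires while the scratch limit is zero, queries in some other resource pool are probably still spilling, since the trigger is scoped to the whole service and not to one pool.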


Please advise.