Scratch file generation information

Contributor

Hello,

A simple question: how can I tell which queries generate scratch files?

I've been inspecting the impalad logs and couldn't find any information about scratch file generation.

Regards,

Silva

1 ACCEPTED SOLUTION

The Cloudera Manager queries page tracks bytes spilled to disk as one of its per-query metrics. CM also has a "Cluster utilization report" with aggregate information about how much data is spilled to disk over longer time windows. And if you're looking at the scratch files themselves, the query ID is embedded in the file name (although that's an implementation detail and could change in the future).
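
If you want to try the file-name route, here is a rough Python sketch that sums scratch file sizes per query ID. It rests on an assumption not confirmed in this thread: that each scratch file name starts with the query ID printed as two hexadecimal groups joined by a colon. The pattern and the helper names `bytes_per_query` and `scan_scratch_dir` are hypothetical, so check a few real file names on your nodes and adjust before relying on it.

```python
import os
import re
from collections import defaultdict

# ASSUMPTION (not confirmed in this thread): the scratch file name starts with
# the query ID written as two hex groups joined by a colon. Verify against the
# real file names on your cluster, since the naming is an implementation detail.
QUERY_ID_PREFIX = re.compile(r"^([0-9a-fA-F]+:[0-9a-fA-F]+)")


def bytes_per_query(entries):
    """Sum sizes per query ID from (file_name, size_in_bytes) pairs."""
    totals = defaultdict(int)
    for name, size in entries:
        match = QUERY_ID_PREFIX.match(name)
        if match:  # files that do not fit the assumed pattern are skipped
            totals[match.group(1).lower()] += size
    return dict(totals)


def scan_scratch_dir(path):
    """Yield (file_name, size_in_bytes) for regular files directly under path."""
    with os.scandir(path) as it:
        for entry in it:
            if entry.is_file():
                yield entry.name, entry.stat().st_size
```

Calling `bytes_per_query(scan_scratch_dir(path))` with `path` set to one of your scratch directories gives a dict of query ID to bytes currently on disk. It only sees files that exist at the moment you run it, so for history the CM queries page is still the better source.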

4 REPLIES

Contributor

But I need to know which specific queries spill to disk and generate the scratch files. Is it possible to get that kind of information?


Contributor
Thanks a lot for the info