Scratch file generation information

Contributor

Hello,

A simple question: how can I find out which queries generate scratch files?

I have been inspecting the impalad logs and could not find any information about scratch file generation.

Regards,

Silva

Accepted Solutions

Re: Scratch file generation information

The Cloudera Manager queries page tracks bytes spilled to disk as one of its per-query metrics, so you can use it to see which individual queries spilled. CM also has a "Cluster utilization report" with aggregate information about how much data is spilled to disk over longer time windows. Finally, if you're looking at the scratch files themselves, the query ID is embedded in the file name (although that's an implementation detail and could change in the future).
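If you want to try the file-name approach, here is a rough sketch rather than a supported tool. It walks a scratch directory and sums file sizes per query ID. The directory path and the ID pattern (two 16-digit hex groups joined by "_" or ":") are my assumptions, not a documented format, so check them against the file names you actually see on disk.

```python
import os
import re
from collections import defaultdict

# Assumption: a query ID shows up in the file name as two 16-digit hex groups
# joined by "_" or ":". This pattern is a guess, not a documented format.
QUERY_ID_RE = re.compile(r"([0-9a-f]{16})[_:]([0-9a-f]{16})", re.IGNORECASE)


def extract_query_id(file_name):
    """Return 'hi:lo' if the name contains something shaped like a query ID, else None."""
    match = QUERY_ID_RE.search(file_name)
    if match is None:
        return None
    return f"{match.group(1)}:{match.group(2)}".lower()


def scratch_bytes_by_query(scratch_dir):
    """Sum file sizes under scratch_dir, grouped by the extracted query ID."""
    totals = defaultdict(int)
    for root, _dirs, files in os.walk(scratch_dir):
        for name in files:
            query_id = extract_query_id(name)
            if query_id is None:
                continue  # not shaped like a scratch file name we recognise
            try:
                totals[query_id] += os.path.getsize(os.path.join(root, name))
            except OSError:
                # A file may disappear while we scan; skip it.
                continue
    return dict(totals)


if __name__ == "__main__":
    # Hypothetical location; point this at your own impalad scratch directory.
    for query_id, size in sorted(scratch_bytes_by_query("/tmp/impala-scratch").items()):
        print(query_id, size)
```

This only sees queries that have scratch files on disk at the moment of the scan, so for anything historical the per-query metric in CM is the better source.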

4 REPLIES

Re: Scratch file generation information

Contributor

But I need to know which specific queries spill to disk and generate the scratch files. Is it possible to get that kind of information?

Re: Scratch file generation information

Contributor
Thanks a lot for the info