Contributor
Posts: 66
Registered: ‎12-30-2015

do you know impala-server.io.mgr.cached-file-handles-miss-count value in impala-server metric?

Hello,

 

I have a question about these values in the impala-server metrics.

 

I set `max_cached_file_handles` to 10,000 

max_cached_file_handles (uint64): Maximum number of HDFS file handles that will be cached. Disabled if set to 0. Default: 0. Current value: 10000.

 

However, `impala-server.io.mgr.cached-file-handles-miss-count` still reports cache misses.

impala-server.io.mgr.cached-file-handles-hit-count: 668 (Number of cache hits for cached HDFS file handles)
impala-server.io.mgr.num-cached-file-handles: 253 (Number of currently cached HDFS file handles in the IO manager)
impala-server.io.mgr.cached-file-handles-miss-count: 1467 (Number of cache misses for cached HDFS file handles)

 

Can you tell me how to reduce the miss-count?

 

Thank you

Gatsby

New Contributor
Posts: 4
Registered: ‎01-03-2017

Re: do you know impala-server.io.mgr.cached-file-handles-miss-count value in impala-server metric?

Cache misses are common while the cache is warming up. When Impala requests a file handle for the first time, the handle is not yet in the cache, so Impala has to open it, and that counts as a miss. It looks like this is what is happening in your case, as the cache is not full (253 handles cached out of a limit of 10,000). Over time, the ratio of hits to misses will go up as the cache holds more of the file handles that Impala is accessing.
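To watch the warm-up, it can help to turn the raw counters into a hit ratio and a cache occupancy figure. Here is a minimal sketch using the numbers from the question. The helper name `hit_ratio` is made up for illustration and is not part of Impala.

```python
def hit_ratio(hits, misses):
    """Fraction of file handle requests served from the cache."""
    total = hits + misses
    return hits / total if total else 0.0


# Values copied from the metrics posted in the question.
hits = 668        # cached-file-handles-hit-count
misses = 1467     # cached-file-handles-miss-count
cached = 253      # num-cached-file-handles
capacity = 10_000 # max_cached_file_handles

print(f"hit ratio: {hit_ratio(hits, misses):.1%}")    # hit ratio: 31.3%
print(f"cache occupancy: {cached / capacity:.1%}")    # cache occupancy: 2.5%
```

A low occupancy together with a low hit ratio is what a cache that is still warming up tends to look like. If the hit ratio stays low after the cache has filled, eviction is the more likely cause.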

 

 

A few things to know:

1. If the cache gets full, it will start evicting the least recently used file handles. If a workload then needs a file handle that was evicted, that request will be a miss again. This is not happening in your case, as the cache is not full.

2. Impala will often have multiple file handles open for the same file, because it is accessing the file from multiple places in multiple threads. This means that the cache will need multiple file handles for the same file. So, the initial number of misses as the cache warms up can exceed the number of files that you are accessing.
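Both points can be illustrated with a toy model. The sketch below is not Impala's implementation. `ToyHandleCache` and the file names are hypothetical, and it only assumes what is described above: idle handles are cached per file, and the least recently used one is evicted when the cache is over capacity.

```python
from collections import OrderedDict
from itertools import count


class ToyHandleCache:
    """Toy LRU cache of idle file handles (illustration only, not Impala's code)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.idle = OrderedDict()  # handle id -> filename, least recently used first
        self.hits = 0
        self.misses = 0
        self._ids = count()

    def acquire(self, filename):
        # Reuse an idle handle for this file if there is one (cache hit).
        handle = next((h for h, name in self.idle.items() if name == filename), None)
        if handle is not None:
            del self.idle[handle]
            self.hits += 1
            return handle
        # Otherwise "open" a new handle (cache miss).
        self.misses += 1
        return next(self._ids)

    def release(self, handle, filename):
        # Return the handle as the most recently used entry, then evict if over capacity.
        self.idle[handle] = filename
        while len(self.idle) > self.capacity:
            self.idle.popitem(last=False)


cache = ToyHandleCache(capacity=2)

# Point 2: two concurrent readers of one file need two handles, so two misses.
a = cache.acquire("f1")
b = cache.acquire("f1")
cache.release(a, "f1")
cache.release(b, "f1")

# A later read of the same file is a hit.
c = cache.acquire("f1")
cache.release(c, "f1")

# Point 1: reading two other files fills the cache and evicts the f1 handles.
for name in ("f2", "f3"):
    h = cache.acquire(name)
    cache.release(h, name)

cache.acquire("f1")  # miss again, because the f1 handles were evicted
print(cache.hits, cache.misses)  # 1 5
```

With only three distinct files there are five misses here: two from concurrent access during warm-up, two for the new files, and one caused by eviction.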

 

I hope this helps.

 

Thanks,

Joe

Contributor
Posts: 66
Registered: ‎12-30-2015

Re: do you know impala-server.io.mgr.cached-file-handles-miss-count value in impala-server metric?

Joe,

 

Thank you for your comment.

Your comment really helped me confirm and understand how the file handle cache works.

 

Like you said, over time the hit-count is going up and the miss-count is becoming stable :)

 

[Attachment: Screen Shot 2017-04-28 at 10.58.27 AM.png]

Contributor
Posts: 66
Registered: ‎12-30-2015

Re: do you know impala-server.io.mgr.cached-file-handles-miss-count value in impala-server metric?

Joe,

 

By the way, can one cached file handle be used by multiple threads?

 

Gatsby

New Contributor
Posts: 4
Registered: ‎01-03-2017

Re: do you know impala-server.io.mgr.cached-file-handles-miss-count value in impala-server metric?

Gatsby,

 

 

A query gets a file handle when it starts processing a file and holds the handle until it is done with the file. Only one thread issues IO on a file handle at a time. When a file handle returns to the cache, it can be picked up by any thread that needs to access that file.
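As a rough sketch of that check-out and return pattern (again a toy model, not Impala's code; `HandlePool`, `scan`, and the file name are made up for illustration): a handle is removed from the pool while a thread uses it, so two threads never hold the same handle at once, but any thread can pick it up after it is returned.

```python
import threading
from collections import defaultdict


class HandlePool:
    """Toy pool: a handle is checked out by one thread at a time."""

    def __init__(self):
        self._lock = threading.Lock()
        self._idle = defaultdict(list)  # filename -> idle handle ids
        self._next_id = 0
        self.opened = 0                 # number of handles opened (cache misses)

    def acquire(self, filename):
        with self._lock:
            if self._idle[filename]:
                return self._idle[filename].pop()  # reuse an idle handle
            self._next_id += 1
            self.opened += 1
            return self._next_id                   # "open" a new handle

    def release(self, filename, handle):
        with self._lock:
            self._idle[filename].append(handle)


pool = HandlePool()
in_use = set()
in_use_lock = threading.Lock()
shared = []  # True is recorded if a handle was already held by another thread


def scan(filename):
    handle = pool.acquire(filename)
    with in_use_lock:
        shared.append(handle in in_use)
        in_use.add(handle)
    # ... a real reader would issue IO on the handle here ...
    with in_use_lock:
        in_use.discard(handle)
    pool.release(filename, handle)


threads = [threading.Thread(target=scan, args=("part-0.parq",)) for _ in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(any(shared))  # False: no handle was used by two threads at the same time
print(pool.opened)  # between 1 and 8, depending on how the threads overlap
```

The number of handles opened depends on how many threads touch the file at the same moment, which is why the miss count can exceed the number of files.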

 

 

 

Thanks,

Joe

Contributor
Posts: 66
Registered: ‎12-30-2015

Re: do you know impala-server.io.mgr.cached-file-handles-miss-count value in impala-server metric?

Joe,

 

Ah, I see.

So, at any given time, a file handle is used by only one thread.

That means a file handle is never used by multiple threads at the same time.

 

 

I wonder how you know all this so well :)

 

Thank you very much

 

Gatsby

 
