
CDH4.4: Memory leak in HarFileSystem


The HarFileSystem class contains a cache of HarMetaData objects. This cache only adds entries (and refreshes them when the cached data is modified on HDFS) but never removes old entries (not even the metadata of archives that have been deleted since they were last accessed).

This leads to high memory use, with no workaround other than patching the code. The provisional change I made locally turns the cache into an LRU cache limited to 10 entries (not pretty, but so far it works without dumping core due to out-of-memory errors :-) )

// *original code*
    private static final Map&lt;URI, HarMetaData&gt; harMetaCache =
          new ConcurrentHashMap&lt;URI, HarMetaData&gt;();

//  *workaround*
    // accessOrder = true so eviction is least-recently-used, not insertion order
    private final Map&lt;URI, HarMetaData&gt; harMetaCache =
        Collections.synchronizedMap(new LinkedHashMap&lt;URI, HarMetaData&gt;(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry&lt;URI, HarMetaData&gt; eldest) {
                return 10 &lt; size();
            }
        });
I hope this (and the other issue I posted) gets fixed in a later release.

Regards, Andre