
CDH4.4: Memory leak in HarFileSystem


The HarFileSystem class contains a cache of HarMetaData objects. This cache only adds entries (and refreshes them when the cached data is modified on HDFS); it never removes old entries, not even the metadata of archives that have been deleted since they were last accessed.

This leads to high memory use, and there is no workaround other than patching the code. The provisional change I made locally turns the
cache into an LRU cache limited to 10 entries (not nice, but so far it works without crashing due to out-of-memory errors :-) )

// *original code*
    private static final Map<URI, HarMetaData> harMetaCache =
          new ConcurrentHashMap<URI, HarMetaData>();

//  *workaround*
    private final Map<URI, HarMetaData> harMetaCache =
          Collections.synchronizedMap(new LinkedHashMap<URI, HarMetaData>() {
              @Override
              protected boolean removeEldestEntry(Map.Entry<URI, HarMetaData> eldest) {
                  // evict the oldest entry once the cache grows past 10 entries
                  return size() > 10;
              }
          });
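For anyone who wants to try the eviction mechanism outside of Hadoop, here is a minimal standalone sketch of the same idea: a LinkedHashMap whose removeEldestEntry hook caps the map at a fixed size. The class name LruDemo and the capacity of 3 are just for the demo, not part of the actual patch.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

public class LruDemo {
    // Returns a thread-safe, size-bounded map in insertion order.
    static <K, V> Map<K, V> boundedCache(final int maxEntries) {
        return Collections.synchronizedMap(new LinkedHashMap<K, V>() {
            @Override
            protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
                // Called by LinkedHashMap after every put(); returning true
                // evicts the eldest (oldest-inserted) entry.
                return size() > maxEntries;
            }
        });
    }

    public static void main(String[] args) {
        Map<String, Integer> cache = boundedCache(3);
        cache.put("a", 1);
        cache.put("b", 2);
        cache.put("c", 3);
        cache.put("d", 4); // exceeds the cap, so "a" is evicted
        System.out.println(cache.containsKey("a")); // false
        System.out.println(cache.size());           // 3
    }
}
```

Note that LinkedHashMap is unsynchronized, hence the Collections.synchronizedMap wrapper to roughly match the thread-safety of the ConcurrentHashMap it replaces (iteration would still need external locking).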

I hope this (and the other issue I posted) gets fixed in a later release.

regards, Andre

