The HarFileSystem class contains a cache of HarMetaData objects. This cache only adds entries (and refreshes them when the cached data is modified on HDFS) but never removes old entries, not even the cached metadata for archives that have been deleted since they were last accessed.
This leads to unbounded memory growth, with no workaround other than patching the code. The provisional change I made locally turns the cache into an LRU cache limited to 10 entries (not elegant, but it has worked so far without the process dying from OutOfMemoryError :-) )
// original code
private static final Map&lt;URI, HarMetaData&gt; harMetaCache =
    new ConcurrentHashMap&lt;URI, HarMetaData&gt;();
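A minimal sketch of the LRU replacement described above, using an access-ordered LinkedHashMap with `removeEldestEntry` to cap the cache at 10 entries. This is an illustration, not the actual patch: `HarMetaData` is package-private in Hadoop, so a `String` stands in for the metadata value here, and the `MAX_ENTRIES` constant and class name are assumptions.

```java
import java.net.URI;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

public class LruMetaCacheSketch {
    private static final int MAX_ENTRIES = 10; // assumed limit from the local patch

    // Access-ordered LinkedHashMap that evicts the least-recently-used entry
    // once the size exceeds MAX_ENTRIES. LinkedHashMap is not thread-safe,
    // so it is wrapped in Collections.synchronizedMap (the original field
    // was a ConcurrentHashMap, i.e. accessed from multiple threads).
    static final Map<URI, String> harMetaCache =
        Collections.synchronizedMap(new LinkedHashMap<URI, String>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<URI, String> eldest) {
                return size() > MAX_ENTRIES;
            }
        });

    public static void main(String[] args) {
        // Touch 25 distinct archives; only the 10 most recently used survive.
        for (int i = 0; i < 25; i++) {
            harMetaCache.put(URI.create("har://host/archive" + i + ".har"), "meta" + i);
        }
        System.out.println(harMetaCache.size());
    }
}
```

The trade-off is that a hot archive's metadata can be evicted under pressure and must be re-read from HDFS, but that bounded re-read cost is far preferable to unbounded heap growth.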