usercache is growing too fast

New Contributor

Hi,

We are trying to run a Spark job on YARN. The problem is that the usercache directory (yarn/nm/usercache/) is growing too fast and will eventually fill the whole disk. The Hive table is around 70 GB and the total free disk space is around 300 GB. In the usercache directory there are many large folders such as blockmgr-b5b55c6f-ef8a-4359-93e4-9935f2390367.
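
To see which blockmgr-* folders are actually taking the space, a small Python sketch like the one below can walk the usercache tree and report each block-manager directory's size, largest first. Under YARN, Spark places these directories inside each application's appcache directory, so the script simply searches recursively for them. It assumes the NodeManager local directory is /yarn/nm (so usercache is /yarn/nm/usercache), inferred from the path in the question; pass a different root as the first argument. The function names are hypothetical helpers, not part of Spark or YARN.

```python
import os
import sys


def dir_size(path):
    """Return the total size in bytes of all regular files under path."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(path):
        for name in filenames:
            fp = os.path.join(dirpath, name)
            try:
                if not os.path.islink(fp):
                    total += os.path.getsize(fp)
            except OSError:
                # Block files can disappear while a job is running; skip them.
                pass
    return total


def blockmgr_sizes(usercache_root):
    """Find every blockmgr-* directory below usercache_root and return
    (path, size_in_bytes) pairs sorted largest first."""
    results = []
    for dirpath, dirnames, _filenames in os.walk(usercache_root):
        for d in list(dirnames):
            if d.startswith("blockmgr-"):
                full = os.path.join(dirpath, d)
                results.append((full, dir_size(full)))
                # Prune it from the walk: dir_size() already measured it.
                dirnames.remove(d)
    results.sort(key=lambda item: item[1], reverse=True)
    return results


if __name__ == "__main__":
    # Assumption: usercache lives at /yarn/nm/usercache; override via argv[1].
    root = sys.argv[1] if len(sys.argv) > 1 else "/yarn/nm/usercache"
    for path, size in blockmgr_sizes(root):
        print(f"{size / 1024**3:8.2f} GiB  {path}")
```

Running it a few times while the job is active shows which application's block-manager directories are growing.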

My questions are:

- Is this normal, or is something wrong with my Spark or YARN setup?

- If these are just cached files, is there any way to limit the cache size?

Thanks for your help.

Re: usercache is growing too fast

Master Collaborator
That sounds like it's filling up with blocks from the block manager,
which could mean you're persisting a lot of RDDs to disk, or that you
have a huge shuffle. The first step is to figure out which of those
it is, then avoid the issue by caching in memory or by designing the
job to avoid huge shuffles. You can also consider raising
spark.shuffle.memoryFraction so that more memory is used for shuffling
and less data spills to disk.
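
One way to do that first step is to look at the file names inside the blockmgr-* directories: Spark names on-disk block files after their block IDs, so shuffle output starts with shuffle_, persisted RDD partitions with rdd_, and spill/temporary files with temp_shuffle_ or temp_local_. The Python sketch below sums file sizes by those prefixes. The category labels, the helper names and the default path /yarn/nm/usercache are illustrative assumptions, and exact file naming can differ between Spark versions.

```python
import os
import sys
from collections import defaultdict

# Assumed mapping from Spark block-file name prefixes to a rough category.
PREFIXES = [
    ("shuffle_", "shuffle output"),
    ("temp_shuffle_", "spill / temp"),
    ("temp_local_", "spill / temp"),
    ("rdd_", "persisted RDD blocks"),
]


def classify(filename):
    """Return the category for a block file name, or 'other'."""
    for prefix, category in PREFIXES:
        if filename.startswith(prefix):
            return category
    return "other"


def breakdown(root):
    """Sum sizes of files inside any blockmgr-* directory below root, by category."""
    totals = defaultdict(int)
    for dirpath, _dirnames, filenames in os.walk(root):
        # Only count files that sit somewhere inside a blockmgr-* directory.
        if not any(part.startswith("blockmgr-") for part in dirpath.split(os.sep)):
            continue
        for name in filenames:
            try:
                totals[classify(name)] += os.path.getsize(os.path.join(dirpath, name))
            except OSError:
                pass  # block files can be deleted while the job runs
    return dict(totals)


if __name__ == "__main__":
    # Assumption: usercache lives at /yarn/nm/usercache; override via argv[1].
    root = sys.argv[1] if len(sys.argv) > 1 else "/yarn/nm/usercache"
    for category, size in sorted(breakdown(root).items(), key=lambda kv: -kv[1]):
        print(f"{size / 1024**3:8.2f} GiB  {category}")
```

If rdd_ files dominate, persisting in memory rather than to disk is the lever; if shuffle and spill files dominate, the shuffle itself is the problem. spark.shuffle.memoryFraction can be passed as, for example, `--conf spark.shuffle.memoryFraction=0.4` to spark-submit (the default was 0.2). Note that from Spark 1.6 it is a legacy setting that is only honored when spark.memory.useLegacyMode is enabled, so check the configuration page for the Spark version in use.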