New Contributor
Posts: 5
Registered: 08-29-2015

usercache is growing too fast

Hi,

We are trying to run a Spark job on YARN. The problem is that the usercache directory (yarn/nm/usercache/) is growing too fast and will eventually fill the whole disk. The Hive table is around 70 GB, and total free disk space is around 300 GB. In the usercache directory there are a lot of big folders like blockmgr-b5b55c6f-ef8a-4359-93e4-9935f2390367.

My questions are:

- Is this normal, or is something wrong with my Spark or YARN setup?

- If these are just cached files, is there any way to limit the cache size?

Thanks for your help.

Cloudera Employee
Posts: 366
Registered: 07-29-2013

Re: usercache is growing too fast

That sounds like it's filling up with blocks from the block manager,
which could mean you're persisting a lot of RDDs to disk, or perhaps
have a huge shuffle. The first step would be to figure out which of
those it is, then avoid the issue by caching in memory or by designing
the job to avoid huge shuffles. You can also consider increasing
spark.shuffle.memoryFraction so that shuffles use more memory and
spill less to disk.
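A minimal sketch for telling persisted RDDs apart from shuffle data, assuming Python 3 on the NodeManager host. Spark's DiskBlockManager names files after their block IDs (rdd_<rddId>_<partition> for persisted RDD partitions, shuffle_<shuffleId>_<mapId>_<reduceId>.data/.index for shuffle output, temp_shuffle_*/temp_local_* for spill files), so totaling bytes by file name under the blockmgr-* folders shows which kind dominates. The helpers `classify` and `blockmgr_usage` are hypothetical, and the regular expressions are an assumption based on those naming patterns; anything that matches none of them is counted as "other".

```python
import re
import sys
from collections import Counter
from pathlib import Path

# Name patterns assumed from Spark's block IDs, as stored by DiskBlockManager:
#   rdd_<rddId>_<partition>                      -> persisted RDD partition
#   shuffle_<shuffleId>_<mapId>_<reduceId>.data  -> shuffle output (.index alongside)
#   temp_shuffle_<uuid> / temp_local_<uuid>      -> spill and intermediate files
_PATTERNS = [
    ("rdd", re.compile(r"^rdd_\d+_\d+$")),
    ("shuffle", re.compile(r"^shuffle_\d+_\d+_\d+\.(data|index)$")),
    ("temp", re.compile(r"^temp_(shuffle|local)_")),
]


def classify(name: str) -> str:
    """Map a block file name to 'rdd', 'shuffle', 'temp', or 'other'."""
    for kind, pattern in _PATTERNS:
        if pattern.match(name):
            return kind
    return "other"


def blockmgr_usage(root) -> Counter:
    """Total file sizes in bytes, by block kind, under every blockmgr-* dir below root."""
    totals = Counter()
    for blockmgr in Path(root).rglob("blockmgr-*"):
        if not blockmgr.is_dir():
            continue
        for f in blockmgr.rglob("*"):
            try:
                if f.is_file():
                    totals[classify(f.name)] += f.stat().st_size
            except OSError:
                # Spark may delete a file between listing and stat; skip it.
                continue
    return totals


if __name__ == "__main__" and len(sys.argv) > 1:
    # Pass the NodeManager's usercache directory, e.g. the yarn/nm/usercache path.
    for kind, size in blockmgr_usage(sys.argv[1]).most_common():
        print(f"{kind:8s} {size / 2**30:8.2f} GiB")
```

If "rdd" dominates, the job is persisting to disk; if "shuffle" or "temp" dominates, the space is going to shuffle output and spills.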
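To estimate what raising spark.shuffle.memoryFraction buys, the arithmetic below assumes Spark 1.x's legacy shuffle memory manager: the shuffle pool is about max heap * spark.shuffle.memoryFraction (default 0.2) * spark.shuffle.safetyFraction (default 0.8), and with N concurrently running tasks in an executor, each task may hold between 1/(2N) and 1/N of that pool before it spills to disk. In Spark 1.6 and later this setting is only read when spark.memory.useLegacyMode is true. `shuffle_memory_bounds` is a hypothetical helper, and the 8 GiB heap and 4 running tasks are illustrative inputs, not values from the original job.

```python
def shuffle_memory_bounds(heap_bytes, running_tasks, memory_fraction=0.2, safety_fraction=0.8):
    """Approximate Spark 1.x legacy shuffle memory limits.

    Returns (pool, per_task_min, per_task_max) in bytes, where
    pool = heap * spark.shuffle.memoryFraction * spark.shuffle.safetyFraction
    and each of N running tasks may use between pool/(2N) and pool/N before spilling.
    """
    if running_tasks < 1:
        raise ValueError("running_tasks must be at least 1")
    pool = int(heap_bytes * memory_fraction * safety_fraction)
    return pool, pool // (2 * running_tasks), pool // running_tasks


if __name__ == "__main__":
    MIB = 2**20
    heap = 8 * 2**30  # illustrative 8 GiB executor heap
    for fraction in (0.2, 0.4):
        pool, lo, hi = shuffle_memory_bounds(heap, running_tasks=4, memory_fraction=fraction)
        print(f"memoryFraction={fraction}: pool {pool / MIB:.0f} MiB, "
              f"per task {lo / MIB:.0f} to {hi / MIB:.0f} MiB")
```

Doubling the fraction roughly doubles what each task can buffer before spilling. The memory comes out of the same heap, so in legacy mode raising it may also mean lowering spark.storage.memoryFraction (default 0.6) to avoid out-of-memory errors.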