Pig job spilling to disk

My Pig job runs for a long time and then runs out of disk space. I was able to identify the job and the disk.

 

The file below is huge: this tmp file under /yarn is 200G. Eventually the disk reaches 100% and the job fails. This node (DataNode and NodeManager) is where the one remaining reducer is still running.

 

/hadoop/sdl/yarn/nm/usercache/userxxx/appcache/application_1541521281307_37125/container_e367_1541521281307_37125_01_008848/tmp/pigbag5356870459906866829.tmp
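For anyone tracing a similar file back to its job, here is a minimal Python sketch that pulls the user, application ID, and container ID out of a NodeManager local path. It assumes the `usercache/<user>/appcache/<application>/<container>/` layout seen in the path above; the function name `parse_container_path` is hypothetical, not part of any Hadoop tool.

```python
import re

# Layout assumed from the path in this post:
#   .../usercache/<user>/appcache/<application_id>/<container_id>/...
PATH_RE = re.compile(
    r"/usercache/(?P<user>[^/]+)/appcache/"
    r"(?P<app>application_\d+_\d+)/"
    r"(?P<container>container_[^/]+)/"
)


def parse_container_path(path):
    """Return the user, application ID and container ID found in a local path."""
    match = PATH_RE.search(path)
    if match is None:
        raise ValueError("not a recognised container-local path: " + path)
    return match.groupdict()


if __name__ == "__main__":
    spill_file = (
        "/hadoop/sdl/yarn/nm/usercache/userxxx/appcache/"
        "application_1541521281307_37125/"
        "container_e367_1541521281307_37125_01_008848/"
        "tmp/pigbag5356870459906866829.tmp"
    )
    for key, value in parse_container_path(spill_file).items():
        print(key, "=", value)
```

The application ID extracted this way can then be used with `yarn logs -applicationId <id>` to look at the logs of the job that owns the file.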

 

How can I manage this situation? Why does it spill to local disk?

Re: Pig job spilling to disk

We tried adding these parameters, but Pig still dumps these bags, which are smaller than 1 TB.

 

pig.spill.size.threshold=1000000000000
pig.spill.gc.activation.size=1000000000000
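As a sanity check on the numbers, here is a small Python sketch that converts the configured value into readable units and compares it with the 200G spill file. It assumes both properties are expressed in bytes and that "200G" means 200 GiB as reported by a tool such as `du -h`; the helper name `to_gib` is made up for illustration.

```python
GIB = 2 ** 30  # bytes in one GiB


def to_gib(num_bytes):
    """Convert a byte count to GiB."""
    return num_bytes / GIB


# Value set for both pig.spill.size.threshold and pig.spill.gc.activation.size.
configured_threshold = 1000000000000

# Size of the spilled bag on disk, assuming "200G" means 200 GiB.
spill_file_bytes = 200 * GIB

if __name__ == "__main__":
    print("threshold  : %.1f GiB" % to_gib(configured_threshold))
    print("spill file : %.1f GiB" % to_gib(spill_file_bytes))
    print("file below threshold:", spill_file_bytes < configured_threshold)
```

Under those assumptions the threshold is roughly 931 GiB, so the 200G file is well below it and the bag was still written to disk.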