Expert Contributor
Posts: 76
Registered: 05-09-2017

Pig job spilling to disk

My Pig job runs for a long time and then runs out of disk space. I was able to identify the job and the disk involved.

 

The file shown below is huge. Eventually the disk reaches 100% and the job fails. This tmp file under /yarn is 200 GB. The node it sits on (DataNode and NodeManager) is the one where the single remaining reducer is still running.

 

/hadoop/sdl/yarn/nm/usercache/userxxx/appcache/application_1541521281307_37125/container_e367_1541521281307_37125_01_008848/tmp/pigbag5356870459906866829.tmp
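To see which spill files are filling the disk, something like the following could be run on the affected node. It is only a sketch: the function name list_pig_spills is made up for illustration, and the default directory is taken from the path above, so adjust it to the NodeManager local dir actually configured on your cluster.

```shell
# List Pig bag spill files (pigbag*.tmp) under a directory, largest first.
# Sizes are printed in KB. The default directory is an assumption based on
# the path quoted in the post.
list_pig_spills() {
    dir="${1:-/hadoop/sdl/yarn/nm/usercache}"
    # du -k prints "<size-in-KB> <path>"; sort -rn puts the biggest on top.
    find "$dir" -type f -name 'pigbag*.tmp' -exec du -k {} + 2>/dev/null | sort -rn
}
```

Calling `list_pig_spills` with no argument scans the default directory; passing a path scans that path instead. Combined with `df -h` on the same mount, this shows how much of the disk usage comes from Pig spill files.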

 

How can I manage this situation? Why does it spill to local disk?

Expert Contributor
Posts: 76
Registered: 05-09-2017

Re: Pig job spilling to disk

We tried adding these parameters, but the job still dumps these bags to disk even though they are smaller than 1 TB.

 

pig.spill.size.threshold=1000000000000
pig.spill.gc.activation.size=1000000000000
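For reference, one way these two properties might be passed to a job is as -D Java properties on the pig command line, placed before any other Pig options. The sketch below only prints such a command rather than running it. The helper name pig_spill_cmd and the script name myscript.pig are hypothetical, and this is not necessarily how the properties were set in this case.

```shell
# Print (do not run) a pig command line carrying the two spill properties.
# "myscript.pig" is a hypothetical script name; the default threshold is the
# value quoted in the post (bytes).
pig_spill_cmd() {
    script="${1:-myscript.pig}"
    threshold="${2:-1000000000000}"
    printf 'pig -Dpig.spill.size.threshold=%s -Dpig.spill.gc.activation.size=%s %s\n' \
        "$threshold" "$threshold" "$script"
}

pig_spill_cmd myscript.pig
```

Printing the command first makes it easy to confirm that the property names and values reach the job exactly as intended before launching it.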
