My Pig job runs for a long time and then runs out of disk space. I was able to identify the job and the disk.
The log file is huge; eventually the disk reaches 100% and the job fails. The tmp file under /yarn is 200 GB, and this node (DataNode and NodeManager) is where the one remaining reducer is still running.
How can I manage this situation? Why does Pig spill to local disk?
We tried adding these parameters, but we still see it dumping these bags, which are lower than 1 TB.
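For context, these are the kinds of spill-related Pig properties I understand control this behavior (a sketch only; the values shown are illustrative, not the ones we actually set):

```pig
-- Fraction of available heap a cached bag may use before Pig's
-- SpillableMemoryManager spills it to local disk (default is around 0.2).
SET pig.cachedbag.memusage 0.1;

-- Minimum size (bytes) a spillable object must reach before a spill
-- is attempted; illustrative value.
SET pig.spill.size.threshold 5000000;

-- Amount of heap (bytes) that must be freeable before a GC-triggered
-- spill is invoked; illustrative value.
SET pig.spill.gc.activation.size 40000000;
```

My understanding is that Pig spills bags to the task's local working directory (under the YARN local dirs, e.g. /yarn) whenever they exceed these memory thresholds, which would explain the 200 GB tmp file on the node running the last reducer.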