Pig user cache files are not automatically removed.

Expert Contributor

Hello community!

I have the following problem in my HDP 2.4 cluster: the Pig user cache files stored in /tmp are not removed and are filling up my HDFS filesystem. Is there a way to configure Pig to remove these files automatically after the jobs finish?

Thank you in advance!

1 ACCEPTED SOLUTION

Expert Contributor

Hi @Juan Manuel Nieto,

Generally, the /tmp directory is used mainly for temporary storage during MapReduce phases.

MapReduce writes its intermediate data under /tmp. These files are cleared automatically when the MapReduce job completes.

Pig also creates temporary files, since it runs on top of MapReduce. These are deleted at the end of a successful run. However, Pig does not delete its temporary files if the script fails or is killed, so that case has to be handled separately. It is best handled by adding cleanup steps to the script itself, or by otherwise changing the script.
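For leftovers from runs that already failed or were killed, a periodic cleanup outside the script can help. Below is a minimal Python sketch of the selection logic only. It assumes the leftover files live under a path prefix such as /tmp/temp- and that anything older than seven days is safe to delete; both the prefix and the age limit are my assumptions, so check what your cluster actually writes before using them. The function names are hypothetical, and the sketch only builds `hdfs dfs -rm -r -skipTrash` commands without running them.

```python
from datetime import datetime, timedelta


def stale_paths(entries, now, max_age_days=7, prefix="/tmp/temp-"):
    """Pick paths that start with `prefix` and were last modified before the cutoff.

    `entries` is a list of (path, modification_time) pairs, for example parsed
    from a directory listing. The default prefix and age limit are assumptions,
    not confirmed Pig settings.
    """
    cutoff = now - timedelta(days=max_age_days)
    return [path for path, mtime in entries
            if path.startswith(prefix) and mtime < cutoff]


def removal_commands(paths):
    """Build one `hdfs dfs -rm -r -skipTrash` command per path (dry run only)."""
    return [["hdfs", "dfs", "-rm", "-r", "-skipTrash", path] for path in paths]


if __name__ == "__main__":
    # Made-up listing for illustration; real entries would come from the cluster.
    now = datetime(2016, 6, 15)
    listing = [
        ("/tmp/temp-1234", datetime(2016, 6, 1)),
        ("/tmp/temp-5678", datetime(2016, 6, 14)),
        ("/tmp/hive", datetime(2016, 5, 1)),
    ]
    for command in removal_commands(stale_paths(listing, now)):
        print(" ".join(command))
```

Printing the commands first lets you review what would be deleted. Keep the age limit longer than your longest-running job so that files belonging to active scripts are not touched.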

For further details I found an article here:

Hope that helps.

Thanks,

Sujitha
