
Temp space best practices?



New Contributor

We’ve had some MR jobs consume temp space on data nodes and we want to prevent that in the future.
There is conflicting information on the internet.


  • What is the best practice for where DataNode temp space should reside? (For example, on physical disks separate from the HDFS disks?)
  • How much temp space should be reserved for MR as a percentage of HDFS space? (Is there a best-practice target?)
  • What are the MR best practices around temp space? (For example, we can ask users to shrink their reduce steps, but that is hard to predict for analytics jobs. Are there any best-practice configurations that can help?)
  • How do file sizes affect temp space usage, especially with compressed files on HDFS?

Re: temp space best practices????

Cloudera Employee
By "temp space on DataNodes", do you mean that the jobs filled the /tmp directory on HDFS, or the local /tmp directory on each of your DataNodes?
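
One rough way to tell the two cases apart is to look at each location separately. On the HDFS side, `hdfs dfs -du -s -h /tmp` reports how much data sits under /tmp in HDFS. For the local side, the sketch below (run on a DataNode) reports how full the filesystem holding a local directory is. The helper name `local_usage_percent` and the default path are assumptions for illustration only; point it at whichever local directories your cluster actually uses for intermediate data.

```python
import shutil
import tempfile


def local_usage_percent(path=None):
    """Return how full the filesystem holding `path` is, as a percentage.

    `path` defaults to the system temp directory (usually /tmp on Linux);
    this default is an assumption, not a Hadoop setting.
    """
    if path is None:
        path = tempfile.gettempdir()
    usage = shutil.disk_usage(path)  # named tuple: total, used, free (bytes)
    return 100.0 * usage.used / usage.total


if __name__ == "__main__":
    tmp_dir = tempfile.gettempdir()
    print(f"{tmp_dir}: {local_usage_percent(tmp_dir):.1f}% of its filesystem is used")
```

If the local figure climbs while jobs run and the HDFS figure stays flat, the pressure is on local disk rather than on HDFS /tmp.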