
Temp space best practices?

We’ve had some MapReduce (MR) jobs consume the temp space on our data nodes, and we want to prevent that in the future.
The information we have found online is conflicting.


  • What is the best practice for where data node temp space should reside? (For example, on physical disks separate from the HDFS disks?)
  • How much temp space should be reserved for MR as a percentage of HDFS space? (For example, is there a best-practice target?)
  • What are the MR best practices around temp space? (For example, we can instruct users to shrink their reduce steps, but this is sometimes hard to predict for analytics jobs. Are there any best-practice configurations that can help?)
  • How do file sizes affect temp space usage, especially with compressed files on HDFS?

Re: Temp space best practices?

By "temp space on datanodes", do you mean that the jobs filled the /tmp directory on HDFS, or the local /tmp directory on each of your DataNodes?