
How do you set the MapReduce spill file location in CDH 5.10?

New Contributor


I want to define the directories used for intermediate (spill) files in CDH 5.10. I believe the parameter is mapreduce.cluster.local.dir, but I can't find it in the UI. I believe it ends up in mapred-site.xml, but that file is generated automatically by Cloudera Manager.
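To see what the generated client configuration actually contains, one option is to parse mapred-site.xml directly. This is a minimal sketch assuming the standard Hadoop `<configuration><property><name/><value/></property></configuration>` layout; the path /etc/hadoop/conf/mapred-site.xml is an assumption about where the client configuration is deployed, and the helper name is hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Optional, Union


def read_hadoop_property(xml_data: Union[str, bytes], name: str) -> Optional[str]:
    """Return the value of `name` from a Hadoop *-site.xml document, or None if it is not set."""
    root = ET.fromstring(xml_data)
    for prop in root.iter("property"):
        if (prop.findtext("name") or "").strip() == name:
            return (prop.findtext("value") or "").strip()
    return None


if __name__ == "__main__":
    # Assumed location of the CM-deployed client configuration; adjust for your hosts.
    path = "/etc/hadoop/conf/mapred-site.xml"
    try:
        with open(path, "rb") as f:
            value = read_hadoop_property(f.read(), "mapreduce.cluster.local.dir")
        print(value if value is not None else "not set in this file (Hadoop default applies)")
    except OSError as err:
        print(f"could not read {path}: {err}")
```

If the property is absent from the file, the built-in Hadoop default is what applies.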

 

Where in the Cloudera Manager UI is this parameter defined?

 

Also, I believe that when the listed file systems fill up, the job will fail. Is there a way to make it fall back to regular HDFS space in that case?
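Whatever directories end up being used, it helps to watch how full the filesystems behind them are before a job runs out of space. This is a minimal sketch using Python's `shutil.disk_usage`; the directory list and the 90% alert threshold are placeholders, not Cloudera defaults, and `local_dir_usage` is a hypothetical helper.

```python
import shutil
import tempfile
from typing import Dict, List


def local_dir_usage(dirs: List[str]) -> Dict[str, float]:
    """Return the percentage of space used on the filesystem that holds each directory."""
    usage: Dict[str, float] = {}
    for d in dirs:
        total, used, _free = shutil.disk_usage(d)
        usage[d] = 100.0 * used / total if total else 0.0
    return usage


if __name__ == "__main__":
    # Placeholder: replace with your comma-separated local-dir value,
    # e.g. the value of mapreduce.cluster.local.dir or yarn.nodemanager.local-dirs.
    configured = tempfile.gettempdir()
    threshold = 90.0  # hypothetical alert threshold in percent, not a Cloudera default
    for d, pct in local_dir_usage([p.strip() for p in configured.split(",")]).items():
        status = "NEAR FULL" if pct >= threshold else "ok"
        print(f"{d}: {pct:.1f}% used ({status})")
```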


Re: How do you set the map reduce spill file location in CDH 5.10

Champion

No, there is no way to have it use HDFS. The primary reason is the access pattern: many small files with small, random updates. HDFS is designed for large, write-once files, so it would be ill-suited to this workload on a consistent basis and at scale.

 

I'm curious about this now, as there seems to be some confusion. I thought yarn.nodemanager.local-dirs determined the location, since container files are localized there, but the description of mapreduce.cluster.local.dir explicitly says it is for intermediate data. I can't find a CM setting for the latter. Its default value starts with ${hadoop.tmp.dir}, which defaults to /tmp/hadoop-${user.name}, so the full path should be /tmp/hadoop-${user.name}/mapred/local, but I don't see that directory on any node. I do have a /tmp/hadoop-${user.name}/s3a directory holding temporary data from S3 copy jobs.
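The default-path reasoning above can be checked mechanically. In the stock Hadoop defaults, mapreduce.cluster.local.dir is ${hadoop.tmp.dir}/mapred/local and hadoop.tmp.dir is /tmp/hadoop-${user.name}. The sketch below is a simplified model of Hadoop's ${...} substitution (a name is looked up in Java system properties first, then in other configuration properties); the `expand` helper and the user name "alice" are illustrative, not part of any Hadoop API.

```python
import re
from typing import Dict

_VAR = re.compile(r"\$\{([^}$\s]+)\}")


def expand(value: str, props: Dict[str, str], sys_props: Dict[str, str], max_depth: int = 20) -> str:
    """Expand ${name} references: system properties first, then config properties (simplified)."""
    for _ in range(max_depth):
        match = _VAR.search(value)
        if match is None:
            return value
        name = match.group(1)
        replacement = sys_props.get(name, props.get(name))
        if replacement is None:
            return value  # unbound variable: leave the expression as-is
        value = value[: match.start()] + replacement + value[match.end():]
    if _VAR.search(value) is None:
        return value
    raise ValueError(f"variable substitution depth exceeded: {value}")


if __name__ == "__main__":
    props = {
        "hadoop.tmp.dir": "/tmp/hadoop-${user.name}",
        "mapreduce.cluster.local.dir": "${hadoop.tmp.dir}/mapred/local",
    }
    print(expand(props["mapreduce.cluster.local.dir"], props, {"user.name": "alice"}))
```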

 

I may test this out later and will report back with my findings.

 

I would test this by setting it at the job level and checking whether the job picks it up. Then I would try the different Advanced Configuration Snippets (ACS) to see which one applies. My guess is one of the two below: the first if the property can be set at job runtime, and the second if it cannot, so that all roles get the setting.
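The job-level test could look like the sketch below, which builds a `hadoop jar` command that overrides the property for a single run with `-D`. This assumes the job's driver uses Hadoop's GenericOptionsParser (the bundled wordcount example does); the jar path, directories, and HDFS paths are hypothetical placeholders. Whether the tasks actually honor the value is exactly what the test would reveal.

```python
import shlex
from typing import List


def build_test_command(jar: str, program: str, local_dir: str, input_path: str, output_path: str) -> List[str]:
    """Build a `hadoop jar` command that overrides mapreduce.cluster.local.dir for one job.

    The -D generic option must come before the program's own arguments.
    """
    return [
        "hadoop", "jar", jar, program,
        "-D", f"mapreduce.cluster.local.dir={local_dir}",
        input_path, output_path,
    ]


if __name__ == "__main__":
    # All paths below are hypothetical placeholders.
    cmd = build_test_command(
        "/path/to/hadoop-mapreduce-examples.jar", "wordcount",
        "/data/1/mr-local", "/user/alice/in", "/user/alice/out",
    )
    print(shlex.join(cmd))  # paste into a shell, or run with subprocess.run(cmd, check=True)
```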

 

MapReduce Client Advanced Configuration Snippet (Safety Valve) for mapred-site.xml

YARN Service MapReduce Advanced Configuration Snippet (Safety Valve)
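The XML safety valves above take Hadoop `<property>` elements. This is a small sketch that renders such a fragment with Python's standard `xml.etree.ElementTree`, which also escapes any special characters; the directory list is a hypothetical example (one directory per physical disk), and `safety_valve_snippet` is an illustrative helper, not a Cloudera Manager API.

```python
import xml.etree.ElementTree as ET
from typing import Dict


def safety_valve_snippet(props: Dict[str, str]) -> str:
    """Render Hadoop <property> elements suitable for pasting into an XML safety valve."""
    parts = []
    for name, value in props.items():
        prop = ET.Element("property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
        parts.append(ET.tostring(prop, encoding="unicode"))
    return "\n".join(parts)


if __name__ == "__main__":
    # Hypothetical comma-separated directory list.
    print(safety_valve_snippet({
        "mapreduce.cluster.local.dir": "/data/1/mapred/local,/data/2/mapred/local",
    }))
```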

Re: How do you set the map reduce spill file location in CDH 5.10

Contributor

Hi all,

 

For MR1, search for this configuration in CM:

TaskTracker Local Data Directories
mapred.local.dir

 

For YARN w/MR2, search for this configuration in CM:

NodeManager Local Directories
yarn.nodemanager.local-dirs
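With YARN, each NodeManager typically creates per-application scratch space under usercache/&lt;user&gt;/appcache/&lt;application id&gt; inside every directory listed in yarn.nodemanager.local-dirs, and that is where MR2 intermediate files live while the job runs. This helper only computes those paths so you can look for them on a worker node; the directory values, user, and application ID are placeholders, and `app_local_dirs` is a hypothetical name.

```python
import os
from typing import List


def app_local_dirs(nm_local_dirs: str, user: str, app_id: str) -> List[str]:
    """List the per-application scratch directories under each NodeManager local dir."""
    return [
        os.path.join(d.strip(), "usercache", user, "appcache", app_id)
        for d in nm_local_dirs.split(",")
        if d.strip()
    ]


if __name__ == "__main__":
    # Placeholder values; substitute your cluster's setting, user and application ID.
    for path in app_local_dirs("/data/1/yarn/nm,/data/2/yarn/nm", "alice", "application_1_0001"):
        print(path)
```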
 
Hope that helps!