
Yarn MR2 Configuration

Although I'm not new to Hadoop, I am new to Cloudera and Hadoop 2.3.x / YARN.
I have a Java MapReduce job that we are trying to migrate from our Hadoop 1.0.3 MapR cluster to a new CDH 5.0 cluster.


The job uses a custom InputDBFormat that inherits from DataDrivenDBInputFormat. When we wrote that custom input format, we had to override the setConf method for our purposes:


public void setConf(Configuration conf) { ... }


That worked just fine in MapR: we were able to use that instance of the configuration object to retrieve all the information we need to connect to the database and split the result set.


However, when running that same job in CDH 5.0, that conf instance only has the Hadoop default configuration elements and none of the ones we set in our MR run method, such as the input query or the bounding query.


When I log conf.get(INPUT_QUERY) in InputDBFormat.setConf, I get "null".


However, when I log that same value in the MR run method just before submitting the job, I get the correct value.

In fact, to make sure I am reading from the correct configuration instance I do:




Another puzzling fact is that when I go to the Resource Manager UI to read the job configuration, I do find my added keys and values there.


I am not sure why my custom InputDBFormat can't read those values. It's almost as if, once the job is shipped to the nodes, the tasks create their own Configuration instance instead of using the one from the Job object.


Again, this was working just fine as is in Hadoop 1.0.3.


Any ideas?



Re: Yarn MR2 Configuration

If my understanding of your statements is correct, your complaint is that the task-side (remote execution) InputFormat is not coming up with proper configuration elements. Or is this incorrect, and the failure you speak of is at the client/local/driver end?

Per the MapTask initialization code in MR2, we do pass in the job configuration during construction (so it is passed on to the setConf method if the class implements the Configurable/Configured types):

MR1 too has the same initialisation method:

Can you share a reproducible implementation of the InputFormat class, something that can act as a test-case for us to inspect if there's a bug?
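
To make the mechanism described above concrete, here is a minimal, self-contained sketch of "instantiate the class reflectively, then hand it the configuration if it is configurable". It is not Hadoop code: SimpleConf, SimpleConfigurable, QueryInputFormat, the newInstance helper and the key "example.input.query" are all hypothetical stand-ins, written only to illustrate the pattern and assuming the framework behaves as the reply says. In Hadoop itself the corresponding pieces would be Configuration, Configurable and ReflectionUtils.newInstance(Class, Configuration).

```java
import java.util.HashMap;
import java.util.Map;

public class ConfInjectionSketch {

    /** Hypothetical stand-in for Hadoop's Configuration: a plain key/value store. */
    static class SimpleConf {
        private final Map<String, String> values = new HashMap<>();

        void set(String key, String value) { values.put(key, value); }

        String get(String key) { return values.get(key); }  // null when the key is unset
    }

    /** Hypothetical stand-in for Hadoop's Configurable interface. */
    interface SimpleConfigurable {
        void setConf(SimpleConf conf);
    }

    /** Hypothetical input format that reads its query inside setConf. */
    public static class QueryInputFormat implements SimpleConfigurable {
        static final String INPUT_QUERY = "example.input.query";  // illustrative key name
        private String inputQuery;

        public QueryInputFormat() { }

        @Override
        public void setConf(SimpleConf conf) {
            // Whatever conf the framework passes in is the only one this object sees.
            inputQuery = conf.get(INPUT_QUERY);
        }

        String getInputQuery() { return inputQuery; }
    }

    /** Create an instance reflectively, then inject conf if the type accepts one. */
    static <T> T newInstance(Class<T> cls, SimpleConf conf) throws ReflectiveOperationException {
        T instance = cls.getDeclaredConstructor().newInstance();
        if (instance instanceof SimpleConfigurable) {
            ((SimpleConfigurable) instance).setConf(conf);
        }
        return instance;
    }

    public static void main(String[] args) throws Exception {
        // Case 1: the conf carrying the job's custom keys is passed in.
        SimpleConf jobConf = new SimpleConf();
        jobConf.set(QueryInputFormat.INPUT_QUERY, "SELECT id FROM t");
        QueryInputFormat withJobConf = newInstance(QueryInputFormat.class, jobConf);
        System.out.println("job conf     -> " + withJobConf.getInputQuery());

        // Case 2: a fresh, defaults-only conf is passed in, so the lookup yields null.
        QueryInputFormat withDefaults = newInstance(QueryInputFormat.class, new SimpleConf());
        System.out.println("default conf -> " + withDefaults.getInputQuery());
    }
}
```

The second case mirrors the symptom in the question: if setConf is ever called with a configuration that lacks the job's keys, conf.get(INPUT_QUERY) returns null. A reproducible test case would therefore mainly need to show which configuration object actually reaches setConf on the task side.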