New Contributor
Posts: 1
Registered: 05-16-2014

Yarn MR2 Configuration



Although I'm not new to Hadoop, I am new to Cloudera and Hadoop 2.3.x / YARN.
I have a Java MapReduce job that we are trying to migrate from our Hadoop 1.0.3 MapR cluster to a new CDH 5.0 cluster.


The job is using a custom InputDBFormat inheriting from DataDrivenDBInputFormat. When we wrote that custom input format we had to override the setConf method for our purpose:


public void setConf(Configuration conf) { ... }


That worked just fine on MapR: we were able to use that Configuration instance to retrieve all the information we need to connect to the database and split the result set.


However, when running that same job on CDH 5.0, that conf instance only has the Hadoop default configuration elements and none of the ones we set in our MR run method, such as the input query or the bounding query.


When I log conf.get(INPUT_QUERY) in InputDBFormat.setConf, I get "null".


However, when I log that same value in the MR run method just before submitting the job, I get the correct value.

In fact, to make sure I am reading from the correct configuration instance, I do:




Another puzzling fact is that when I go to the Resource Manager UI to read the job configuration, I do find my added keys and values there.


I am not sure why my custom InputTDFormat can't read those values. It's almost as if, once the job is shipped to the nodes, the tasks create their own Configuration instance instead of using the one from the Job object.


Again this was working just fine as is in Hadoop 1.0.3. 


Any idea?


Posts: 1,617
Kudos: 305
Solutions: 248
Registered: 07-31-2013

Re: Yarn MR2 Configuration

If my understanding of your statements is correct, your complaint is that the task-side (remote execution) InputFormat is not coming up with proper configuration elements. Or is this incorrect, and the failure you speak of is at the client/local/driver end?

Per the MapTask initialization code in MR2, we do pass in the job configuration during construction (so it is passed onto the setConf method if the class implements Configurable/Configured types):

MR1 too has the same initialisation method:
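For readers following along without the Hadoop source at hand, here is a minimal, self-contained sketch of the pattern described above: the framework creates the InputFormat reflectively and then hands it the job configuration through setConf if the class is configurable. This is not the actual MapTask or ReflectionUtils code. SketchConf, SketchConfigurable, SketchDbInputFormat, SketchReflection and the key "sketch.input.query" are hypothetical stand-ins for Hadoop's Configuration, Configurable, a custom DB input format, and the reflective factory.

```java
import java.lang.reflect.Constructor;
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for org.apache.hadoop.conf.Configuration.
class SketchConf {
    private final Map<String, String> values = new HashMap<>();

    void set(String key, String value) { values.put(key, value); }

    // Returns null when the key was never set, like the "null" in the question.
    String get(String key) { return values.get(key); }
}

// Hypothetical stand-in for org.apache.hadoop.conf.Configurable.
interface SketchConfigurable {
    void setConf(SketchConf conf);
    SketchConf getConf();
}

// Hypothetical input format that reads its settings inside setConf.
class SketchDbInputFormat implements SketchConfigurable {
    static final String INPUT_QUERY = "sketch.input.query"; // assumed key name

    private SketchConf conf;
    private String inputQuery;

    public SketchDbInputFormat() { }

    @Override
    public void setConf(SketchConf conf) {
        this.conf = conf;
        // Whatever the caller passed in is all this class can see.
        this.inputQuery = conf.get(INPUT_QUERY);
    }

    @Override
    public SketchConf getConf() { return conf; }

    String getInputQuery() { return inputQuery; }
}

// Simplified imitation of a reflective factory that injects configuration.
class SketchReflection {
    static <T> T newInstance(Class<T> cls, SketchConf conf) {
        try {
            Constructor<T> ctor = cls.getDeclaredConstructor();
            ctor.setAccessible(true);
            T instance = ctor.newInstance();
            // Only configurable classes receive the job configuration.
            if (instance instanceof SketchConfigurable) {
                ((SketchConfigurable) instance).setConf(conf);
            }
            return instance;
        } catch (ReflectiveOperationException e) {
            throw new RuntimeException("Could not instantiate " + cls.getName(), e);
        }
    }
}
```

In this sketch, a key that comes back null inside setConf can only mean that the configuration object passed to newInstance did not contain it. Under that assumption, the useful question becomes which configuration object reaches that call on the task side, which is what a reproducible test case would help answer.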

Can you share a reproducible implementation of the InputFormat class, something that can act as a test-case for us to inspect if there's a bug?