New Contributor
Posts: 3
Registered: 10-28-2013

Is there a way to send Task/Input specific parameters to a mapper?

I am using MultipleInputs to integrate user data from a variety of sources. Records from legacy systems need extra metadata that is common to every record of a given input but is not available in the records themselves. How can I pass parameters to the mappers of a specific input so that all mappers for that input can retrieve this information during their setup phase?

Posts: 416
Topics: 51
Kudos: 89
Solutions: 49
Registered: 06-26-2013

Re: Is there a way to send Task/Input specific parameters to a mapper?

You might want to consider using the Distributed Cache mechanism to distribute your metadata files to the working directory of each map task, and then access them directly from your application code.
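A minimal sketch of the task side of this approach, written in plain Java so it runs without Hadoop on the classpath. It assumes the driver registered each file with something like `job.addCacheFile(new URI("hdfs:///meta/crm.properties#crm.properties"))` (Hadoop 2.x API), where the `#crm.properties` fragment names the link created in each task's working directory; the path, file name and keys are hypothetical. Note that every map task receives every cache file, so the mapper still has to decide which file applies to its input.

```java
import java.io.IOException;
import java.io.Reader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Properties;

class CachedMetadata {
    // Loads a key=value metadata file that was placed (or symlinked) into the
    // task's working directory under linkName. "crm.properties" style names are hypothetical.
    static Properties load(Path workDir, String linkName) throws IOException {
        Properties props = new Properties();
        try (Reader in = Files.newBufferedReader(workDir.resolve(linkName), StandardCharsets.UTF_8)) {
            props.load(in);
        }
        return props;
    }

    // In a real mapper this would be called from setup(); "." is the task's working directory.
    static Properties loadFromWorkingDir(String linkName) throws IOException {
        return load(Paths.get("."), linkName);
    }
}
```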

New Contributor
Posts: 3
Registered: 10-28-2013

Re: Is there a way to send Task/Input specific parameters to a mapper?

How do I specify which task gets which distributed cache file?


New Contributor
Posts: 3
Registered: 10-28-2013

Re: Is there a way to send Task/Input specific parameters to a mapper?

My problem, specifically, is telling each mapper which metadata applies to it. Large portions of the data are processed by the same mapper class but need different configuration settings depending on the input they come from.

Posts: 1,903
Kudos: 435
Solutions: 307
Registered: 07-31-2013

Re: Is there a way to send Task/Input specific parameters to a mapper?

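You can achieve this with a custom InputFormat and RecordReader (for example, subclasses of the ones you already use). In the RecordReader you can get the absolute path URI of the file being read from the split object, and use it as the prefix for configuration keys that the driver sets per input.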
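The path-prefix idea can be sketched with a `Map<String, String>` standing in for Hadoop's `Configuration` (the sketch mirrors its `set`/`get` calls). The key prefix `input.meta.` and the setting names are hypothetical. The lookup walks up from the split's file path, because splits point at individual files while inputs are usually registered as directories. In a real custom RecordReader, `initialize(InputSplit split, TaskAttemptContext context)` could pass `((FileSplit) split).getPath().toString()` and `context.getConfiguration()` to the same logic; the driver has to store the same fully qualified form that the split reports (for example via `FileSystem.makeQualified(Path)`), which is worth checking on your cluster.

```java
import java.util.Map;

class InputMetadata {
    // Hypothetical key prefix; not a Hadoop-defined property name.
    static final String PREFIX = "input.meta.";

    static String stripTrailingSlash(String p) {
        return p.endsWith("/") && p.length() > 1 ? p.substring(0, p.length() - 1) : p;
    }

    // Driver side: record one setting for one input directory.
    static void set(Map<String, String> conf, String inputDir, String name, String value) {
        conf.put(PREFIX + stripTrailingSlash(inputDir) + "." + name, value);
    }

    // Reader side: start at the split's file path and walk up parent directories
    // until a registered input directory is found. Returns null if none matches.
    static String lookup(Map<String, String> conf, String splitPath, String name) {
        String dir = stripTrailingSlash(splitPath);
        while (!dir.isEmpty()) {
            String value = conf.get(PREFIX + dir + "." + name);
            if (value != null) {
                return value;
            }
            int slash = dir.lastIndexOf('/');
            if (slash < 0) {
                break;
            }
            dir = dir.substring(0, slash);
        }
        return null;
    }
}
```

Walking up the directory tree also lets one setting cover nested partitions such as `.../legacy2/2013/part-00001`, at the cost of a few extra lookups per split rather than per record.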

If the metadata is too large to pass through the Configuration, the Configuration can instead carry the HDFS path of the metadata file for each unique input; the task then looks up that path and reads the file as needed.
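A sketch of this second variant, where the Configuration holds only the location of each input's metadata file. As before, a `Map<String, String>` stands in for `Configuration`, and local `java.nio` I/O stands in for reading from HDFS (in Hadoop you would obtain the stream from `FileSystem.open(Path)` instead). The key prefix and the one-properties-file-per-input layout are assumptions.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Map;
import java.util.Properties;

class MetadataLocations {
    // Hypothetical key prefix; the value is where the metadata lives, not the metadata itself.
    static final String PREFIX = "input.meta.file.";

    // Driver side: keep the Configuration small by storing only the file location per input.
    static void register(Map<String, String> conf, String inputDir, String metadataFile) {
        conf.put(PREFIX + inputDir, metadataFile);
    }

    // Task side (e.g. RecordReader.initialize or Mapper.setup): resolve the location, then read it.
    // Local file I/O here stands in for opening the file on HDFS.
    static Properties load(Map<String, String> conf, String inputDir) throws IOException {
        String location = conf.get(PREFIX + inputDir);
        if (location == null) {
            throw new IllegalArgumentException("no metadata registered for " + inputDir);
        }
        Properties props = new Properties();
        try (InputStream in = Files.newInputStream(Paths.get(location))) {
            props.load(in);
        }
        return props;
    }
}
```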