Mentor
Created 07-01-2016 04:16 AM
While MultipleInputs was designed for this kind of thing, your requirement is
unusual in that you need to process the same input twice, with different
parameters each time. That seems a bit redundant to me, since you could do
both in a single pass over the data instead of paying twice the I/O cost.
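For reference, a minimal sketch of the single-pass alternative. It assumes the two runs differ only in a parameter set that a mapper can apply to each record independently. The config key `example.param.sets` and the `process` method are hypothetical placeholders for your own parameters and per-record logic. Each result is keyed by its parameter set, so downstream you can still tell the two outputs apart.

```java
import java.io.IOException;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

/**
 * Sketch: read each record once and apply every parameter set to it,
 * instead of scanning the same input twice.
 */
public class SinglePassMapper extends Mapper<LongWritable, Text, Text, Text> {

    // Hypothetical config key holding the parameter sets, e.g. "paramsA,paramsB".
    public static final String PARAMS_KEY = "example.param.sets";

    private String[] paramSets;
    private final Text outKey = new Text();

    @Override
    protected void setup(Context context) {
        // Falls back to a single "default" set if the key is not configured.
        paramSets = context.getConfiguration().getStrings(PARAMS_KEY, "default");
    }

    @Override
    protected void map(LongWritable offset, Text line, Context context)
            throws IOException, InterruptedException {
        for (String params : paramSets) {
            // Key each result by its parameter set so the outputs stay distinguishable.
            outKey.set(params);
            context.write(outKey, process(line, params));
        }
    }

    // Hypothetical placeholder for the real per-parameter logic.
    static Text process(Text line, String params) {
        return new Text(params + ":" + line);
    }
}
```

The job would set `example.param.sets` with `job.getConfiguration().setStrings(SinglePassMapper.PARAMS_KEY, "paramsA", "paramsB")`. Reusing `outKey` across writes is safe because `context.write` serializes the key immediately.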
That said, I believe you can solve your identifier problem by writing your
own InputFormat wrapper over the existing InputFormat, one that generates a
special type of InputSplit (a wrapper over the regular FileSplit class).
These input splits carry your identifier as an extra field, which you can
then retrieve in the mapper by casting context.getInputSplit() to your split
type, and use it to tell the two inputs apart.
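Here is a rough sketch of that wrapper using the new `org.apache.hadoop.mapreduce` API, assuming plain text input read through `TextInputFormat`. The class names `TaggedTextInputFormat`, `TaggedFileSplit` and `TaggedMapper`, and the config key `example.tagged.input.tags`, are hypothetical. The format emits one tagged copy of every underlying split per identifier, so each byte is read once per tag, which matches the "process it twice" requirement. It also assumes the format is set directly with `job.setInputFormatClass(...)` rather than through MultipleInputs; as far as I know, MultipleInputs wraps splits in its own type, so the cast in the mapper would fail there. The mapper is nested only to keep the sketch in one file.

```java
import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.InputSplit;
import org.apache.hadoop.mapreduce.JobContext;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileSplit;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;

public class TaggedTextInputFormat extends TextInputFormat {

    // Hypothetical config key listing the identifiers, e.g. "paramsA,paramsB".
    public static final String TAGS_KEY = "example.tagged.input.tags";

    /** A FileSplit that also carries an identifier (tag) as an extra field. */
    public static class TaggedFileSplit extends FileSplit {
        private String tag = "";

        // No-arg constructor: the framework creates splits reflectively, then calls readFields().
        public TaggedFileSplit() {
            super();
        }

        public TaggedFileSplit(FileSplit base, String tag) throws IOException {
            super(base.getPath(), base.getStart(), base.getLength(), base.getLocations());
            this.tag = tag;
        }

        public String getTag() {
            return tag;
        }

        @Override
        public void write(DataOutput out) throws IOException {
            super.write(out);            // path, start, length
            Text.writeString(out, tag);  // then the extra identifier
        }

        @Override
        public void readFields(DataInput in) throws IOException {
            super.readFields(in);
            tag = Text.readString(in);
        }
    }

    @Override
    public List<InputSplit> getSplits(JobContext job) throws IOException {
        String[] tags = job.getConfiguration().getStrings(TAGS_KEY, "default");
        List<InputSplit> tagged = new ArrayList<>();
        // Wrap every regular FileSplit once per identifier.
        for (InputSplit split : super.getSplits(job)) {
            for (String tag : tags) {
                tagged.add(new TaggedFileSplit((FileSplit) split, tag));
            }
        }
        return tagged;
    }

    /** Mapper that recovers the identifier from its own input split. */
    public static class TaggedMapper extends Mapper<LongWritable, Text, Text, Text> {
        private final Text tag = new Text();

        @Override
        protected void setup(Context context) {
            // One map task reads exactly one split, so the tag is fixed per task.
            tag.set(((TaggedFileSplit) context.getInputSplit()).getTag());
        }

        @Override
        protected void map(LongWritable offset, Text line, Context context)
                throws IOException, InterruptedException {
            // Branch on the tag here to apply the matching parameters.
            context.write(tag, line);
        }
    }
}
```

The driver would call `job.setInputFormatClass(TaggedTextInputFormat.class)` and `job.getConfiguration().setStrings(TaggedTextInputFormat.TAGS_KEY, "paramsA", "paramsB")`. Overriding `write`/`readFields` matters because splits are serialized on the client and deserialized in each task; without it the tag would be lost. `LineRecordReader` casts its split to `FileSplit`, so the subclass still works with the stock reader.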