I have written a MapReduce application that reads the input file with TextInputFormat and writes it out as an Avro file through a mapper. Before processing it as text, I want to replace a 2-byte binary field in each record (records are separated by \n) with 2 spaces, so that the record length stays at 60. I am not able to do this with TextInputFormat: the file is read as UTF-8, and the 2 bytes sometimes appear as 1 character, sometimes as 2, and sometimes as none, so I cannot locate them by position. This is a mainframe file, and apart from these 2 binary bytes everything in it is readable text.
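To illustrate the position problem, here is a minimal sketch of how two arbitrary bytes can decode to a different number of characters under UTF-8, while a single-byte charset such as ISO-8859-1 always yields one character per byte. The class name `DecodeDemo` and the byte values are made up for the example; they are not taken from the actual file.

```java
import java.nio.charset.StandardCharsets;

class DecodeDemo {
    // Number of Java chars produced when the two bytes are decoded as UTF-8.
    static int utf8Chars(byte b0, byte b1) {
        return new String(new byte[] {b0, b1}, StandardCharsets.UTF_8).length();
    }

    // ISO-8859-1 maps every byte to exactly one char, so positions are preserved.
    static int latin1Chars(byte b0, byte b1) {
        return new String(new byte[] {b0, b1}, StandardCharsets.ISO_8859_1).length();
    }

    public static void main(String[] args) {
        // 0xC3 0xA9 happens to be a valid 2-byte UTF-8 sequence: one char.
        System.out.println(utf8Chars((byte) 0xC3, (byte) 0xA9));
        // 0x00 0x41 are two separate 1-byte sequences: two chars.
        System.out.println(utf8Chars((byte) 0x00, (byte) 0x41));
        // The same bytes decoded as ISO-8859-1 always give two chars.
        System.out.println(latin1Chars((byte) 0xC3, (byte) 0xA9));
    }
}
```

Because the character count after decoding depends on the byte values, an offset counted in characters does not reliably point at the binary field.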
I tested reading the same file with a BufferedInputStream in plain Java, and there I was able to find the 2 binary bytes by position. Since I have already implemented the mapper (say mapper-2) that processes the file with TextInputFormat, I am thinking of writing one more mapper (mapper-1) that reads the file as binary and replaces the 2 bytes, and then calling mapper-2.
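The byte-level replacement I have in mind would look roughly like the sketch below. It assumes each record is 60 bytes without the trailing \n and that a space is the ASCII byte 0x20. The offset `BINARY_OFFSET` is a placeholder, not the real position of the field, and the class and method names are hypothetical.

```java
import java.util.Arrays;

class BinaryFieldBlanker {
    static final int RECORD_LENGTH = 60;  // record length from the question
    static final int BINARY_OFFSET = 10;  // hypothetical: the real offset is not stated here
    static final int BINARY_LENGTH = 2;   // size of the binary field

    // Returns a copy of the first `length` bytes of `record` with the
    // binary field overwritten by ASCII spaces. The input is not modified.
    static byte[] blankBinaryField(byte[] record, int length) {
        if (length < BINARY_OFFSET + BINARY_LENGTH) {
            throw new IllegalArgumentException("record too short: " + length);
        }
        byte[] out = Arrays.copyOf(record, length);
        Arrays.fill(out, BINARY_OFFSET, BINARY_OFFSET + BINARY_LENGTH, (byte) ' ');
        return out;
    }

    public static void main(String[] args) {
        byte[] record = new byte[RECORD_LENGTH];
        Arrays.fill(record, (byte) 'A');
        record[BINARY_OFFSET] = (byte) 0x00;      // made-up binary values
        record[BINARY_OFFSET + 1] = (byte) 0x8F;
        byte[] cleaned = blankBinaryField(record, record.length);
        System.out.println(cleaned.length);
    }
}
```

The explicit `length` parameter is there because Hadoop's `Text.getBytes()` returns a backing array that is only valid up to `Text.getLength()`. If the raw bytes of each line are available that way inside the existing mapper, a helper like this might be callable there without a separate mapper, but that is untested. It would also break if the binary field can ever contain the byte 0x0A, since the record would then be split at the wrong place.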
I have limited time and do not have the luxury of trying out different options. Could you suggest whether there is a better approach than the one I have in mind, and point me to some links for it?