Support Questions

Eukrev · 10-31-2017

Hi,

I am not a Java expert, and I am trying to analyse FixedLengthInputFormat and FixedLengthRecordReader so I can customize them for a project.

I copied both classes from the GitHub link below and am testing them with a driver and a mapper class.

https://github.com/apache/hadoop/tree/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-...

The input is in fixed-length format, like this:
1234abcvd123mnfvds6722
6543abcad123aewert1234
When I run the job I get the error: Partial record found at the end of split.
The input split counts the newline characters, so the split length is calculated as 46 instead of 44, and it finds 3 records instead of 2.
How can I keep the newline characters out of the input split? I appreciate any help on this.

Thank you

kgautam · 10-31-2017

Please have a look at the code of FixedLengthInputFormat in the GitHub repository.

The basic requirement is that every record has the same length. Each record in your file must be exactly "fixedlengthinputformat.record.length" bytes long, and that length includes the delimiter.
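To make the arithmetic concrete, here is a small sketch in plain Java (no Hadoop classes). It assumes the sample file uses one-byte Unix line endings and ends with a newline, which matches the 46 bytes reported in the question. The class and method names are made up for illustration.

```java
import java.nio.charset.StandardCharsets;

final class RecordLengthCheck {
    /** Returns true when totalBytes divides into whole records of recordLength bytes. */
    static boolean dividesEvenly(long totalBytes, int recordLength) {
        return totalBytes % recordLength == 0;
    }

    public static void main(String[] args) {
        // The two sample rows from the question, each followed by "\n" (assumption).
        String file = "1234abcvd123mnfvds6722\n6543abcad123aewert1234\n";
        long totalBytes = file.getBytes(StandardCharsets.US_ASCII).length;

        System.out.println("total bytes = " + totalBytes);               // total bytes = 46
        System.out.println("22 fits: " + dividesEvenly(totalBytes, 22)); // 22 fits: false
        System.out.println("23 fits: " + dividesEvenly(totalBytes, 23)); // 23 fits: true
    }
}
```

With a record length of 22 there are 2 bytes left over, which is the partial record in the error message. If the file had Windows line endings (\r\n), the record length would presumably need to be 24 instead.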

1. Please understand that TextInputFormat was created for reading files whose records are delimited.

2. A file can also hold multiple fixed-length records without any delimiter.

Such files save disk space, because a delimiter is redundant in them.

Only the record length determines where one record ends and the next one starts.

Such a file looks like one big row, which is why we use FixedLengthInputFormat.
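As a rough illustration of how a delimiter-free file is cut into records purely by offset, here is a sketch in plain Java. It is not the Hadoop record reader itself. The class name and the slice method are hypothetical, and the data is assumed to be ASCII.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

final class FixedLengthSlicer {
    /** Cuts data into consecutive records of recordLength bytes each. */
    static List<String> slice(byte[] data, int recordLength) {
        if (recordLength <= 0) {
            throw new IllegalArgumentException("record length must be positive");
        }
        if (data.length % recordLength != 0) {
            // Leftover bytes mean the last record is incomplete.
            throw new IllegalArgumentException("partial record at the end of the data");
        }
        List<String> records = new ArrayList<>();
        for (int offset = 0; offset < data.length; offset += recordLength) {
            records.add(new String(data, offset, recordLength, StandardCharsets.US_ASCII));
        }
        return records;
    }

    public static void main(String[] args) {
        // The two sample rows written back to back, with no delimiter: 44 bytes.
        byte[] data = "1234abcvd123mnfvds67226543abcad123aewert1234"
                .getBytes(StandardCharsets.US_ASCII);
        for (String record : slice(data, 22)) {
            System.out.println(record);
        }
    }
}
```

The same slicing applied to the 46-byte file from the question with a length of 22 would hit the partial-record check, because 46 is not a multiple of 22.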

Two solutions:

1. Set fixedlengthinputformat.record.length in the conf object to 23 (22 data bytes plus the newline), then remove the delimiter in the map method.

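<code>// Needs org.apache.hadoop.conf.Configuration, org.apache.hadoop.mapreduce.Job,
// and org.apache.hadoop.mapreduce.lib.input.FixedLengthInputFormat.
Configuration conf = new Configuration(true);
conf.set("fs.default.name", "file:///"); // local file system; newer Hadoop versions name this key fs.defaultFS
conf.setInt("fixedlengthinputformat.record.length", 23); // 22 data bytes + 1 newline
Job job = Job.getInstance(conf); // create the job after conf is set, because Job copies the configuration
job.setInputFormatClass(FixedLengthInputFormat.class);</code>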

2. Use TextInputFormat, but it does not check that all records have the same length; you will have to do that check inside your map method.
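The per-record work that each solution leaves to the map method could look roughly like the sketch below. It is plain Java so it runs without Hadoop, and the class and method names are hypothetical. In a real mapper, solution 1 would receive each record as a BytesWritable (its bytes can be obtained with value.copyBytes()), and solution 2 would receive a Text (value.toString()) with the line ending already removed. ASCII data is assumed, so character count equals byte count.

```java
import java.nio.charset.StandardCharsets;

final class RecordHelpers {
    /** Solution 1: drop the trailing "\n" (and a "\r" before it, if any) from a fixed-length record. */
    static String stripDelimiter(byte[] record) {
        int end = record.length;
        if (end > 0 && record[end - 1] == '\n') {
            end--;
            if (end > 0 && record[end - 1] == '\r') {
                end--;
            }
        }
        return new String(record, 0, end, StandardCharsets.US_ASCII);
    }

    /** Solution 2: verify that a line read as text has the expected record length. */
    static boolean hasExpectedLength(String line, int expectedLength) {
        return line.length() == expectedLength;
    }

    public static void main(String[] args) {
        byte[] raw = "1234abcvd123mnfvds6722\n".getBytes(StandardCharsets.US_ASCII);
        System.out.println(stripDelimiter(raw));                             // 1234abcvd123mnfvds6722
        System.out.println(hasExpectedLength("6543abcad123aewert1234", 22)); // true
        System.out.println(hasExpectedLength("6543abc", 22));                // false
    }
}
```

What to do with a line that fails the length check in solution 2 (skip it, count it, or fail the job) is a design choice the thread does not settle.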

