MapReduce: FixedRecordReader - Partial record found at the end of split

New Contributor

Hi,

I am not an expert in Java, and I am trying to understand FixedInputFormat and FixedRecordReader so that I can customize them for my project.

I copied both classes from the GitHub link below and am testing them with a driver and a mapper class.

https://github.com/apache/hadoop/tree/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-...

The input is in a fixed-length format, like this:
1234abcvd123mnfvds6722
6543abcad123aewert1234
When I run the job, I get the error: Partial record found at the end of split.
The input split includes the newline characters, so the split length is calculated as 46 instead of 44, and 3 records are counted instead of 2.
How can the newline characters be excluded from the input split? I would appreciate any help with this.

Thank you

Accepted Solutions

Re: MapReduce: FixedRecordReader - Partial record found at the end of split

Please have a look at the code of FixedLengthInputFormat as provided on GitHub.

The basic requirement is that every record has the same length. Each record in your file must be exactly "fixedlengthinputformat.record.length" bytes long, and that length includes any delimiter.

1. TextInputFormat was created for reading files whose records are separated by a delimiter.

2. A file can also hold multiple fixed-length records without any delimiter. This saves disk space, because a delimiter is redundant in such files: the record length alone determines where one record ends and the next begins. Such a file looks like one big row, which is why FixedLengthInputFormat is used for it.
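To see why the reader reports a partial record, the byte counts from the question can be worked through directly. The following is a minimal sketch in plain Java, not Hadoop code; the class and method names are hypothetical, and it assumes each row ends in a single one-byte newline, which is what the 46-byte figure in the question suggests.

```java
// Sketch of the arithmetic behind the "Partial record" error (not Hadoop code).
// SplitArithmetic and its methods are hypothetical names used for illustration.
final class SplitArithmetic {

    /** Number of complete records of recordLength bytes that fit in splitLength bytes. */
    static long fullRecords(long splitLength, int recordLength) {
        return splitLength / recordLength;
    }

    /** Bytes left over after the last complete record; non-zero means a partial record. */
    static long leftoverBytes(long splitLength, int recordLength) {
        return splitLength % recordLength;
    }

    public static void main(String[] args) {
        // Assumption: two 22-character rows, each followed by one '\n' byte.
        long fileLength = 46;
        for (int recordLength : new int[] {22, 23}) {
            System.out.printf("record.length=%d: %d full record(s), %d leftover byte(s)%n",
                    recordLength,
                    fullRecords(fileLength, recordLength),
                    leftoverBytes(fileLength, recordLength));
        }
    }
}
```

With a record length of 22, the 46 bytes leave 2 bytes over, which is the partial record. With 23 they divide evenly. If the file instead used a two-byte line ending between the rows and none at the end, the records would no longer line up with the rows, even though 46 is still divisible by 23.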


Two solutions:

1. Set fixedlengthinputformat.record.length to 23 in the conf object (22 characters of data plus the newline), and remove the delimiter in the map method.

    Configuration conf = new Configuration(true);
    conf.set("fs.default.name", "file:///");
    // 22 data characters + 1 newline per record
    conf.setInt("fixedlengthinputformat.record.length", 23);
    Job job = Job.getInstance(conf);
    job.setInputFormatClass(FixedLengthInputFormat.class);

2. Use TextInputFormat. It does not check that all records have the same length, so you will have to do that check yourself inside your map method.
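Both solutions leave some work for the map method. The helpers below are a hedged sketch of that work in plain Java, with hypothetical names and no Hadoop dependency: one drops the trailing newline from a 23-byte fixed-length record (solution 1), the other checks the width of a line that has already had its terminator removed (solution 2). In a real mapper, the byte array would typically come from the BytesWritable value that FixedLengthInputFormat supplies (for example via copyBytes()), and the line from the Text value that TextInputFormat supplies.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helpers for illustration; in a real job they would be called from map().
final class RecordHelpers {

    /** Solution 1: drop a single trailing '\n' from a fixed-length record, if present. */
    static String stripTrailingNewline(byte[] record) {
        int end = record.length;
        if (end > 0 && record[end - 1] == '\n') {
            end--;
        }
        // Assumption: the data is plain ASCII, as in the sample rows.
        return new String(record, 0, end, StandardCharsets.US_ASCII);
    }

    /** Solution 2: check that a line (without its terminator) has the expected width. */
    static boolean hasExpectedLength(String line, int expectedLength) {
        return line.length() == expectedLength;
    }

    public static void main(String[] args) {
        // 22 data characters plus one newline: a 23-byte record.
        byte[] raw = "1234abcvd123mnfvds6722\n".getBytes(StandardCharsets.US_ASCII);
        System.out.println(stripTrailingNewline(raw));
        System.out.println(hasExpectedLength("6543abcad123aewert1234", 22));
        System.out.println(hasExpectedLength("6543abc", 22));
    }
}
```

A mapper using the second helper could skip or count lines that fail the check rather than failing the whole job; which of these is appropriate depends on the data.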

Re: MapReduce: FixedRecordReader - Partial record found at the end of split

New Contributor

Thanks for the suggestion. I am opting for the second solution, as the data is not one big continuous row and the first solution did not work.
