<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>question MapReduce: FixedRecordReader - Partial record found at the end of split in Archives of Support Questions (Read Only)</title>
    <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/MapReduce-FixedRecordReader-Partial-record-found-at-the-end/m-p/186420#M70508</link>
    <description>&lt;P&gt;Hi,&lt;/P&gt;&lt;P&gt;I am not an expert in Java and am trying to analyse FixedInputFormat and FixedRecordReader so that I can customize them for my project.&lt;/P&gt;&lt;P&gt;I copied both classes from the GitHub link below and am testing them with a driver and a mapper class.&lt;/P&gt;&lt;P&gt;&lt;A href="https://github.com/apache/hadoop/tree/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/lib/input" target="_blank"&gt;https://github.com/apache/hadoop/tree/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/lib/input&lt;/A&gt;&lt;BR /&gt;&lt;/P&gt;&lt;P&gt;The input is in a fixed-length format like this:&lt;BR /&gt;1234abcvd123mnfvds6722&lt;BR /&gt;6543abcad123aewert1234&lt;BR /&gt;When I run this I get the error: Partial record found at the end of split.&lt;BR /&gt;The input split counts the newline characters, so it calculates the split length as 46 instead of 44 and finds 3 records instead of 2.&lt;BR /&gt;How can the newline character be excluded from the input split? I appreciate any help on this.&lt;/P&gt;&lt;P&gt;Thank you&lt;/P&gt;</description>
    <pubDate>Tue, 31 Oct 2017 19:14:19 GMT</pubDate>
    <dc:creator>Eukrev</dc:creator>
    <dc:date>2017-10-31T19:14:19Z</dc:date>
    <item>
      <title>MapReduce: FixedRecordReader - Partial record found at the end of split</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/MapReduce-FixedRecordReader-Partial-record-found-at-the-end/m-p/186420#M70508</link>
      <description>&lt;P&gt;Hi,&lt;/P&gt;&lt;P&gt;I am not an expert in Java and am trying to analyse FixedInputFormat and FixedRecordReader so that I can customize them for my project.&lt;/P&gt;&lt;P&gt;I copied both classes from the GitHub link below and am testing them with a driver and a mapper class.&lt;/P&gt;&lt;P&gt;&lt;A href="https://github.com/apache/hadoop/tree/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/lib/input" target="_blank"&gt;https://github.com/apache/hadoop/tree/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/lib/input&lt;/A&gt;&lt;BR /&gt;&lt;/P&gt;&lt;P&gt;The input is in a fixed-length format like this:&lt;BR /&gt;1234abcvd123mnfvds6722&lt;BR /&gt;6543abcad123aewert1234&lt;BR /&gt;When I run this I get the error: Partial record found at the end of split.&lt;BR /&gt;The input split counts the newline characters, so it calculates the split length as 46 instead of 44 and finds 3 records instead of 2.&lt;BR /&gt;How can the newline character be excluded from the input split? I appreciate any help on this.&lt;/P&gt;&lt;P&gt;Thank you&lt;/P&gt;</description>
      <pubDate>Tue, 31 Oct 2017 19:14:19 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/MapReduce-FixedRecordReader-Partial-record-found-at-the-end/m-p/186420#M70508</guid>
      <dc:creator>Eukrev</dc:creator>
      <dc:date>2017-10-31T19:14:19Z</dc:date>
    </item>
    <item>
      <title>Re: MapReduce: FixedRecordReader - Partial record found at the end of split</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/MapReduce-FixedRecordReader-Partial-record-found-at-the-end/m-p/186421#M70509</link>
      <description>&lt;P&gt;Please have a look at the code of FixedLengthInputFormat on &lt;A href="https://github.com/apache/hadoop/blob/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/lib/input/FixedLengthInputFormat.java"&gt;GitHub&lt;/A&gt;.&lt;/P&gt;&lt;P&gt;The basic requirement is that every record has the same length. 
That means each record in your file must be exactly "fixedlengthinputformat.record.length" bytes long, and that length includes the delimiter.&lt;/P&gt;&lt;P&gt;1. Note that &lt;A href="http://hadoop.apache.org/common/docs/current/api/org/apache/hadoop/mapreduce/lib/input/TextInputFormat.html"&gt;TextInputFormat&lt;/A&gt; was created for reading files whose records are delimited.&lt;/P&gt;&lt;P&gt;2. A file can instead hold multiple fixed-length records without any delimiter.&lt;/P&gt;&lt;P&gt;     This saves disk space, because a delimiter is redundant in such files.&lt;/P&gt;&lt;P&gt;     Only the record length determines where one record ends and the next starts.&lt;/P&gt;&lt;P&gt;     Such a file looks like one big row, which is why FixedLengthInputFormat is used for it.&lt;/P&gt;&lt;P&gt;&lt;BR /&gt;Two solutions:&lt;/P&gt;&lt;P&gt;1. Set fixedlengthinputformat.record.length to 23 in the conf object (22 data characters plus the newline), and remove the delimiter in the map method.&lt;/P&gt;&lt;PRE&gt;Configuration conf = new Configuration(true);
conf.set("fs.default.name", "file:///");
// 22 data bytes + 1 newline; every line, including the last, must end in a single newline byte
conf.setInt("fixedlengthinputformat.record.length", 23);
Job job = Job.getInstance(conf);
job.setInputFormatClass(FixedLengthInputFormat.class);
&lt;/PRE&gt;&lt;P&gt;2. Use TextInputFormat. It does not check that the records all have the same length, so you will have to do that check inside your map method.&lt;/P&gt;</description>
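Both solutions in the reply above come down to handling the newline and checking the record length. The following is a minimal plain-Java sketch of that logic, kept free of Hadoop dependencies so it runs on its own. The class name FixedRecordCheck, its method names, and the DATA_LENGTH constant are hypothetical names chosen for illustration. The 22-character width and the single trailing newline byte are assumptions taken from the sample input in the question. In a real job this logic would sit inside the map method.

```java
import java.nio.charset.StandardCharsets;

public class FixedRecordCheck {
    // Assumption from the question's sample: 22 data characters per line plus one newline byte.
    public static final int DATA_LENGTH = 22;
    public static final int RECORD_LENGTH = DATA_LENGTH + 1;

    // Solution 1: the record arrives with the delimiter included, so keep only the data bytes.
    public static String stripDelimiter(byte[] record) {
        if (record.length != RECORD_LENGTH) {
            throw new IllegalArgumentException(
                "expected " + RECORD_LENGTH + " bytes, got " + record.length);
        }
        return new String(record, 0, DATA_LENGTH, StandardCharsets.US_ASCII);
    }

    // Solution 2: the line arrives without its newline, so only the length needs checking.
    public static boolean hasExpectedLength(String line) {
        return line.length() == DATA_LENGTH;
    }

    public static void main(String[] args) {
        byte[] raw = "1234abcvd123mnfvds6722\n".getBytes(StandardCharsets.US_ASCII);
        System.out.println(stripDelimiter(raw));                          // 1234abcvd123mnfvds6722
        System.out.println(hasExpectedLength("6543abcad123aewert1234"));  // true
        System.out.println(hasExpectedLength("6543abc"));                 // false
    }
}
```

stripDelimiter corresponds to solution 1, where FixedLengthInputFormat hands the mapper all 23 bytes of each record. hasExpectedLength corresponds to solution 2, where TextInputFormat has already dropped the newline before the mapper sees the line.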
      <pubDate>Tue, 31 Oct 2017 23:32:06 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/MapReduce-FixedRecordReader-Partial-record-found-at-the-end/m-p/186421#M70509</guid>
      <dc:creator>kgautam</dc:creator>
      <dc:date>2017-10-31T23:32:06Z</dc:date>
    </item>
    <item>
      <title>Re: MapReduce: FixedRecordReader - Partial record found at the end of split</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/MapReduce-FixedRecordReader-Partial-record-found-at-the-end/m-p/186422#M70510</link>
      <description>&lt;P&gt;Thanks for the suggestion. I am opting for the 2nd solution, as the data is not one big continuous row and the 1st solution did not work.&lt;/P&gt;</description>
      <pubDate>Wed, 01 Nov 2017 01:37:07 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/MapReduce-FixedRecordReader-Partial-record-found-at-the-end/m-p/186422#M70510</guid>
      <dc:creator>Eukrev</dc:creator>
      <dc:date>2017-11-01T01:37:07Z</dc:date>
    </item>
  </channel>
</rss>

