New Contributor
Posts: 3
Registered: ‎07-01-2016

Hadoop map-reduce indexoutofbounds

My program ran fine on smaller inputs, but when I increase the input size, line 210 (context.nextKeyValue();) throws an IndexOutOfBoundsException. Below is the setup method of the mapper. I call nextKeyValue() there once because the first line of each file is a header; splitting of the input files is disabled because of those headers. Does this have to do with memory? How can I solve it?

    protected void setup(Context context) throws IOException, InterruptedException {
        Configuration conf = context.getConfiguration();
        DupleSplit fileSplit = (DupleSplit) context.getInputSplit();
        // First line is the header; it indicates the first digit of the solution.
        context.nextKeyValue(); // <---- LINE 210
        URI[] uris = context.getCacheFiles();

        int num_of_colors = Integer.parseInt(conf.get("num_of_colors"));
        int order = fileSplit.get_order();
        int first_digit = Integer.parseInt(context.getCurrentValue().toString());

        //perm_path = conf.get(Integer.toString(num_of_colors - order - 1));
        int offset = Integer.parseInt(conf.get(Integer.toString(num_of_colors - order - 1)));
        uri = uris[offset]; // uri and perm_name are fields of the mapper class
        Path perm_path = new Path(uri.getPath());
        perm_name = perm_path.getName().toString();

        String pair_variables = "";
        for (int i = 1; i <= num_of_colors; i++)
            pair_variables += "X_" + i + "_" + (num_of_colors - order) + "\t";
        for (int i = 1; i < num_of_colors; i++)
            pair_variables += "X_" + i + "_" + (num_of_colors - order - first_digit) + "\t";
        pair_variables += "X_" + num_of_colors + "_" + (num_of_colors - order - first_digit);
        context.write(new Text(pair_variables), null);
    }
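As an aside on the header handling: if splitting is disabled only because of the header line, a common alternative is to keep the files splittable and skip the first line only in the split that starts at byte 0 of the file (in a real mapper that offset comes from ((FileSplit) context.getInputSplit()).getStart()). With splittable input, no single split has to cover a whole multi-GB file. A minimal plain-Java sketch of the idea; HeaderSkipDemo and shouldSkipFirstLine are illustrative names, not Hadoop API:

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;

public class HeaderSkipDemo {
    // Only the split that begins at byte 0 contains the file's header line.
    static boolean shouldSkipFirstLine(long splitStart) {
        return splitStart == 0;
    }

    public static void main(String[] args) throws IOException {
        String file = "HEADER\nrow1\nrow2\n";
        BufferedReader reader = new BufferedReader(new StringReader(file));

        long splitStart = 0; // pretend this is the first split of the file
        if (shouldSkipFirstLine(splitStart)) {
            reader.readLine(); // discard the header line
        }
        System.out.println(reader.readLine()); // prints "row1"
    }
}
```

Splits that start mid-file would leave shouldSkipFirstLine false and process every line they receive.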

Here's the error log:

Error: java.lang.IndexOutOfBoundsException
at java.nio.Buffer.checkBounds(
at java.nio.ByteBuffer.get(
at java.nio.DirectByteBuffer.get(
at org.apache.hadoop.hdfs.DFSInputStream$ByteArrayStrategy.doRead(
at org.apache.hadoop.hdfs.DFSInputStream.readBuffer(
at org.apache.hadoop.hdfs.DFSInputStream.readWithStrategy(
at org.apache.hadoop.mapreduce.lib.input.UncompressedSplitLineReader.fillBuffer(
at org.apache.hadoop.util.LineReader.readDefaultLine(
at org.apache.hadoop.util.LineReader.readLine(
at org.apache.hadoop.mapreduce.lib.input.UncompressedSplitLineReader.readLine(
at org.apache.hadoop.mapreduce.lib.input.LineRecordReader.skipUtfByteOrderMark(
at org.apache.hadoop.mapreduce.lib.input.LineRecordReader.nextKeyValue(
at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.nextKeyValue(
at org.apache.hadoop.mapreduce.task.MapContextImpl.nextKeyValue(
at produce_data_hdfs$input_mapper.setup(
at org.apache.hadoop.mapred.MapTask.runNewMapper(
at org.apache.hadoop.mapred.YarnChild$
at Method)
at org.apache.hadoop.mapred.YarnChild.main(

Container killed by the ApplicationMaster.
Container killed on request. Exit code is 143
Container exited with a non-zero exit code 143
Cloudera Employee
Posts: 55
Registered: ‎03-07-2016

Re: Hadoop map-reduce indexoutofbounds

Which version are you running? There is a known bug in CDH 5.7 that can cause this issue if the split is big enough (> 4 GB, I think).

New Contributor
Posts: 3
Registered: ‎07-01-2016

Re: Hadoop map-reduce indexoutofbounds

I'm using Hadoop 2.7.1.

Just checked the files: one of them is 4.2 GB, another comes close at 3.6 GB, and the rest are below 3 GB.

If it is this bug, maybe I could upgrade to 2.7.2? Would that solve it?

Cloudera Employee
Posts: 55
Registered: ‎03-07-2016

Re: Hadoop map-reduce indexoutofbounds

I guess it is the 4.2 GB file that triggered the bug. The fix is in 2.7.3 and 2.8.0.