Member since 07-01-2016 · 3 Posts · 0 Kudos Received · 0 Solutions
07-10-2016 03:42 AM
I'm using Hadoop 2.7.1. I just checked the files: one of them is 4.2 GB, another comes close at 3.6 GB, and the rest are below 3 GB. If it is this bug, maybe I could upgrade to 2.7.2? Would that solve it?
07-09-2016 01:35 PM
My program was running fine for smaller inputs, but when I increase the size of the input, line 210 (context.nextKeyValue();) throws an IndexOutOfBoundsException. Below is the setup method of the mapper. I call nextKeyValue() there once because the first line of each file is a header; splitting of files is disabled because of those headers. Does this have to do with memory? How can I solve it?

@Override
protected void setup(Context context) throws IOException, InterruptedException
{
    Configuration conf = context.getConfiguration();
    DupleSplit fileSplit = (DupleSplit) context.getInputSplit();

    // First line is a header. It indicates the first digit of the solution.
    context.nextKeyValue();    // <---- LINE 210

    URI[] uris = context.getCacheFiles();
    int num_of_colors = Integer.parseInt(conf.get("num_of_colors"));
    int order = fileSplit.get_order();
    int first_digit = Integer.parseInt(context.getCurrentValue().toString());

    //perm_path = conf.get(Integer.toString(num_of_colors - order - 1));
    int offset = Integer.parseInt(conf.get(Integer.toString(num_of_colors - order - 1)));
    uri = uris[offset];
    Path perm_path = new Path(uri.getPath());
    perm_name = perm_path.getName().toString();

    String pair_variables = "";
    for (int i = 1; i <= num_of_colors; i++)
        pair_variables += "X_" + i + "_" + (num_of_colors - order) + "\t";
    for (int i = 1; i < num_of_colors; i++)
        pair_variables += "X_" + i + "_" + (num_of_colors - order - first_digit) + "\t";
    pair_variables += "X_" + num_of_colors + "_" + (num_of_colors - order - first_digit);

    context.write(new Text(pair_variables), null);
}

Here's the error log:

Error: java.lang.IndexOutOfBoundsException
at java.nio.Buffer.checkBounds(Buffer.java:559)
at java.nio.ByteBuffer.get(ByteBuffer.java:668)
at java.nio.DirectByteBuffer.get(DirectByteBuffer.java:279)
at org.apache.hadoop.hdfs.RemoteBlockReader2.read(RemoteBlockReader2.java:168)
at org.apache.hadoop.hdfs.DFSInputStream$ByteArrayStrategy.doRead(DFSInputStream.java:775)
at org.apache.hadoop.hdfs.DFSInputStream.readBuffer(DFSInputStream.java:831)
at org.apache.hadoop.hdfs.DFSInputStream.readWithStrategy(DFSInputStream.java:891)
at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:934)
at java.io.DataInputStream.read(DataInputStream.java:149)
at org.apache.hadoop.mapreduce.lib.input.UncompressedSplitLineReader.fillBuffer(UncompressedSplitLineReader.java:59)
at org.apache.hadoop.util.LineReader.readDefaultLine(LineReader.java:216)
at org.apache.hadoop.util.LineReader.readLine(LineReader.java:174)
at org.apache.hadoop.mapreduce.lib.input.UncompressedSplitLineReader.readLine(UncompressedSplitLineReader.java:91)
at org.apache.hadoop.mapreduce.lib.input.LineRecordReader.skipUtfByteOrderMark(LineRecordReader.java:144)
at org.apache.hadoop.mapreduce.lib.input.LineRecordReader.nextKeyValue(LineRecordReader.java:184)
at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.nextKeyValue(MapTask.java:556)
at org.apache.hadoop.mapreduce.task.MapContextImpl.nextKeyValue(MapContextImpl.java:80)
at org.apache.hadoop.mapreduce.lib.map.WrappedMapper$Context.nextKeyValue(WrappedMapper.java:91)
at produce_data_hdfs$input_mapper.setup(produce_data_hdfs.java:210)
at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:143)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:787)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:164)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
Container killed by the ApplicationMaster.
Container killed on request. Exit code is 143
Container exited with a non-zero exit code 143
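Regardless of what is causing the HDFS-level read failure, one defensive change to the header read seems worth making: nextKeyValue() returns a boolean, and the snippet above never checks it, so an empty or truncated split would also fail later at getCurrentValue(). A minimal sketch of the guard, using the same variable names as the snippet (it does not address the RemoteBlockReader2 error itself):

if (!context.nextKeyValue()) {
    // No header line in this split; fail with a clear message instead of a downstream exception.
    throw new IOException("Expected a header line as the first record of this split");
}
int first_digit = Integer.parseInt(context.getCurrentValue().toString());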
Tags: 2.7.1
07-01-2016 04:09 AM
I know that you can spawn two mappers over the same file by calling the addInputPath function twice with the same path, but I'd like the file to be processed slightly differently each time. Specifically, I want each pass to use different parameters, which I pass through the Job class (configuration.set/get). When the files are different, I get the path/name of the file from the InputSplit via the context and use that to pick the right parameters, but now that the paths are the same I can't differentiate the mappers. Any thoughts? Each mapper runs as a different map task, but I have no idea whether any per-task information can be used here, and I don't know in what order the framework assigns input splits to map tasks; that could be useful. Alternatively, I could duplicate the file under a different name, but that would be a waste of resources.
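One way to tell the two passes apart without duplicating the file is a custom InputFormat whose getSplits() returns every split twice, with each copy carrying a pass index that the mapper reads in setup(). A minimal sketch, assuming a plain TextInputFormat-based job; TaggedTextInputFormat and TaggedFileSplit are hypothetical names introduced here, not the DupleSplit class from the other post:

import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.hadoop.mapreduce.InputSplit;
import org.apache.hadoop.mapreduce.JobContext;
import org.apache.hadoop.mapreduce.lib.input.FileSplit;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;

public class TaggedTextInputFormat extends TextInputFormat {

    /** A FileSplit plus a pass index (0 or 1), so the mapper knows which copy it is handling. */
    public static class TaggedFileSplit extends FileSplit {
        private int pass;

        public TaggedFileSplit() { }   // no-arg constructor needed for deserialization

        public TaggedFileSplit(FileSplit s, int pass) throws IOException {
            super(s.getPath(), s.getStart(), s.getLength(), s.getLocations());
            this.pass = pass;
        }

        public int getPass() { return pass; }

        @Override
        public void write(DataOutput out) throws IOException {
            super.write(out);
            out.writeInt(pass);
        }

        @Override
        public void readFields(DataInput in) throws IOException {
            super.readFields(in);
            pass = in.readInt();
        }
    }

    @Override
    public List<InputSplit> getSplits(JobContext job) throws IOException {
        // Return each underlying split twice, tagged with the pass number.
        List<InputSplit> doubled = new ArrayList<>();
        for (InputSplit s : super.getSplits(job)) {
            FileSplit fs = (FileSplit) s;
            doubled.add(new TaggedFileSplit(fs, 0));   // first pass over this split
            doubled.add(new TaggedFileSplit(fs, 1));   // second pass over this split
        }
        return doubled;
    }
}

The driver would then call job.setInputFormatClass(TaggedTextInputFormat.class), and the mapper's setup() would cast context.getInputSplit() to TaggedFileSplit and branch on getPass() to pick the corresponding configuration values. If the files must stay unsplit because of header lines, isSplitable() would also be overridden to return false, as in the other job.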