Hello. I am trying to process images on Hadoop using HVPI (https://github.com/xmpy/hvpi), an open-source video processing interface. I use it to extract frames from a video, but it does not provide an output format for saving the frames to HDFS, so I tried SequenceFileOutputFormat for that. I read a book and made some changes so the code runs on Hadoop 2.7.1. Apparently it worked, but when I tried to read the file back in another MapReduce job I got an EOFException. I think there are three possibilities:
1 - My SequenceFileOutputFormat setup is wrong and saves a corrupted file;
2 - My SequenceFileInputFormat setup is wrong;
3 - The HVPI custom type is wrong.
The code to save the sequence file:
public static void main(String[] args) throws Exception {
    // Check the input arguments
    if (args.length != 2) {
        System.err.println("Usage: <input path> <output path>");
        System.exit(-1);
    }
    Configuration conf = new Configuration();
    // Set the master URL
    conf.set("fs.defaultFS", "hdfs://evoido:9000");
    Job job = new Job(conf, "MRVideoReader");
    job.setJarByClass(MRVideoReader.class);
    Path in = new Path(args[0]);
    Path out = new Path(args[1]);
    // Zero reduce tasks: this is a map-only job
    job.setJobName("Sequence Writer Test");
    job.setNumReduceTasks(0);
    job.setInputFormatClass(VideoInputFormat.class);
    job.setMapperClass(MRVideoReaderMapper.class);
    job.setMapOutputKeyClass(Text.class);
    //job.setMapOutputValueClass(Text.class);
    job.setMapOutputValueClass(ImageWritable.class);
    job.setReducerClass(MRVideoReaderReducer.class);
    job.setOutputFormatClass(SequenceFileOutputFormat.class);
    job.setJarByClass(MRVideoReader.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(ImageWritable.class);
    //job.setOutputValueClass(Text.class);
    FileInputFormat.addInputPath(job, in);
    SequenceFileOutputFormat.setOutputPath(job, out);
    System.exit(job.waitForCompletion(true) ? 0 : 1);
}
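To rule out possibility 1 (a corrupted output file), I can read the part file this job writes directly with SequenceFile.Reader, outside MapReduce. This is only a sketch; the class name and the part file path are placeholders for my actual output:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.Writable;
import org.apache.hadoop.util.ReflectionUtils;

public class SequenceFileCheck {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        conf.set("fs.defaultFS", "hdfs://evoido:9000");
        // args[0] would be one of the part files written by the job, e.g. .../part-m-00000
        Path seqPath = new Path(args[0]);
        SequenceFile.Reader reader =
                new SequenceFile.Reader(conf, SequenceFile.Reader.file(seqPath));
        try {
            // The header records which key/value classes were written
            System.out.println("key class:   " + reader.getKeyClassName());
            System.out.println("value class: " + reader.getValueClassName());
            Writable key = (Writable) ReflectionUtils.newInstance(reader.getKeyClass(), conf);
            Writable value = (Writable) ReflectionUtils.newInstance(reader.getValueClass(), conf);
            int records = 0;
            // If the file itself is intact, this loop should run through every frame;
            // an EOFException here would point at the value's readFields instead.
            while (reader.next(key, value)) {
                records++;
            }
            System.out.println("records read: " + records);
        } finally {
            reader.close();
        }
    }
}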
The code to read from HDFS:
public static void main(String[] args) throws Exception {
    // Check the input arguments
    if (args.length != 2) {
        System.err.println("Usage: <input path> <output path>");
        System.exit(-1);
    }
    Configuration conf = new Configuration();
    // Set the master URL
    conf.set("fs.defaultFS", "hdfs://evoido:9000");
    Job job = new Job(conf, "MRVideoReader");
    job.setJarByClass(MRVideoReader.class);
    Path in = new Path(args[0]);
    Path out = new Path(args[1]);
    // Zero reduce tasks: this is a map-only job
    job.setJobName("Sequence Writer Test");
    job.setNumReduceTasks(0);
    job.setInputFormatClass(SequenceFileInputFormat.class);
    job.setMapperClass(MRVideoReaderMapper.class);
    job.setMapOutputKeyClass(Text.class);
    //job.setMapOutputValueClass(Text.class);
    job.setMapOutputValueClass(ImageWritable.class);
    job.setReducerClass(MRVideoReaderReducer.class);
    job.setOutputFormatClass(TextOutputFormat.class);
    job.setJarByClass(MRVideoReader.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(ImageWritable.class);
    //job.setOutputValueClass(Text.class);
    SequenceFileInputFormat.addInputPath(job, in);
    FileOutputFormat.setOutputPath(job, out);
    System.exit(job.waitForCompletion(true) ? 0 : 1);
}
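For reference, this is roughly how I expect the mapper of this read-back job to be typed (a sketch only, not my actual MRVideoReaderMapper): SequenceFileInputFormat should hand the mapper exactly the key and value classes that were written, i.e. Text and ImageWritable.

import java.io.IOException;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
// plus the import for HVPI's ImageWritable

public class SequenceCheckMapper extends Mapper<Text, ImageWritable, Text, ImageWritable> {
    @Override
    protected void map(Text key, ImageWritable value, Context context)
            throws IOException, InterruptedException {
        // Pass the record straight through; deserialization (ImageWritable.readFields)
        // has already happened by the time map() is called.
        context.write(key, value);
    }
}

Since the job is map-only with TextOutputFormat, whatever toString() returns for the key and the ImageWritable value is what ends up in the output files.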
This is the relevant piece of code from the custom type.
Maybe the readFields or write methods are incorrect.
public void readFields(DataInput in) throws IOException {
    int len = WritableUtils.readVInt(in);
    System.out.println("Valor de len:" + len);
    byte[] temp = new byte[len];
    in.readFully(temp, 0, len);
    // Input stream over the raw bytes
    ByteArrayInputStream byteStream = new ByteArrayInputStream(temp);
    // Decode the bytes back into the image; the source here can be a ByteArrayInputStream
    bufferedImage = ImageIO.read(byteStream);
}

public void write(DataOutput out) throws IOException {
    ByteArrayOutputStream byteOutputStream = new ByteArrayOutputStream();
    ImageIO.write(bufferedImage, "png", byteOutputStream);
    // Write the size of the encoded PNG as a vint
    WritableUtils.writeVInt(out, byteOutputStream.size());
}
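For comparison, this is what I understand a symmetric write/readFields pair should look like, only as a sketch and assuming bufferedImage is the only field the Writable needs to serialize (the real HVPI ImageWritable may carry more fields):

public void write(DataOutput out) throws IOException {
    ByteArrayOutputStream byteOutputStream = new ByteArrayOutputStream();
    ImageIO.write(bufferedImage, "png", byteOutputStream);
    byte[] bytes = byteOutputStream.toByteArray();
    WritableUtils.writeVInt(out, bytes.length);   // length first...
    out.write(bytes);                             // ...then the encoded PNG bytes
}

public void readFields(DataInput in) throws IOException {
    int len = WritableUtils.readVInt(in);         // read the length written above
    byte[] temp = new byte[len];
    in.readFully(temp, 0, len);                   // read exactly that many bytes
    bufferedImage = ImageIO.read(new ByteArrayInputStream(temp));
}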