ZipFileOutputFormat not giving output in .zip format mapreduce

I have an application that reads from HBase and writes the records to files. The final output should be in .zip compressed format, not in a Hadoop-supported format. For this I have used a custom ZipFileOutputFormat to write the records to .zip files.

Here is my implementation

ZipFileOutputFormat.setOutputPath(job, new Path(args[1]));

Here are the details of ZipFileOutputFormat.class:

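  import java.io.IOException;
  import java.util.zip.ZipOutputStream;

  import org.apache.hadoop.fs.FSDataOutputStream;
  import org.apache.hadoop.fs.FileSystem;
  import org.apache.hadoop.fs.Path;
  import org.apache.hadoop.io.NullWritable;
  import org.apache.hadoop.io.Text;
  import org.apache.hadoop.mapreduce.RecordWriter;
  import org.apache.hadoop.mapreduce.TaskAttemptContext;
  import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

  public class ZipFileOutputFormat extends FileOutputFormat<NullWritable, Text> {

      @Override
      public RecordWriter<NullWritable, Text> getRecordWriter(TaskAttemptContext job)
              throws IOException, InterruptedException {
          // Per-task work file with a .zip extension
          Path file = getDefaultWorkFile(job, ".zip");
          FileSystem fs = file.getFileSystem(job.getConfiguration());
          // overwrite = false: fail if the file already exists
          return new ZipRecordWriter(fs.create(file, false));
      }

      public static class ZipRecordWriter extends RecordWriter<NullWritable, Text> {

          protected ZipOutputStream zos;

          public ZipRecordWriter(FSDataOutputStream os) {
              zos = new ZipOutputStream(os);
          }

          @Override
          public void write(NullWritable key, Text value)
                  throws IOException, InterruptedException {
              // TODO: create new ZipEntry & add to the ZipOutputStream (zos)
          }

          @Override
          public void close(TaskAttemptContext context)
                  throws IOException, InterruptedException {
              // Closing the zip stream also closes the underlying file stream
              zos.close();
          }
      }
  }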

I am not getting any errors, but my output is still in r-000001 format.

Am I missing some configuration here?

Re: ZipFileOutputFormat not giving output in .zip format mapreduce

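Trivial question: did you actually call job.setOutputFormatClass(ZipFileOutputFormat.class)? ZipFileOutputFormat.setOutputPath(job, ...) is a static method inherited from FileOutputFormat; it only sets the output directory and does not select the output format, so without setOutputFormatClass the job falls back to the default TextOutputFormat.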
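
Separately from the driver setting, the TODO in write() still has to put something into the archive. Here is a minimal sketch of one way to do that, using only java.util.zip so it can be run without a cluster. The names ZipRecordSketch, LineZipWriter, zipRecords and readFirstEntry are hypothetical and not part of the original post, and an in-memory ByteArrayOutputStream stands in for the FSDataOutputStream. It assumes one zip entry per writer with one record per line, which may or may not match the layout you want.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

public class ZipRecordSketch {

    /** Hypothetical stand-in for ZipRecordWriter: one entry, one record per line. */
    static class LineZipWriter implements AutoCloseable {
        private final ZipOutputStream zos;

        LineZipWriter(OutputStream os, String entryName) throws IOException {
            zos = new ZipOutputStream(os);
            // Open the single entry up front; every record is appended to it.
            zos.putNextEntry(new ZipEntry(entryName));
        }

        void write(String record) throws IOException {
            zos.write(record.getBytes(StandardCharsets.UTF_8));
            zos.write('\n');
        }

        @Override
        public void close() throws IOException {
            zos.closeEntry();
            zos.close(); // also closes the wrapped stream
        }
    }

    /** Zips the records into an in-memory archive and returns its bytes. */
    static byte[] zipRecords(String entryName, String... records) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (LineZipWriter writer = new LineZipWriter(buffer, entryName)) {
            for (String record : records) {
                writer.write(record);
            }
        }
        return buffer.toByteArray();
    }

    /** Reads the content of the first entry back as a UTF-8 string. */
    static String readFirstEntry(byte[] zipBytes) throws IOException {
        try (ZipInputStream zis = new ZipInputStream(new ByteArrayInputStream(zipBytes))) {
            if (zis.getNextEntry() == null) {
                throw new IOException("archive has no entries");
            }
            ByteArrayOutputStream content = new ByteArrayOutputStream();
            byte[] chunk = new byte[4096];
            int n;
            while ((n = zis.read(chunk)) != -1) {
                content.write(chunk, 0, n);
            }
            return new String(content.toByteArray(), StandardCharsets.UTF_8);
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] zip = zipRecords("part-0.txt", "row1", "row2");
        System.out.println(readFirstEntry(zip));
    }
}
```

In the real ZipRecordWriter the same idea would presumably use zos.write(value.getBytes(), 0, value.getLength()) for the Text value, since the array returned by Text.getBytes() is only valid up to getLength(). The entry name could be derived from the task's work file name, but that choice is an assumption here.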