How to create zip files from ZipOutputStream in hadoop mapreduce

My MapReduce job reads records from HBase and has to write them into zip files. The client has specifically asked that the reducer output files be .zip files only. For this I have written a ZipFileOutputFormat wrapper that compresses the records and writes them into zip files. Everything seems OK except for one problem: a separate file (zip entry) is created for each key. Inside my output zip file I can see many files, one per row key, and I don't know how to combine them into a single file inside the zip. Here is the write() method from my ZipFileOutputFormat.java:

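// Needs: import java.nio.charset.StandardCharsets;
@Override
public void write(K key, V value) throws IOException {
  // Derive the zip entry name from the key.
  String fname;
  if (key instanceof BytesWritable) {
    BytesWritable bk = (BytesWritable) key;
    // getBytes() returns the padded backing array, so honor getLength().
    // UTF-8 is an assumption; the original relied on the platform default charset.
    fname = new String(bk.getBytes(), 0, bk.getLength(), StandardCharsets.UTF_8);
  } else {
    fname = key.toString();
  }
  // A new entry is started on every call, which is why the zip ends up
  // with one file per row key.
  ZipEntry ze = new ZipEntry(fname);
  zipOut.closeEntry(); // no-op when no entry is open yet
  zipOut.putNextEntry(ze);
  if (value instanceof BytesWritable) {
    BytesWritable bv = (BytesWritable) value;
    zipOut.write(bv.getBytes(), 0, bv.getLength());
  } else {
    zipOut.write(value.toString().getBytes(StandardCharsets.UTF_8));
  }
}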
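
One possible way to get a single file inside the zip is to open one ZipEntry when the writer is created and append every record to it, instead of calling putNextEntry() per key as write() does above. The following is only a minimal sketch using plain java.util.zip. The class name SingleEntryZipWriter, the entry name, and the tab-separated "key, value, newline" record layout are all assumptions for illustration and not part of Hadoop or of the original ZipFileOutputFormat.

```java
import java.io.Closeable;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

// Hypothetical helper (not a Hadoop class): writes all records into ONE zip entry.
class SingleEntryZipWriter implements Closeable {
    private final ZipOutputStream zipOut;

    SingleEntryZipWriter(OutputStream out, String entryName) throws IOException {
        zipOut = new ZipOutputStream(out);
        // Open the single entry once, up front, rather than once per key.
        zipOut.putNextEntry(new ZipEntry(entryName));
    }

    // Appends one record to the already-open entry.
    // The "key<TAB>value<NEWLINE>" layout is an assumed format.
    void write(String key, String value) throws IOException {
        String line = key + "\t" + value + "\n";
        zipOut.write(line.getBytes(StandardCharsets.UTF_8));
    }

    @Override
    public void close() throws IOException {
        // Finish the single entry, then write the zip central directory.
        zipOut.closeEntry();
        zipOut.close();
    }
}
```

In a RecordWriter, the same idea would presumably mean moving the putNextEntry() call out of write(K, V) into the constructor (or a lazy first-write check) and calling closeEntry() only from close(), with the output stream being whatever stream the output format already opens for the task's .zip file.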