How to get only one output file per name from MultipleOutputs with multiple mappers and no reducer in Hadoop

Hi, I have an application that reads records from HBase and writes them to text files. The HBase table has 200 regions.

I am using MultipleOutputs in the mapper class to write to multiple files, and I build each file name from the incoming record.

The records produce 40 unique file names.

The records are written correctly, but my problem is that when the MapReduce job finishes it creates the 40 files and also about 2k extra files that have the proper name with a suffix such as -m-00193 appended.

This is because the table has 200 regions, so the job runs 200 mappers, and MultipleOutputs creates a separate file per mapper for each file name. With 40 unique file names that gives up to 40*200 files.
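
To make the arithmetic concrete, here is a minimal sketch in plain Java (no Hadoop dependency) of how the part-file names come about. It assumes the usual MultipleOutputs naming for map tasks, that is the base name, then "-m-", then a five-digit task number, followed by the codec extension. The class and method names are made up for illustration.

```java
import java.util.Locale;

// Illustrative only: mimics the naming pattern seen in the job output.
// It does not call Hadoop.
class PartFileNaming {

    // Builds "<baseName>-m-<5-digit map task number><extension>".
    static String partFileName(String baseName, int mapTask, String extension) {
        return String.format(Locale.ROOT, "%s-m-%05d%s", baseName, mapTask, extension);
    }

    // Upper bound on the file count: every mapper may open every base name.
    static int maxFiles(int baseNames, int mappers) {
        return baseNames * mappers;
    }

    public static void main(String[] args) {
        System.out.println(partFileName("Japan.BUS", 193, ".gz")); // Japan.BUS-m-00193.gz
        System.out.println(maxFiles(40, 200));                     // 8000
    }
}
```

A mapper only opens a file for a name that actually occurs in its region, which would be consistent with seeing roughly 2k files rather than the full 8000.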

I don't know how to avoid this without a custom partitioner.

Is there any way to force each record to be written only to the file it belongs to, instead of being split across multiple files?

I have tried a custom partitioner class and it works fine, but I don't want to use it because I am only reading from HBase and have no reduce work to do.
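
For reference, the partitioner approach boils down to mapping each file name to one reducer index, so that all records for a name meet in a single task. Below is a minimal sketch of that mapping in plain Java, assuming the same hash-and-modulo rule that Hadoop's default HashPartitioner applies. The class and method names are hypothetical; in a real job this logic would live in a subclass of org.apache.hadoop.mapreduce.Partitioner, with the file name carried in the map output key.

```java
// Illustrative only: the partition rule as a plain static method, no Hadoop types.
class FileNamePartitionSketch {

    // Non-negative hash of the file name, reduced modulo the reducer count.
    // Masking with Integer.MAX_VALUE clears the sign bit so the result is never negative.
    static int partitionFor(String fileName, int numReduceTasks) {
        return (fileName.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }

    public static void main(String[] args) {
        int reducers = 40; // assumption: one reducer per expected file name
        // The same name always lands on the same reducer index.
        System.out.println(partitionFor("Japan.BUS", reducers));
        System.out.println(partitionFor("Japan.BUS", reducers));
    }
}
```

Because the rule hashes the name instead of listing known names, a new file name would not need a code change. Two names can share a reducer, though, in which case that reducer would write both files.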

Also, whenever a new file name has to be added, I have to change my code. Here is my mapper code:

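        import java.io.IOException;
        import org.apache.hadoop.hbase.client.Result;
        import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
        import org.apache.hadoop.hbase.mapreduce.TableMapper;
        import org.apache.hadoop.hbase.util.Bytes;
        import org.apache.hadoop.io.NullWritable;
        import org.apache.hadoop.io.Text;
        import org.apache.hadoop.mapreduce.lib.output.MultipleOutputs;

        public class DefaultMapper extends TableMapper<NullWritable, Text> {
            private Text text = new Text();
            private MultipleOutputs<NullWritable, Text> multipleOutputs;

            @Override
            public void setup(Context context) throws IOException, InterruptedException {
                multipleOutputs = new MultipleOutputs<NullWritable, Text>(context);
            }

            @Override
            public void map(ImmutableBytesWritable row, Result value, Context context)
                    throws IOException, InterruptedException {
                // (the code that fills `text` from the row is not shown in this post)
                // The output file name is read from a column of the incoming record.
                String fileName = Bytes.toString(value.getValue(
                        Bytes.toBytes(HbaseBulkLoadMapperConstants.COLUMN_FAMILY),
                        Bytes.toBytes(HbaseBulkLoadMapperConstants.FILE_NAME)));
                multipleOutputs.write(NullWritable.get(), text, fileName);
            }
        }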

There is no reducer class.

This is what my output looks like. Ideally only one Japan.BUS.gz file should be created. The other files are also very small:
    Japan.BUS-m-00193.gz
    Japan.BUS-m-00194.gz
    Japan.BUS-m-00195.gz
    Japan.BUS-m-00196.gz
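
The names in this listing differ only in the map task number, so they all belong to the same logical file. Here is a minimal sketch, in plain Java with no Hadoop calls, of grouping such part files under the name they would ideally have. The class name, the method names, and the regular expression are illustrative assumptions based only on the names shown here.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative only: works on file names as strings, not on HDFS.
class PartFileGrouping {

    // "<base>-m-<5 digits><optional extension>", e.g. Japan.BUS-m-00193.gz
    private static final Pattern PART = Pattern.compile("(.*)-m-(\\d{5})(\\..*)?");

    // Strips the "-m-NNNNN" part; names that do not match are returned unchanged.
    static String targetName(String partFile) {
        Matcher m = PART.matcher(partFile);
        if (!m.matches()) {
            return partFile;
        }
        String extension = m.group(3) == null ? "" : m.group(3);
        return m.group(1) + extension;
    }

    // Groups part files by the single file name they logically belong to.
    static Map<String, List<String>> group(List<String> partFiles) {
        Map<String, List<String>> groups = new TreeMap<>();
        for (String f : partFiles) {
            groups.computeIfAbsent(targetName(f), k -> new ArrayList<>()).add(f);
        }
        return groups;
    }

    public static void main(String[] args) {
        List<String> files = Arrays.asList(
                "Japan.BUS-m-00193.gz", "Japan.BUS-m-00194.gz",
                "Japan.BUS-m-00195.gz", "Japan.BUS-m-00196.gz");
        // One entry, keyed by Japan.BUS.gz, holding all four part files.
        System.out.println(group(files));
    }
}
```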