How to set No of output file per reducer in Custom Partitioner Hadoop

sudarshankumar_ — Wed, 21 Dec 2016 18:50:07 GMT

Hi,

I have implemented a custom partitioner based on my own logic, and the output files are produced correctly. However, because of the partitioning conditions, some reducers receive a very large amount of data, and that delays the reduce phase.

Is there any way to make one reducer write many small files instead of a single output file?

Here is my custom partitioner:

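import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Partitioner;

// Routes each key to a fixed reducer based on the country|year it contains.
public class MyPartioner extends Partitioner<Text, IntWritable> {
  @Override
  public int getPartition(Text key, IntWritable value, int numReduceTasks) {
    String str = key.toString();
    if (str.contains("Japan|2014")) {
      return 0;
    } else if (str.contains("Japan|2013")) {
      return 1;
    } else if (str.contains("Japan|2012")) {
      return 2;
    } else if (str.contains("Japan|2011")) {
      return 3;
    } else {
      // Everything else; the job must run with at least 5 reducers.
      return 4;
    }
  }
}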

The first condition matches a very large amount of data, around 20 GB, but the last one will have only about 12 MB.
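One possible way to ease this kind of skew while staying in MapReduce is to give the heavy "Japan|2014" group several reducers instead of one, so that it is written as several smaller part files. The sketch below shows only the partition arithmetic in plain Java, so it can be tried without a cluster. The class name, the fan-out of 4, and the sample keys are all assumptions made for illustration, and it only helps if the 2014 group contains many distinct keys. In a real job the same logic would sit inside getPartition, and the job would need at least REQUIRED_REDUCERS reduce tasks.

```java
/** Hypothetical sketch: fan the heavy group out over several partitions. */
final class SkewAwarePartitionSketch {
    /** Reducers reserved for the heavy "Japan|2014" group (assumed value). */
    static final int HOT_FAN_OUT = 4;
    /** Total reducers this layout needs: the fan-out plus four single groups. */
    static final int REQUIRED_REDUCERS = HOT_FAN_OUT + 4;

    /** Returns the partition index for a key, mirroring the original conditions. */
    static int partitionFor(String key) {
        if (key.contains("Japan|2014")) {
            // Mask the sign bit so the modulo result is never negative.
            return (key.hashCode() & Integer.MAX_VALUE) % HOT_FAN_OUT;
        } else if (key.contains("Japan|2013")) {
            return HOT_FAN_OUT;
        } else if (key.contains("Japan|2012")) {
            return HOT_FAN_OUT + 1;
        } else if (key.contains("Japan|2011")) {
            return HOT_FAN_OUT + 2;
        }
        // Everything else shares the last partition.
        return HOT_FAN_OUT + 3;
    }

    public static void main(String[] args) {
        // Sample keys are made up; the real key layout is not shown in the thread.
        String[] samples = {"Japan|2014|Tokyo", "Japan|2014|Osaka", "Japan|2013|Tokyo", "India|2014|Delhi"};
        for (String sample : samples) {
            System.out.println(sample + " -> partition " + partitionFor(sample));
        }
    }
}
```

Identical keys still land on the same reducer, so per-key aggregation is unaffected; only the 2014 group as a whole is split across HOT_FAN_OUT output files.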

Re: How to set No of output file per reducer in Custom Partitioner Hadoop

aervits — Wed, 21 Dec 2016 20:57:34 GMT

Generally, to control how a reducer writes its output files, you'd use the MultipleOutputs class: https://hadoop.apache.org/docs/r2.7.1/api/org/apache/hadoop/mapreduce/lib/output/MultipleOutputs.html

You generally get the best results by writing larger files. I'm not sure what you gain from splitting a dataset that matches one criterion into smaller chunks: the job won't complete until the data for every criterion has been processed, and I think you'll actually hurt performance by splitting up output that is better left as one large file.
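For reference, MultipleOutputs lets a reducer choose the output file per record through its write(key, value, baseOutputPath) method. The part that needs custom logic is turning a key into a base output path, and the plain-Java sketch below covers only that step. The class name, the "other" bucket, and the directory naming are assumptions for illustration, not anything taken from the job in the question.

```java
/** Hypothetical sketch: derive a MultipleOutputs base path from a key. */
final class OutputNameSketch {
    // The groups used by the partitioner in the question.
    private static final String[] GROUPS = {
        "Japan|2014", "Japan|2013", "Japan|2012", "Japan|2011"
    };

    /** Returns a relative base path such as "Japan_2014/part" for a key. */
    static String baseOutputPathFor(String key) {
        for (String group : GROUPS) {
            if (key.contains(group)) {
                // Replace '|' so the group name is safe to use as a directory name.
                return group.replace('|', '_') + "/part";
            }
        }
        // Keys that match no group go to a shared bucket (assumed name).
        return "other/part";
    }

    public static void main(String[] args) {
        // In a reducer this string would be the third argument to
        // MultipleOutputs.write(key, value, baseOutputPath).
        System.out.println(baseOutputPathFor("Japan|2014|Tokyo"));
        System.out.println(baseOutputPathFor("India|2014|Delhi"));
    }
}
```

Hadoop appends its own task suffix to the base path, so each reducer that sees a group writes its own file under that group's directory.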

Re: How to set No of output file per reducer in Custom Partitioner Hadoop

aervits — Wed, 21 Dec 2016 21:06:13 GMT

You can try your use case with Pig and its built-in SPLIT operator. Compared to a pure MapReduce implementation, you'll benefit from the underlying query plan optimizations and the Tez execution engine.

http://pig.apache.org/docs/r0.16.0/basic.html#SPLIT

It might be a much more worthwhile investment in your case.

Re: How to set No of output file per reducer in Custom Partitioner Hadoop

aervits — Wed, 21 Dec 2016 21:17:02 GMT

To drive my point home, here's more: http://blog.mortardata.com/post/60274287605/pig-vs-mapreduce

And http://blog.mortardata.com/post/33711299619/8-reasons-you-should-be-using-apache-pig

Re: How to set No of output file per reducer in Custom Partitioner Hadoop

aervits — Fri, 23 Dec 2016 09:33:56 GMT

Was I able to answer your question or do you need further clarification?

Re: How to set No of output file per reducer in Custom Partitioner Hadoop

aervits — Fri, 30 Dec 2016 01:23:25 GMT

@sudarshan kumar did that answer your question?

Re: How to set No of output file per reducer in Custom Partitioner Hadoop

sudarshankumar_ — Fri, 31 Mar 2017 12:51:03 GMT

No, I cannot move to Pig now; my whole application is developed on MapReduce.

Re: How to set No of output file per reducer in Custom Partitioner Hadoop

sudarshankumar_ — Fri, 31 Mar 2017 12:52:12 GMT

The customer needs the data in the proper files, even if one of those files holds only 10 KB of data.