<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>question Re: How to set No of output file per reducer in Custom Partitioner Hadoop in Archives of Support Questions (Read Only)</title>
    <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/How-to-set-No-of-output-file-per-reducer-in-Custom/m-p/166301#M49693</link>
    <description>&lt;P&gt;To drive my point home here's more &lt;A href="http://blog.mortardata.com/post/60274287605/pig-vs-mapreduce"&gt;http://blog.mortardata.com/post/60274287605/pig-vs-mapreduce&lt;/A&gt;&lt;/P&gt;&lt;P&gt;And &lt;A href="http://blog.mortardata.com/post/33711299619/8-reasons-you-should-be-using-apache-pig" target="_blank"&gt;http://blog.mortardata.com/post/33711299619/8-reasons-you-should-be-using-apache-pig&lt;/A&gt;&lt;/P&gt;</description>
    <pubDate>Wed, 21 Dec 2016 21:17:02 GMT</pubDate>
    <dc:creator>aervits</dc:creator>
    <dc:date>2016-12-21T21:17:02Z</dc:date>
    <item>
      <title>How to set No of output file per reducer in Custom Partitioner Hadoop</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/How-to-set-No-of-output-file-per-reducer-in-Custom/m-p/166298#M49690</link>
      <description>&lt;P&gt;Hi,&lt;/P&gt;&lt;P&gt;I have implemented a custom partitioner based on my own logic, and the output files are produced correctly. But because of the partitioning conditions, some reducers receive a very large amount of data, which delays the reduce phase.&lt;/P&gt;&lt;P&gt;Is there any way for a single reducer to write many small files instead of one large output file?&lt;/P&gt;&lt;P&gt;Here is my custom partitioner:&lt;/P&gt;&lt;PRE&gt;import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Partitioner;

// Sends each Japan|YEAR key (2011-2014) to its own reducer; all other keys go to partition 4.
public class MyPartioner extends Partitioner&amp;lt;Text, IntWritable&amp;gt; {
  // The job must be configured with at least 5 reduce tasks, since partitions 0-4 are returned.
  @Override
  public int getPartition(Text key, IntWritable value, int setNumRedTask) {
    String str = key.toString();
    if (str.contains("Japan|2014")) {
      return 0;
    } else if (str.contains("Japan|2013")) {
      return 1;
    } else if (str.contains("Japan|2012")) {
      return 2;
    } else if (str.contains("Japan|2011")) {
      return 3;
    } else {
      return 4;
    }
  }
}&lt;/PRE&gt;&lt;P&gt;The first condition matches a very large amount of data, about 20 GB, while the last partition gets only 12 MB.&lt;/P&gt;</description>
      <pubDate>Wed, 21 Dec 2016 18:50:07 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/How-to-set-No-of-output-file-per-reducer-in-Custom/m-p/166298#M49690</guid>
      <dc:creator>sudarshankumar_</dc:creator>
      <dc:date>2016-12-21T18:50:07Z</dc:date>
    </item>
    <item>
      <title>Re: How to set No of output file per reducer in Custom Partitioner Hadoop</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/How-to-set-No-of-output-file-per-reducer-in-Custom/m-p/166299#M49691</link>
      <description>&lt;P&gt;Generally, to control the output written by a reducer you'd use the MultipleOutputs class: &lt;A href="https://hadoop.apache.org/docs/r2.7.1/api/org/apache/hadoop/mapreduce/lib/output/MultipleOutputs.html" target="_blank"&gt;https://hadoop.apache.org/docs/r2.7.1/api/org/apache/hadoop/mapreduce/lib/output/MultipleOutputs.html&lt;/A&gt;&lt;/P&gt;&lt;P&gt;You get the best results by writing larger files. I'm not sure what you gain from splitting a dataset that already matches your criteria into smaller chunks: the job won't complete until all of the criteria have been processed, and I think you'll actually hurt performance by splitting up what is, by design, the better approach.&lt;/P&gt;</description>
      <pubDate>Wed, 21 Dec 2016 20:57:34 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/How-to-set-No-of-output-file-per-reducer-in-Custom/m-p/166299#M49691</guid>
      <dc:creator>aervits</dc:creator>
      <dc:date>2016-12-21T20:57:34Z</dc:date>
    </item>
    <item>
      <title>Re: How to set No of output file per reducer in Custom Partitioner Hadoop</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/How-to-set-No-of-output-file-per-reducer-in-Custom/m-p/166300#M49692</link>
      <description>&lt;P&gt;You can try your use case with Pig and its built-in SPLIT operator; you'll benefit from the underlying query plan optimizations and the Tez execution engine, compared to a pure MapReduce implementation.&lt;/P&gt;&lt;P&gt;&lt;A href="http://pig.apache.org/docs/r0.16.0/basic.html#SPLIT"&gt;http://pig.apache.org/docs/r0.16.0/basic.html#SPLIT&lt;/A&gt;&lt;/P&gt;&lt;P&gt;It might be a much more worthwhile investment in your case.&lt;/P&gt;</description>
      <pubDate>Wed, 21 Dec 2016 21:06:13 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/How-to-set-No-of-output-file-per-reducer-in-Custom/m-p/166300#M49692</guid>
      <dc:creator>aervits</dc:creator>
      <dc:date>2016-12-21T21:06:13Z</dc:date>
    </item>
    <item>
      <title>Re: How to set No of output file per reducer in Custom Partitioner Hadoop</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/How-to-set-No-of-output-file-per-reducer-in-Custom/m-p/166301#M49693</link>
      <description>&lt;P&gt;To drive my point home here's more &lt;A href="http://blog.mortardata.com/post/60274287605/pig-vs-mapreduce"&gt;http://blog.mortardata.com/post/60274287605/pig-vs-mapreduce&lt;/A&gt;&lt;/P&gt;&lt;P&gt;And &lt;A href="http://blog.mortardata.com/post/33711299619/8-reasons-you-should-be-using-apache-pig" target="_blank"&gt;http://blog.mortardata.com/post/33711299619/8-reasons-you-should-be-using-apache-pig&lt;/A&gt;&lt;/P&gt;</description>
      <pubDate>Wed, 21 Dec 2016 21:17:02 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/How-to-set-No-of-output-file-per-reducer-in-Custom/m-p/166301#M49693</guid>
      <dc:creator>aervits</dc:creator>
      <dc:date>2016-12-21T21:17:02Z</dc:date>
    </item>
    <item>
      <title>Re: How to set No of output file per reducer in Custom Partitioner Hadoop</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/How-to-set-No-of-output-file-per-reducer-in-Custom/m-p/166302#M49694</link>
      <description>&lt;P&gt;Was I able to answer your question or do you need further clarification?&lt;/P&gt;</description>
      <pubDate>Fri, 23 Dec 2016 09:33:56 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/How-to-set-No-of-output-file-per-reducer-in-Custom/m-p/166302#M49694</guid>
      <dc:creator>aervits</dc:creator>
      <dc:date>2016-12-23T09:33:56Z</dc:date>
    </item>
    <item>
      <title>Re: How to set No of output file per reducer in Custom Partitioner Hadoop</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/How-to-set-No-of-output-file-per-reducer-in-Custom/m-p/166303#M49695</link>
      <description>&lt;P&gt;&lt;A rel="user" href="https://community.cloudera.com/users/14288/sudarshankumar86.html" nodeid="14288"&gt;@sudarshan kumar&lt;/A&gt; did that answer your question?&lt;/P&gt;</description>
      <pubDate>Fri, 30 Dec 2016 01:23:25 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/How-to-set-No-of-output-file-per-reducer-in-Custom/m-p/166303#M49695</guid>
      <dc:creator>aervits</dc:creator>
      <dc:date>2016-12-30T01:23:25Z</dc:date>
    </item>
    <item>
      <title>Re: How to set No of output file per reducer in Custom Partitioner Hadoop</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/How-to-set-No-of-output-file-per-reducer-in-Custom/m-p/166304#M49696</link>
      <description>&lt;P&gt;No, I cannot switch to Pig now; my full application is developed on MapReduce.&lt;/P&gt;</description>
      <pubDate>Fri, 31 Mar 2017 12:51:03 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/How-to-set-No-of-output-file-per-reducer-in-Custom/m-p/166304#M49696</guid>
      <dc:creator>sudarshankumar_</dc:creator>
      <dc:date>2017-03-31T12:51:03Z</dc:date>
    </item>
    <item>
      <title>Re: How to set No of output file per reducer in Custom Partitioner Hadoop</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/How-to-set-No-of-output-file-per-reducer-in-Custom/m-p/166305#M49697</link>
      <description>&lt;P&gt;The customer needs the data in the proper file, even if a file holds only 10 KB of data.&lt;/P&gt;</description>
      <pubDate>Fri, 31 Mar 2017 12:52:12 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/How-to-set-No-of-output-file-per-reducer-in-Custom/m-p/166305#M49697</guid>
      <dc:creator>sudarshankumar_</dc:creator>
      <dc:date>2017-03-31T12:52:12Z</dc:date>
    </item>
  </channel>
</rss>

