<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>question Re: how to suppress mapper output files if the output file does not have any data? in Archives of Support Questions (Read Only)</title>
    <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/how-to-suppress-mapper-output-files-if-the-output-file-does/m-p/29541#M6567</link>
    <description>From Hadoop: The Definitive Guide (Tom White):&lt;BR /&gt;&lt;BR /&gt;"""&lt;BR /&gt;About LazyOutputFormat&lt;BR /&gt;-----------------------&lt;BR /&gt;A typical MapReduce program can produce output files that are empty,&lt;BR /&gt;depending on your implementation.&lt;BR /&gt;If you want to suppress creation of empty files, you need to leverage&lt;BR /&gt;LazyOutputFormat.&lt;BR /&gt;Two lines in your driver will do the trick:&lt;BR /&gt;import org.apache.hadoop.mapreduce.lib.output.LazyOutputFormat;&lt;BR /&gt;and&lt;BR /&gt;LazyOutputFormat.setOutputFormatClass(job, TextOutputFormat.class);&lt;BR /&gt;"""</description>
    <pubDate>Tue, 14 Jul 2015 06:34:55 GMT</pubDate>
    <dc:creator>Harsh J</dc:creator>
    <dc:date>2015-07-14T06:34:55Z</dc:date>
    <item>
      <title>how to suppress mapper output files if the output file does not have any data?</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/how-to-suppress-mapper-output-files-if-the-output-file-does/m-p/29540#M6566</link>
      <description>&lt;DIV class="bbp-topic-content"&gt;&lt;P&gt;Apologies if I haven’t phrased the question properly.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I have a combined file format that returns the file name as the key and the file content as the value.&lt;/P&gt;&lt;P&gt;I customized the Mapper class’s run method so that it calls the map method only if the file meets specific conditions.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Let’s say it calls the map method only if the file content is larger than 200 KB.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;If 200 files are sent as input, 200 mappers will start, and if only 100 files meet the criteria and run the map method, we will still have 200 output files in the output folder.&lt;/P&gt;&lt;P&gt;Is there a way to ensure that no output file is created when it would not contain any data? Or, put the other way around, to create output files only when there is data to write?&lt;/P&gt;&lt;/DIV&gt;</description>
      <pubDate>Tue, 14 Jul 2015 05:55:50 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/how-to-suppress-mapper-output-files-if-the-output-file-does/m-p/29540#M6566</guid>
      <dc:creator>Srini_D</dc:creator>
      <dc:date>2015-07-14T05:55:50Z</dc:date>
    </item>
    <item>
      <title>Re: how to suppress mapper output files if the output file does not have any data?</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/how-to-suppress-mapper-output-files-if-the-output-file-does/m-p/29541#M6567</link>
      <description>From Hadoop: The Definitive Guide (Tom White):&lt;BR /&gt;&lt;BR /&gt;"""&lt;BR /&gt;About LazyOutputFormat&lt;BR /&gt;-----------------------&lt;BR /&gt;A typical MapReduce program can produce output files that are empty,&lt;BR /&gt;depending on your implementation.&lt;BR /&gt;If you want to suppress creation of empty files, you need to leverage&lt;BR /&gt;LazyOutputFormat.&lt;BR /&gt;Two lines in your driver will do the trick:&lt;BR /&gt;import org.apache.hadoop.mapreduce.lib.output.LazyOutputFormat;&lt;BR /&gt;and&lt;BR /&gt;LazyOutputFormat.setOutputFormatClass(job, TextOutputFormat.class);&lt;BR /&gt;"""</description>
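<!--
Editorial sketch, not part of the original post. The reply above describes suppressing empty output files by deferring creation of the output file. The general idea can be illustrated in plain Java, without any Hadoop dependency: a writer that opens its file only when the first record arrives. The class name LazyLineWriter and the part-m-0000x file names are hypothetical, chosen for illustration; this is a minimal sketch of the lazy-creation technique under those assumptions, not Hadoop's actual implementation.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

/**
 * Hypothetical plain-Java illustration of lazy output creation.
 * Not Hadoop code: the file is created on the first write, so a writer
 * that never receives a record leaves nothing behind.
 */
final class LazyLineWriter implements AutoCloseable {
    private final Path target;       // where output would go
    private BufferedWriter delegate; // stays null until the first record

    LazyLineWriter(Path target) {
        this.target = target;
    }

    /** Opens the underlying file on first use, then appends one line. */
    void write(String record) throws IOException {
        if (delegate == null) {
            delegate = Files.newBufferedWriter(target, StandardCharsets.UTF_8);
        }
        delegate.write(record);
        delegate.newLine();
    }

    /** Closes the file only if it was ever opened. */
    @Override
    public void close() throws IOException {
        if (delegate != null) {
            delegate.close();
        }
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("lazy-demo");
        Path empty = dir.resolve("part-m-00000");
        Path full = dir.resolve("part-m-00001");

        // This "task" filters out all of its input and writes nothing.
        try (LazyLineWriter w = new LazyLineWriter(empty)) {
            // no records
        }
        // This "task" emits one record, so its file is created.
        try (LazyLineWriter w = new LazyLineWriter(full)) {
            w.write("key\tvalue");
        }

        System.out.println(Files.exists(empty));
        System.out.println(Files.exists(full));
    }
}
```

In the sketch, the first writer never calls write, so no file appears for it, while the second writer creates its file on the first record. An eager writer that opened the file in its constructor would leave an empty file for every task, which is the behaviour the question describes.
-->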
      <pubDate>Tue, 14 Jul 2015 06:34:55 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/how-to-suppress-mapper-output-files-if-the-output-file-does/m-p/29541#M6567</guid>
      <dc:creator>Harsh J</dc:creator>
      <dc:date>2015-07-14T06:34:55Z</dc:date>
    </item>
    <item>
      <title>Re: how to suppress mapper output files if the output file does not have any data?</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/how-to-suppress-mapper-output-files-if-the-output-file-does/m-p/31710#M6568</link>
      <description>&lt;P&gt;Hello Harsh,&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I tried the same for AvroMultipleOutputs, and it still generates empty Avro files. Should something additional be done when using AvroMultipleOutputs? I am using Avro 1.7.7 and CDH 5.4. Please let me know if you have faced this issue.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Thanks,&lt;/P&gt;&lt;P&gt;Nishanth&lt;/P&gt;</description>
      <pubDate>Thu, 10 Sep 2015 22:19:22 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/how-to-suppress-mapper-output-files-if-the-output-file-does/m-p/31710#M6568</guid>
      <dc:creator>Nishan</dc:creator>
      <dc:date>2015-09-10T22:19:22Z</dc:date>
    </item>
    <item>
      <title>Re: how to suppress mapper output files if the output file does not have any data?</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/how-to-suppress-mapper-output-files-if-the-output-file-does/m-p/34350#M6569</link>
      <description>The issue in my case was that I was not closing the AvroMultipleOutputs instance in the mapper. Combining LazyOutputFormat with closing the AvroMultipleOutputs instance in the mapper fixed the issue for me.</description>
      <pubDate>Mon, 23 Nov 2015 22:49:43 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/how-to-suppress-mapper-output-files-if-the-output-file-does/m-p/34350#M6569</guid>
      <dc:creator>Nishan</dc:creator>
      <dc:date>2015-11-23T22:49:43Z</dc:date>
    </item>
    <item>
      <title>Re: how to suppress mapper output files if the output file does not have any data?</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/how-to-suppress-mapper-output-files-if-the-output-file-does/m-p/42683#M6570</link>
      <description>&lt;P&gt;Hello Harsh,&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Could you please also suggest a solution for the old mapred API? My code generates empty part-xxxx files when the mapper conditions are not met, and because of this the reducer throws exceptions when it reaches 80%. So I need to suppress writing the empty part-xxxx files in the mapper stage itself. Your input would be very helpful. Thanks in advance!&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;BR//Hareeharan&lt;/P&gt;</description>
      <pubDate>Fri, 08 Jul 2016 01:11:41 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/how-to-suppress-mapper-output-files-if-the-output-file-does/m-p/42683#M6570</guid>
      <dc:creator>hareeharand</dc:creator>
      <dc:date>2016-07-08T01:11:41Z</dc:date>
    </item>
    <item>
      <title>Re: how to suppress mapper output files if the output file does not have any data?</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/how-to-suppress-mapper-output-files-if-the-output-file-does/m-p/42685#M6571</link>
      <description>LazyOutputFormat is available for both APIs. Here's the one for the older API: &lt;A href="http://archive.cloudera.com/cdh5/cdh/5/hadoop/api/org/apache/hadoop/mapred/lib/LazyOutputFormat.html" target="_blank"&gt;http://archive.cloudera.com/cdh5/cdh/5/hadoop/api/org/apache/hadoop/mapred/lib/LazyOutputFormat.html&lt;/A&gt;</description>
      <pubDate>Fri, 08 Jul 2016 02:31:45 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/how-to-suppress-mapper-output-files-if-the-output-file-does/m-p/42685#M6571</guid>
      <dc:creator>Harsh J</dc:creator>
      <dc:date>2016-07-08T02:31:45Z</dc:date>
    </item>
  </channel>
</rss>

