LazyOutputFormat does not create empty files in the HDFS output directory. When a MapReduce job runs, the output files (part-nnnn) are created by the reducers, and each reducer may emit zero, one, or more records.
If a partition has no records and you are using LazyOutputFormat in the driver class, no empty file is created for that partition.
If you do not use LazyOutputFormat, an empty file is created for every partition that has no records, and these empty files can hurt the performance of jobs that later read from the directory.
Bottom line: use LazyOutputFormat if you want to suppress empty output files. An output file is then created only when the partition emits its first record.
FileOutputFormat subclasses create output files (part-r-nnnn) even if they are empty. Some applications prefer not to create empty files, which is where LazyOutputFormat comes in.
LazyOutputFormat is a wrapper OutputFormat. It makes sure that the output file for a given partition is created only when the first record is emitted for that partition.
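The wrapper idea above can be illustrated with a minimal sketch in plain Java. This is not Hadoop's implementation: the class names LazyWriterSketch and LazyWriter are hypothetical, and the sketch writes to a local temp directory rather than HDFS. It only shows the general pattern of deferring file creation until the first record is written.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

public class LazyWriterSketch {

    // Hypothetical writer: the file is opened on the first write, not in the constructor.
    static final class LazyWriter implements AutoCloseable {
        private final Path path;
        private BufferedWriter out; // stays null until the first record arrives

        LazyWriter(Path path) {
            this.path = path;
        }

        void write(String key, String value) throws IOException {
            if (out == null) {
                // First record for this "partition": create the file now.
                out = Files.newBufferedWriter(path);
            }
            out.write(key + "\t" + value);
            out.newLine();
        }

        @Override
        public void close() throws IOException {
            // Nothing was written, so nothing was created and nothing needs closing.
            if (out != null) {
                out.close();
            }
        }
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("lazy-demo");
        Path emptyPartition = dir.resolve("part-r-00000");
        Path nonEmptyPartition = dir.resolve("part-r-00001");

        // Partition with zero records: the writer is opened and closed without a write.
        try (LazyWriter w = new LazyWriter(emptyPartition)) {
        }

        // Partition with one record.
        try (LazyWriter w = new LazyWriter(nonEmptyPartition)) {
            w.write("key", "value");
        }

        System.out.println(Files.exists(emptyPartition));    // prints false
        System.out.println(Files.exists(nonEmptyPartition)); // prints true
    }
}
```

A non-lazy writer would open the file in its constructor, so the first partition would leave behind an empty part file.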
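To use LazyOutputFormat, call its setOutputFormatClass() method with the JobConf (or the Job in the newer mapreduce API) and the underlying output format, for example LazyOutputFormat.setOutputFormatClass(job, TextOutputFormat.class). Streaming and Pipes support a -lazyOutput option to enable LazyOutputFormat.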
The RecordWriter writes the output of the reduce phase (zero, one, or more key-value pairs) to the output files, and the OutputFormat determines how the RecordWriter writes these key-value pairs. FileOutputFormat subclasses create output files even if they are empty, but some applications prefer not to create empty files. LazyOutputFormat is used in that scenario.