What is LazyOutputFormat in Hadoop?

Rising Star

Does anybody know exactly what LazyOutputFormat in Hadoop is?

Master Guru
@Harshali Patel

LazyOutputFormat prevents empty files from being created in HDFS output directories. When a MapReduce job runs, each reducer creates an output file (part-nnnnn), and a reducer's output may contain zero, one, or more records.

If a partition produces no records and the driver class uses LazyOutputFormat, no empty file is created for that partition.

Without LazyOutputFormat, an empty file is created for every such partition, and jobs that later read from those directories can take a performance hit.

Bottom line: use LazyOutputFormat if you want to suppress empty files. An output file is then created only when the partition generates its first record.


Master Guru
@Shantanu kumar

Please refer to this link for an actual implementation of LazyOutputFormat.

Master Guru
@Shantanu kumar

If you are running Hive joins/queries to populate tables/directories, Hive doesn't create empty files in the HDFS directories.

Can you please explain how we can implement this concept?

Regards,

Shantanu.

Rising Star

FileOutputFormat subclasses create output files (part-r-nnnnn) even if they are empty. Some applications prefer not to create empty files, which is where LazyOutputFormat comes into the picture.
LazyOutputFormat is a wrapper OutputFormat. It makes sure that the output file for a given partition is created only when the first record for that partition is emitted.
To use LazyOutputFormat, call its setOutputFormatClass() method with the JobConf and the underlying output format. Streaming and Pipes support a -lazyOutput option to enable LazyOutputFormat.
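
To make the "create the file only on the first record" idea concrete, here is a minimal plain-Java sketch of the wrapper pattern. It is not Hadoop's implementation: the class and method names (LazyWriterSketch, LazyRecordWriter, runPartition) are hypothetical and made up for this illustration, and it writes to a local temp directory rather than HDFS. In a real driver using the newer mapreduce API, the equivalent step should be a single call along the lines of LazyOutputFormat.setOutputFormatClass(job, TextOutputFormat.class).

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical illustration of lazy output creation; not Hadoop code. */
public class LazyWriterSketch {

    /** Wrapper writer that opens its file only when the first record arrives. */
    static final class LazyRecordWriter implements AutoCloseable {
        private final Path file;
        private BufferedWriter out; // stays null until the first write

        LazyRecordWriter(Path file) {
            this.file = file;
        }

        void write(String key, String value) throws IOException {
            if (out == null) {
                // The output file is created here, on the first record only.
                out = Files.newBufferedWriter(file);
            }
            out.write(key + "\t" + value);
            out.newLine();
        }

        @Override
        public void close() throws IOException {
            // Nothing was opened if no record was ever written.
            if (out != null) {
                out.close();
            }
        }
    }

    /** Simulates one reduce task writing its records to a part-r-NNNNN file. */
    static Path runPartition(Path dir, int partition, String[][] records) throws IOException {
        Path file = dir.resolve(String.format("part-r-%05d", partition));
        try (LazyRecordWriter writer = new LazyRecordWriter(file)) {
            for (String[] kv : records) {
                writer.write(kv[0], kv[1]);
            }
        }
        return file;
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("lazy-output");
        Path p0 = runPartition(dir, 0, new String[][] {{"a", "1"}}); // one record
        Path p1 = runPartition(dir, 1, new String[0][]);             // no records
        System.out.println(p0.getFileName() + " exists: " + Files.exists(p0)); // true
        System.out.println(p1.getFileName() + " exists: " + Files.exists(p1)); // false
    }
}
```

Partition 0 emits a record, so its file appears. Partition 1 emits nothing, so no empty file is left behind, which is the behaviour described above.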

Thanks for the replies, @shu and @Dukool SHarma.

I am not writing any MapReduce job. Can we use LazyOutputFormat at the query level, i.e. in a Hive query? My table format is ORC.

Regards,

Shantanu

New Contributor

RecordWriter writes the output of the reducer phase (zero, one, or more key-value pairs) to the output files, and the OutputFormat determines how the RecordWriter writes these key-value pairs. FileOutputFormat subclasses create output files even if they are empty, but some applications prefer not to create empty files; LazyOutputFormat is used in this scenario.