What is LazyOutputFormat in Hadoop?


New Contributor

Does anybody know exactly what LazyOutputFormat in Hadoop is?

7 REPLIES

Re: What is LazyOutputFormat in Hadoop?

Super Guru
@Harshali Patel

LazyOutputFormat prevents empty files from being created in HDFS output directories. When a MapReduce job runs, each reducer creates an output file (part-nnnn), and a reducer's output can contain zero, one, or more records.

If a partition has no records and the driver class uses LazyOutputFormat, no empty file is created for that partition.

Without LazyOutputFormat, an empty file is created for every such partition, and jobs that later read from those directories can take a performance hit because of them.

Bottom line: use it when you want to suppress empty output files. An output file is then created only when the partition produces its first record.

-

If this answer addressed your question, click the Accept button below. That helps other community users find solutions to this kind of issue quickly.

Re: What is LazyOutputFormat in Hadoop?

Super Guru
@Shantanu kumar

Please refer to this link for an actual implementation of LazyOutputFormat.

Re: What is LazyOutputFormat in Hadoop?

Super Guru
@Shantanu kumar

If you are running Hive joins/queries to populate tables or directories, Hive does not create empty files in the HDFS directories.

Re: What is LazyOutputFormat in Hadoop?

New Contributor

Can you please explain how we can implement this concept?

Regards,

Shantanu.

Re: What is LazyOutputFormat in Hadoop?

New Contributor

FileOutputFormat subclasses create output files (part-r-nnnn) even if they are empty. Some applications prefer that empty files not be created, which is where LazyOutputFormat comes into the picture.
LazyOutputFormat is a wrapper OutputFormat. It ensures that the output file for a given partition is created only when the first record is emitted for that partition.
To use LazyOutputFormat, call its setOutputFormatClass() method with the JobConf and the underlying output format. Streaming and Pipes support a -lazyOutput option to enable LazyOutputFormat.
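To make the "create the file only on the first record" behaviour concrete, here is a minimal plain-Java sketch of the idea. It does not use Hadoop at all: LazyPartWriter and LazyOutputDemo are hypothetical names made up for illustration, and the tab-separated key/value layout is just an assumption. In a real job you would not write this yourself; with the newer org.apache.hadoop.mapreduce API the driver would call something like LazyOutputFormat.setOutputFormatClass(job, TextOutputFormat.class) instead of setting the output format directly.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical stand-in for a per-partition record writer; NOT a Hadoop class.
final class LazyPartWriter implements AutoCloseable {
    private final Path file;
    private BufferedWriter out; // stays null until the first record arrives

    LazyPartWriter(Path file) {
        this.file = file;
    }

    void write(String key, String value) throws IOException {
        if (out == null) {
            // The output file is created here, on the first record only.
            out = Files.newBufferedWriter(file);
        }
        out.write(key + "\t" + value);
        out.newLine();
    }

    @Override
    public void close() throws IOException {
        // Nothing was opened if no record was ever written, so no file exists.
        if (out != null) {
            out.close();
        }
    }
}

final class LazyOutputDemo {
    // Writes `records` dummy records for one partition and reports
    // whether a part file ended up on disk.
    static boolean writePartition(Path dir, int partition, int records) throws IOException {
        Path part = dir.resolve(String.format("part-r-%05d", partition));
        try (LazyPartWriter writer = new LazyPartWriter(part)) {
            for (int i = 0; i < records; i++) {
                writer.write("k" + i, "v" + i);
            }
        }
        return Files.exists(part);
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("lazy-output-demo");
        System.out.println("partition 0, 3 records, file exists: " + writePartition(dir, 0, 3));
        System.out.println("partition 1, 0 records, file exists: " + writePartition(dir, 1, 0));
    }
}
```

The partition with records gets a part file and the empty partition gets none, which is the effect the wrapper has on reducer output. An eager writer would open the file in its constructor and leave a zero-byte file behind for the empty partition.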

Re: What is LazyOutputFormat in Hadoop?

New Contributor

Thanks for the replies, @shu and @Dukool SHarma.

I am not writing any MapReduce job. Can LazyOutputFormat be used at the query level, i.e. in a Hive query? My table format is ORC.

Regards,

Shantanu

Re: What is LazyOutputFormat in Hadoop?

New Contributor

The RecordWriter writes the output of the reducer phase (zero, one, or more key-value pairs) to the output files. The output format determines how the RecordWriter writes these key-value pairs to the output files. FileOutputFormat subclasses create output files even if they are empty, but some applications prefer not to create empty files; LazyOutputFormat is used in that scenario.