Created 05-18-2018 08:53 AM
Does anybody know exactly how Lazy Output format works in Hadoop?
Created 05-18-2018 12:41 PM
With LazyOutputFormat, no empty files are created in the HDFS output directory. When a MapReduce job runs, each reducer creates an output file (part-r-nnnnn), and a reducer may emit zero, one, or more records.
If a partition has no records and the driver class uses LazyOutputFormat, no empty file is created for that partition.
Without LazyOutputFormat, an empty file is created for every such partition, and reading a directory full of empty files can hurt the performance of downstream jobs.
Bottom line: use LazyOutputFormat if you want to suppress empty files. An output file is then created only when the partition produces its first record.
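To make the "create the file only on the first record" behavior concrete, here is a minimal plain-Java sketch of the idea. It does not use Hadoop at all; the class and method names (LazyWriterSketch, LazyPartWriter, writePartition) are made up for illustration and only mimic what a lazy wrapper does conceptually.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

public class LazyWriterSketch {

    /** Hypothetical writer that opens its file only when the first record arrives. */
    static final class LazyPartWriter implements AutoCloseable {
        private final Path file;
        private BufferedWriter out; // stays null until the first write

        LazyPartWriter(Path file) {
            this.file = file;
        }

        void write(String record) throws IOException {
            if (out == null) {
                out = Files.newBufferedWriter(file); // the file is created here
            }
            out.write(record);
            out.newLine();
        }

        @Override
        public void close() throws IOException {
            if (out != null) {
                out.close(); // if nothing was written, no file exists on disk
            }
        }
    }

    /** Writes the records of one partition to part-r-nnnnn inside dir. */
    static void writePartition(Path dir, int partition, String... records) throws IOException {
        Path file = dir.resolve(String.format("part-r-%05d", partition));
        try (LazyPartWriter writer = new LazyPartWriter(file)) {
            for (String record : records) {
                writer.write(record);
            }
        }
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("lazy-output");
        writePartition(dir, 0, "a\t1", "b\t2"); // partition with records
        writePartition(dir, 1);                 // partition with no records
        System.out.println(Files.exists(dir.resolve("part-r-00000"))); // true
        System.out.println(Files.exists(dir.resolve("part-r-00001"))); // false
    }
}
```

An eager writer would open the file in its constructor, which is why a reducer that emits nothing still leaves an empty part file behind.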
Created 05-18-2018 04:03 PM
Please refer to this link for actual implementation of LazyOutputformat.
Created 05-23-2018 11:03 AM
If you are running Hive joins/queries to populate tables/directories, Hive doesn't create empty files in the HDFS directories.
Created 05-18-2018 03:49 PM
Can you please explain how we can implement this concept?
Regards,
Shantanu.
Created 05-19-2018 09:36 AM
FileOutputFormat subclasses create output files (part-r-nnnnn) even if they are empty. Some applications prefer not to create empty files, which is where LazyOutputFormat comes into the picture.
LazyOutputFormat is a wrapper OutputFormat. It ensures that an output file is created only when the first record is emitted for a given partition.
To use LazyOutputFormat, call its setOutputFormatClass() method with the JobConf (or the Job, in the newer mapreduce API) and the underlying output format. Streaming and Pipes support a -lazyOutput option to enable LazyOutputFormat.
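As a rough illustration of the driver-side call, here is a minimal sketch using the newer org.apache.hadoop.mapreduce API. It assumes the Hadoop MapReduce client libraries are on the classpath; the class name LazyOutputDriver, the job name, and the use of args[0]/args[1] as input and output paths are assumptions made for this example. No mapper or reducer is set, so the identity defaults are used.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.LazyOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;

// Hypothetical driver class; only the LazyOutputFormat line is the point here.
public class LazyOutputDriver {
    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "lazy-output-example");
        job.setJarByClass(LazyOutputDriver.class);

        // Identity mapper/reducer defaults with text input: LongWritable keys, Text values.
        job.setOutputKeyClass(LongWritable.class);
        job.setOutputValueClass(Text.class);

        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));

        // Instead of job.setOutputFormatClass(TextOutputFormat.class),
        // wrap the real output format so part files are created on first write.
        LazyOutputFormat.setOutputFormatClass(job, TextOutputFormat.class);

        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```

With the older mapred API, the equivalent call takes a JobConf instead of a Job, as described above.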
Created 05-22-2018 08:06 PM
Thanks for the reply, @shu and @Dukool SHarma.
I am not writing any MapReduce job. Can we use LazyOutputFormat at the query level, i.e. in a Hive query? My table format is ORC.
Regards,
Shantanu
Created 12-08-2018 05:02 PM
The RecordWriter writes the output of the reducer phase (zero, one, or more key-value pairs) to the output files, and the OutputFormat determines how the RecordWriter writes these key-value pairs. FileOutputFormat subclasses create output files even if they are empty, but some applications prefer not to create empty files. LazyOutputFormat is used in this scenario.