What is LazyOutputFormat in Hadoop?
- Labels: Apache Hadoop
Created ‎05-18-2018 08:53 AM
Does anybody know exactly what LazyOutputFormat in Hadoop is?
Created ‎05-18-2018 12:41 PM
With LazyOutputFormat, no empty files are created in the HDFS output directory. When a MapReduce job runs, the output files (part-nnnn) are created by the reducers, and each reducer's output can be zero, one, or more records.
If there are no records for a specific partition and you are using LazyOutputFormat in the driver class, no empty file is created for that partition.
If you don't use LazyOutputFormat, the empty files are created in the HDFS directories, and they can hurt the performance of jobs that later read data from those directories.
Bottom line: use it if you want to suppress the creation of empty files, so that an output file is created only when the first record is written for its partition.
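To make the "create the file only on the first record" idea concrete, here is a minimal sketch in plain Java. It is not Hadoop's LazyOutputFormat source; `LazyPartWriter` and `LazyPartWriterDemo` are hypothetical names used only for illustration, and the sketch writes to a local temp directory rather than HDFS.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

class LazyPartWriterDemo {
    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("lazy-output-demo");
        Path empty = dir.resolve("part-r-00000");
        Path nonEmpty = dir.resolve("part-r-00001");

        try (LazyPartWriter w = new LazyPartWriter(empty)) {
            // This "partition" emits no records, so no file is created.
        }
        try (LazyPartWriter w = new LazyPartWriter(nonEmpty)) {
            w.write("hadoop", "1"); // first record triggers file creation
        }

        System.out.println("part-r-00000 exists: " + Files.exists(empty));    // false
        System.out.println("part-r-00001 exists: " + Files.exists(nonEmpty)); // true
    }
}

/**
 * Hypothetical stand-in for a lazy record writer: the output file is
 * opened on the first write(), so a partition without records leaves
 * no file behind.
 */
final class LazyPartWriter implements AutoCloseable {
    private final Path file;
    private BufferedWriter out; // stays null until the first record arrives

    LazyPartWriter(Path file) {
        this.file = file;
    }

    void write(String key, String value) throws IOException {
        if (out == null) {
            out = Files.newBufferedWriter(file); // the file is created here
        }
        out.write(key + "\t" + value);
        out.newLine();
    }

    @Override
    public void close() throws IOException {
        if (out != null) {
            out.close(); // nothing to close if no record was ever written
        }
    }
}
```

An eager writer would open the file in its constructor instead, which is why a reducer with no output still leaves an empty part file behind when LazyOutputFormat is not used.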
If the answer addressed your question, click the Accept button below to accept it. That helps other Community users find solutions to these kinds of issues quickly.
Created ‎05-18-2018 04:03 PM
Please refer to this link for an actual implementation of LazyOutputFormat.
Created ‎05-23-2018 11:03 AM
If you are running Hive joins/queries to populate tables/directories, Hive doesn't create empty files in the HDFS directories.
Created ‎05-18-2018 03:49 PM
Can you please explain how we can implement this concept?
Regards,
Shantanu.
Created ‎05-19-2018 09:36 AM
FileOutputFormat subclasses create output files (part-r-nnnnn) even if they are empty. Some applications prefer not to create empty files, which is where LazyOutputFormat comes into the picture.
LazyOutputFormat is a wrapper OutputFormat. It ensures that the output file for a given partition is created only when the first record is emitted for that partition.
To use LazyOutputFormat, call its setOutputFormatClass() method with the JobConf and the underlying output format. Streaming and Pipes support a -lazyOutput option to enable LazyOutputFormat.
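As a rough sketch of what that looks like in a driver: the example below uses the newer `org.apache.hadoop.mapreduce` API, where the same static method takes a `Job` instead of a `JobConf`. The class name `LazyOutputDriver`, the job name, and the use of `args[0]`/`args[1]` as input and output paths are assumptions for illustration. No mapper or reducer is set, so Hadoop's default identity classes run; a real job would plug in its own.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.LazyOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;

// Hypothetical driver class; requires the Hadoop client libraries on the classpath.
public class LazyOutputDriver {
    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "lazy-output-example");
        job.setJarByClass(LazyOutputDriver.class);

        // With the default TextInputFormat and identity mapper/reducer,
        // keys are byte offsets and values are lines of text.
        job.setOutputKeyClass(LongWritable.class);
        job.setOutputValueClass(Text.class);

        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));

        // Instead of job.setOutputFormatClass(TextOutputFormat.class),
        // wrap the real output format so part files are created lazily.
        LazyOutputFormat.setOutputFormatClass(job, TextOutputFormat.class);

        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```

With the older `org.apache.hadoop.mapred` API, the equivalent call is `LazyOutputFormat.setOutputFormatClass(jobConf, TextOutputFormat.class)` using the classes from `org.apache.hadoop.mapred.lib`.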
Created ‎05-22-2018 08:06 PM
Thanks for the replies, @shu and @Dukool SHarma.
I am not writing any MapReduce job. Can LazyOutputFormat be used at the query level, i.e., from a Hive query? My table format is ORC.
Regards,
Shantanu
Created ‎12-08-2018 05:02 PM
The RecordWriter writes the output of the reduce phase (zero, one, or more key-value pairs) to the output files, and the OutputFormat determines how the RecordWriter writes those key-value pairs. FileOutputFormat subclasses create output files even if they are empty, but some applications prefer not to create empty files; LazyOutputFormat is used in that scenario.
