Apologies if I haven’t phrased the question properly.
I have a combined file input format that returns the file name as the key and the file content as the value.
I customized the Mapper class’s run method so that it calls the map method only if the file meets specific conditions. For example, it calls map only if the file content is larger than 200 KB.
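To make the setup concrete, here is a minimal, Hadoop-free model of the customized run loop described above. It assumes "200 KB" means 200 × 1024 bytes. In Hadoop the override would be Mapper.run(Context), iterating with context.nextKeyValue(); here the records are a plain list, and the names GatedMapperModel, meetsCriteria and run are hypothetical, chosen only for this sketch.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/**
 * Standalone model of a mapper whose run() loop only calls map() for
 * records that pass a filter. This is NOT the Hadoop API; all names
 * here are hypothetical.
 */
final class GatedMapperModel {
    // Assumed threshold: "200 KB" interpreted as 200 * 1024 bytes.
    static final int SIZE_THRESHOLD_BYTES = 200 * 1024;

    // The condition from the question: process a file only if its content exceeds 200 KB.
    static boolean meetsCriteria(byte[] content) {
        return content.length > SIZE_THRESHOLD_BYTES;
    }

    // Stand-in for map(): records the name of each file it was called for.
    static void map(String fileName, byte[] content, List<String> output) {
        output.add(fileName);
    }

    // Stand-in for the overridden run(): walk the split's (fileName, content)
    // records and call map() only for the ones that meet the criteria.
    static List<String> run(List<Map.Entry<String, byte[]>> records) {
        List<String> output = new ArrayList<>();
        for (Map.Entry<String, byte[]> record : records) {
            if (meetsCriteria(record.getValue())) {
                map(record.getKey(), record.getValue(), output);
            }
        }
        return output;
    }
}
```

A mapper whose split contains only small files returns an empty list here, which corresponds to the case in the question where map is never called but a part file still appears.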
If 200 files are sent as input, 200 mappers start. Even if only 100 of those files meet the criteria and run the map method, there are still 200 output files in the output folder.
Is there a way to ensure that no output file is created when a mapper has no data to write? Or, the other way around, to create output files only when there is data for them?
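The behaviour being asked for is "create the output file only when the first record is written". Hadoop ships LazyOutputFormat (in org.apache.hadoop.mapreduce.lib.output for the new API and org.apache.hadoop.mapred.lib for the old one), which wraps a real output format and defers creating its record writer until the first write. The sketch below is a plain-Java model of that idea, not Hadoop's class; LazyPartWriter and the tab-separated line layout are hypothetical choices for illustration.

```java
import java.io.IOException;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

/**
 * Standalone model of lazy output creation: the file is opened only on the
 * first write(), so a writer that never receives a record leaves no file.
 * LazyPartWriter is a hypothetical name, not a Hadoop class.
 */
final class LazyPartWriter implements AutoCloseable {
    private final Path path;
    private Writer writer; // stays null until the first write()

    LazyPartWriter(Path path) {
        this.path = path;
    }

    void write(String key, String value) throws IOException {
        if (writer == null) {
            // First record: only now is the file actually created.
            writer = Files.newBufferedWriter(path, StandardCharsets.UTF_8);
        }
        writer.write(key + "\t" + value + "\n");
    }

    @Override
    public void close() throws IOException {
        // If nothing was written, there is no file to close (and none on disk).
        if (writer != null) {
            writer.close();
        }
    }
}
```

In a real job, the corresponding driver call would be LazyOutputFormat.setOutputFormatClass(job, TextOutputFormat.class) with the new API, or the static method of the same name on org.apache.hadoop.mapred.lib.LazyOutputFormat taking a JobConf with the old API; the wrapped format class is whatever the job already uses.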
I tried the same approach with AvroMultipleOutputs, and it still generates empty Avro files. Does something else need to be done when using AvroMultipleOutputs? I am using Avro 1.7.7 and CDH 5.4. Please let me know if you have faced this issue.
Can you also suggest a solution for code written with the old mapred API? My code generates empty part-xxxx files when the mapper conditions are not met, and because of that the reducer throws exceptions when it reaches 80%. So I need to suppress writing the empty part-xxxx files in the mapper stage itself. Your input would be very helpful. Thanks in advance!