How can we create a single mapper for small files?

How can I create a single mapper for many small files?


The small files problem: suppose we have 10 small files in HDFS. With the default input format, each file becomes at least one input split, so 10 mappers are required to process them. If we have thousands of small files, thousands of mappers have to run, and the overhead of starting and scheduling that many tasks degrades performance.
Instead of thousands of mappers, we would like a single mapper (or a small number of mappers) to process all of the small files.

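To overcome the problem of a large number of small files, Hadoop provides an abstract class, CombineFileInputFormat.
CombineFileInputFormat packs many files into a single split, so the number of mappers no longer equals the number of files. For plain text input, Hadoop also ships a concrete subclass, CombineTextInputFormat.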
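
To see why packing reduces the mapper count, here is a minimal sketch in plain Java. It is not Hadoop's actual algorithm (which also takes node and rack locality into account); the class SplitPackingSketch and its pack method are hypothetical names used only for illustration, and the 128 MB cap is an assumed value.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class SplitPackingSketch {

    // Greedily packs file sizes (in bytes) into splits of at most maxSplitBytes.
    // Simplified model: a file larger than the cap simply gets its own split.
    static List<List<Long>> pack(long[] fileSizes, long maxSplitBytes) {
        List<List<Long>> splits = new ArrayList<>();
        List<Long> current = new ArrayList<>();
        long currentBytes = 0;
        for (long size : fileSizes) {
            // Close the current split if adding this file would exceed the cap.
            if (!current.isEmpty() && currentBytes + size > maxSplitBytes) {
                splits.add(current);
                current = new ArrayList<>();
                currentBytes = 0;
            }
            current.add(size);
            currentBytes += size;
        }
        if (!current.isEmpty()) {
            splits.add(current);
        }
        return splits;
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        long[] files = new long[1000];
        Arrays.fill(files, mb); // 1000 files of 1 MB each (assumed sizes)

        // One split per file means one mapper per file.
        System.out.println("one split per file: " + files.length + " mappers");
        // Packed splits mean one mapper per split.
        System.out.println("packed, 128 MB cap: " + pack(files, 128 * mb).size() + " mappers");
    }
}
```

With these assumed sizes, 1000 mappers shrink to 8, because each 128 MB split holds 128 of the 1 MB files.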

Now a single mapper can be used for processing multiple small files.
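
A job driver that uses this could look roughly like the sketch below. It assumes the input is plain text and that the Hadoop MapReduce client libraries are on the classpath. SmallFilesDriver and PassThroughMapper are hypothetical names, the 128 MB cap is an assumed value, and the input and output paths come from the command line.

```java
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.CombineTextInputFormat;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class SmallFilesDriver {

    // Hypothetical mapper that just passes each line through unchanged.
    public static class PassThroughMapper
            extends Mapper<LongWritable, Text, LongWritable, Text> {
        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            context.write(key, value);
        }
    }

    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "small-files");
        job.setJarByClass(SmallFilesDriver.class);

        // Pack many small text files into each split instead of one split per file.
        job.setInputFormatClass(CombineTextInputFormat.class);
        // Assumed cap of 128 MB per combined split; tune for your cluster.
        FileInputFormat.setMaxInputSplitSize(job, 128L * 1024 * 1024);

        job.setMapperClass(PassThroughMapper.class);
        job.setNumReduceTasks(0); // map-only job for this sketch
        job.setOutputKeyClass(LongWritable.class);
        job.setOutputValueClass(Text.class);

        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));

        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```

How many mappers actually run depends on the total input size relative to the split cap, so you get exactly one mapper only when all the small files fit into one combined split.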