
Number of mappers equals the number of files (each file ~60 MB) on S3

Hi,

I am running an MR job on EMR. There are ~20,000 input files, each around 60 MB in size. The number of mappers equals the number of files.

I have set the following two parameters:

	fs.s3n.block.size=1073741824
	mapred.min.split.size=1073741824

How can I reduce the number of mappers? Which parameters should I set?

Thanks

Shubham

Re: Numbers of Mappers are equal to numbers of files(Each File Size ~ 60MB ) on S3

It depends on your job. MapReduce can only merge several files into a single split if you use a CombineFileInputFormat. We wrote a good explanation of this once:

http://www.ibm.com/developerworks/library/ba-mapreduce-biginsights-analysis/

If you use Pig or Hive, you can set the combined input split size with parameters, since they already do this. But for a standard MapReduce job, you need to implement the class above.
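
As a rough illustration of what combining buys you, here is a simplified Python sketch of the packing idea behind CombineFileInputFormat: whole files are added to the current split until it reaches a size cap, and then a new split is started. This is an assumption-laden model, not Hadoop's implementation. The real class also groups blocks by node and rack and chunks files larger than the cap. `combine_files` and `MB` are hypothetical names, and the file sizes come from the question.

```python
MB = 1024 * 1024  # hypothetical helper constant: one megabyte in bytes


def combine_files(file_sizes, max_split_size):
    """Greedily pack whole files into combined splits (data locality ignored).

    Simplified model: a split is closed as soon as its accumulated size
    reaches max_split_size; any leftover files form one final split.
    """
    splits, current, current_size = [], [], 0
    for size in file_sizes:
        current.append(size)
        current_size += size
        # Close the split once it has reached the size cap.
        if current_size >= max_split_size:
            splits.append(current)
            current, current_size = [], 0
    if current:  # leftover files become one last, smaller split
        splits.append(current)
    return splits


# 20,000 files of ~60 MB each, combined under a 1 GB cap.
splits = combine_files([60 * MB] * 20000, 1024 * MB)
print(len(splits))     # 1112
print(len(splits[0]))  # 18
```

Each inner list corresponds to one mapper, so under this model the 20,000 files from the question would need about 1,100 mappers instead of 20,000.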

Re: Numbers of Mappers are equal to numbers of files(Each File Size ~ 60MB ) on S3

Thanks a lot. Will try that.

Re: Numbers of Mappers are equal to numbers of files(Each File Size ~ 60MB ) on S3

@shubham chhabra

The number of mappers is decided dynamically by the input splits. If you have an input file of 500 MB and your split size is 256 MB, there will be 2 mappers. In your case, your input files are 60 MB each and your split size is 1 GB. Because a split never spans more than one file (with the default FileInputFormat), you get 1 mapper per input file.
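
To make the arithmetic above concrete, here is a minimal Python sketch modeled on the per-file split logic of Hadoop's FileInputFormat: split size = max(minSize, min(maxSize, blockSize)), and the last piece of a file may be up to 10% larger than the split size. The function names and the `MB` constant are hypothetical illustrations, not Hadoop APIs, and the sketch assumes every file is splittable.

```python
MB = 1024 * 1024            # hypothetical helper constant: one megabyte in bytes
LONG_MAX = 2**63 - 1        # stands in for an unset max split size
SPLIT_SLOP = 1.1            # last split may be up to 10% larger than split_size


def compute_split_size(block_size, min_size, max_size):
    # Same formula as FileInputFormat.computeSplitSize().
    return max(min_size, min(max_size, block_size))


def splits_for_file(file_len, split_size):
    """Number of splits a single splittable file is cut into."""
    if file_len == 0:
        return 1  # an empty file still yields one (empty) split
    splits, remaining = 0, file_len
    while remaining / split_size > SPLIT_SLOP:
        splits += 1
        remaining -= split_size
    if remaining > 0:
        splits += 1
    return splits


def count_mappers(file_sizes, split_size):
    # Splits are computed file by file, so a split never spans two files.
    return sum(splits_for_file(size, split_size) for size in file_sizes)


# 500 MB file with a 256 MB split size -> 2 mappers.
print(count_mappers([500 * MB], 256 * MB))  # 2

# 20,000 files of 60 MB with block size and min split size of 1 GB.
split = compute_split_size(1024 * MB, 1024 * MB, LONG_MAX)
print(count_mappers([60 * MB] * 20000, split))  # 20000
```

In this model, raising the minimum split size only changes how large files are cut. It cannot merge the 60 MB files, which is why the count stays at 20,000.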

Check out the links below:

http://inquidia.com/news-and-info/working-small-files-hadoop-part-1

http://inquidia.com/news-and-info/working-small-files-hadoop-part-2

Hope this information helps! :)