Support Questions
Find answers, ask questions, and share your expertise

Number of mappers equals number of files (each file ~60 MB) on S3

New Contributor

Hi,

I am running an MR job on EMR. There are ~20,000 input files, each around 60 MB in size, and the number of mappers equals the number of files.

I have set the following two parameters:

	fs.s3n.block.size=1073741824
	mapred.min.split.size=1073741824

How can I reduce the number of mappers? Which parameters should I set?

Thanks

Shubham

3 REPLIES

Re: Number of mappers equals number of files (each file ~60 MB) on S3

That depends on your job. MapReduce can only merge multiple files into a single split if you use a CombineFileInputFormat. We wrote a good explanation of this once:

http://www.ibm.com/developerworks/library/ba-mapreduce-biginsights-analysis/

If you use Pig or Hive, you can control the combined input size with parameters, since they already do this. But for a standard MapReduce job you need to implement the class above.
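For illustration, the Hive and Pig parameters the reply alludes to look roughly like this (a sketch, not verified against any particular version — check the property names and defaults for your Hive/Pig release; the 1 GB value mirrors the one in the question):

```sql
-- Hive: combine many small input files into larger splits
SET hive.input.format=org.apache.hadoop.hive.ql.io.CombineHiveInputFormat;
SET mapreduce.input.fileinputformat.split.maxsize=1073741824;

-- Pig: the equivalent knobs (set in the Pig script or pig.properties)
-- SET pig.splitCombination true;
-- SET pig.maxCombinedSplitSize 1073741824;
```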

Re: Number of mappers equals number of files (each file ~60 MB) on S3

New Contributor

Thanks a lot. Will try that.

Re: Number of mappers equals number of files (each file ~60 MB) on S3

Super Guru
@shubham chhabra

The number of mappers is decided dynamically by the input split size. If you have an input file of 500 MB and your split size is 256 MB, there will be 2 mappers. In your case each input file is 60 MB, smaller than your 1 GB split size, and with the default FileInputFormat a split never spans more than one file, so you get 1 mapper per input file.
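The arithmetic above can be sketched as follows (a minimal illustration of default FileInputFormat behaviour, where splits never cross file boundaries; the function name is made up for this sketch):

```python
import math

def mappers_for(file_sizes, split_size):
    """Mapper count under the default FileInputFormat: each file is
    split independently, so every file yields at least one mapper."""
    return sum(max(1, math.ceil(size / split_size)) for size in file_sizes)

MB, GB = 1024 ** 2, 1024 ** 3

# One 500 MB file with a 256 MB split -> 2 mappers, as in the reply.
print(mappers_for([500 * MB], 256 * MB))       # 2

# 20,000 files of 60 MB each with a 1 GB split -> still 20,000 mappers,
# because raising the split size cannot merge separate files.
print(mappers_for([60 * MB] * 20000, 1 * GB))  # 20000
```

This is why raising `fs.s3n.block.size` and `mapred.min.split.size` alone does not help: a CombineFileInputFormat is needed to pack multiple files into one split.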

Check out the links below:

http://inquidia.com/news-and-info/working-small-files-hadoop-part-1

http://inquidia.com/news-and-info/working-small-files-hadoop-part-2

Hope this information helps! :)
