I am running an MR job on EMR. There are ~20,000 input files, each around 60 MB in size. The number of mappers is equal to the number of files.
I have set the two parameters below.
How can I reduce the number of mappers? Which parameters need to be set?
It depends on your job. MapReduce can only merge several input files into one split if you use a CombineFileInputFormat. We wrote a good explanation once.
If you use Pig or Hive, you can set the combined input split size with parameters, since they already do this for you. But for a standard MapReduce job, you need to implement the class above.
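To give a feel for what combining buys you, here is a rough sketch in plain Java (no Hadoop dependency) that packs the asker's 20,000 files of 60 MB into splits of at most 1 GB. The class `CombinedSplitSketch` and the method `countCombinedSplits` are made up for illustration, and the greedy packing is a simplification: as far as I know, the real CombineFileInputFormat also groups blocks by node and rack before it builds splits.

```java
import java.util.Collections;
import java.util.List;

// Hypothetical illustration only; not Hadoop's actual implementation.
final class CombinedSplitSketch {

    // Greedily packs files into splits so that no split grows past maxSplitBytes.
    // A single file larger than the limit still ends up in one split of its own.
    static int countCombinedSplits(List<Long> fileSizes, long maxSplitBytes) {
        int splits = 0;
        long current = 0;
        for (long size : fileSizes) {
            // Close the current split if the next file would push it over the limit.
            if (current > 0 && current + size > maxSplitBytes) {
                splits++;
                current = 0;
            }
            current += size;
        }
        if (current > 0) {
            splits++; // last, possibly partial, split
        }
        return splits;
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        List<Long> files = Collections.nCopies(20_000, 60 * mb);
        System.out.println("one split per file: " + files.size());
        System.out.println("combined, 1 GB max: " + countCombinedSplits(files, 1024 * mb));
    }
}
```

With these numbers, 17 files fit into each 1 GB split, so the sketch reports 1177 splits instead of 20,000. In a real job, the ready-made `CombineTextInputFormat` together with the `mapreduce.input.fileinputformat.split.maxsize` property should give a similar effect, though the exact split count will depend on data locality.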
The number of mappers is decided dynamically by the input splits. If you have an input file of 500 MB and your input split size is 256 MB, there will be 2 mappers. In your case, the input files are 60 MB each and your input split size is 1 GB. With the default input format a split never spans more than one file, hence 1 mapper per input file.
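The arithmetic above can be sketched in plain Java. The class `SplitCountSketch` and the method `splitsForFile` are hypothetical names; the loop imitates what I understand Hadoop's FileInputFormat to do for a non-empty, splittable file, including the 10% slack it allows on the last split.

```java
// Hypothetical illustration of per-file split counting; not Hadoop code.
final class SplitCountSketch {

    // FileInputFormat lets the final split be up to 10% larger than the split size.
    static final double SPLIT_SLOP = 1.1;

    // Number of splits (and therefore mappers) for one non-empty, splittable file.
    static int splitsForFile(long fileBytes, long splitBytes) {
        int splits = 0;
        long remaining = fileBytes;
        // Cut full-size splits while clearly more than one split's worth remains.
        while ((double) remaining / splitBytes > SPLIT_SLOP) {
            splits++;
            remaining -= splitBytes;
        }
        if (remaining > 0) {
            splits++; // the tail becomes the last split
        }
        return splits;
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        System.out.println("500 MB file, 256 MB split: " + splitsForFile(500 * mb, 256 * mb));
        System.out.println("60 MB file, 1 GB split: " + splitsForFile(60 * mb, 1024 * mb));
        // Splits are computed per file, so 20,000 small files still mean 20,000 mappers.
        System.out.println("20000 files of 60 MB: " + 20_000 * splitsForFile(60 * mb, 1024 * mb));
    }
}
```

This prints 2 for the 500 MB example and 1 for a 60 MB file, which is why raising the split size alone does not reduce the mapper count here.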
Check out the links below.
Hope this information helps! :)