Archives of Support Questions (Read Only)

heta_desai · ‎03-22-2017

I want to understand the internals of YARN and MapReduce. I am new to Hadoop and am trying to understand how exactly jobs get executed.

ssathish · ‎03-22-2017

Hi @heta desai

The Application Master launches one MapTask for each map split. Typically, there is one map split per input file. If an input file is larger than the HDFS block size, it is divided into two or more map splits, all associated with that same file.
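
To make the split arithmetic above concrete, here is a minimal sketch in Python. It is modeled on the default behaviour of Hadoop's FileInputFormat as I understand it (split size defaults to the block size, and a trailing remainder of up to 10% is folded into the last split). The function names and the example file sizes are hypothetical, and it assumes non-empty, splittable files.

```python
# Rough model of how many map splits (and so map tasks) a job gets.
# Assumptions: files are splittable and non-empty; split size is derived
# from the block size as in FileInputFormat's default logic.

SPLIT_SLOP = 1.1  # a last chunk up to 10% larger than a split is not cut again

MB = 1024 * 1024


def compute_split_size(block_size, min_size=1, max_size=2**63 - 1):
    """Split size is the block size clamped between min and max split size."""
    return max(min_size, min(max_size, block_size))


def count_splits(file_size, split_size):
    """Number of splits produced for one file of the given size in bytes."""
    splits = 0
    remaining = file_size
    # Carve off full-size splits while the remainder is clearly larger than one split.
    while remaining / split_size > SPLIT_SLOP:
        splits += 1
        remaining -= split_size
    # Whatever is left becomes the final split.
    if remaining > 0:
        splits += 1
    return splits


def map_tasks_for_job(file_sizes, block_size):
    """One map task per split, summed over all input files."""
    split_size = compute_split_size(block_size)
    return sum(count_splits(size, split_size) for size in file_sizes)


if __name__ == "__main__":
    # Hypothetical input: three files with a 128 MB block size.
    files = [100 * MB, 300 * MB, 130 * MB]
    print(map_tasks_for_job(files, 128 * MB))
```

With these example sizes, the 100 MB file yields one split, the 300 MB file yields three, and the 130 MB file yields one (it is within the 10% tolerance of a single block), so the Application Master would launch five map tasks.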

Also, the memory used by map and reduce tasks is the RAM of the NodeManagers, because the tasks run in containers on those nodes.
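
As a rough illustration of the memory point, the sketch below estimates how many task containers fit on one NodeManager. The property names in the comments are real Hadoop settings, but the values are hypothetical examples, and the calculation is a simplification: it ignores vcores, the scheduler rounding requests to its allocation increments, and the memory taken by the Application Master's own container.

```python
# Simplified estimate of concurrent task containers on a single NodeManager.
# All numbers below are hypothetical example values, not recommendations.

def max_concurrent_tasks(node_memory_mb, task_memory_mb):
    """How many containers of the given size fit in the node's YARN memory."""
    if task_memory_mb <= 0:
        raise ValueError("task_memory_mb must be positive")
    return node_memory_mb // task_memory_mb


# yarn.nodemanager.resource.memory-mb: RAM the NodeManager offers to containers
node_memory_mb = 8192
# mapreduce.map.memory.mb: memory requested for each map task container
map_memory_mb = 1024
# mapreduce.reduce.memory.mb: memory requested for each reduce task container
reduce_memory_mb = 2048

if __name__ == "__main__":
    print(max_concurrent_tasks(node_memory_mb, map_memory_mb))     # map containers
    print(max_concurrent_tasks(node_memory_mb, reduce_memory_mb))  # reduce containers
```

Under these assumed values, one node could hold at most eight map containers or four reduce containers at a time.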

Please refer to the following page for more details:

http://ercoppa.github.io/HadoopInternals/AnatomyMapReduceJob.html

heta_desai · ‎03-23-2017

Thank you.

ssathish · ‎03-23-2017

Can you please accept my answer if it answered your question? 🙂

Cloudera Community

How does the AM decide how many MapReduce jobs will be created for a particular execution, and in which memory will the map and reduce tasks be performed?