question Re: Hdfs and map reduce word count? in Archives of Support Questions (Read Only)

Hdfs and map reduce word count?

KasperHansen — Fri, 16 Sep 2022 09:50:42 GMT

Can anyone explain, or direct me to material explaining, how the MapReduce word count works? I do not understand how it can work!

If a file is split into blocks and distributed over multiple nodes, how can the word count program work? The text can be split in the middle of a word, e.g. "be" in one block and "tween" in another. How can the MapReduce job count "between" as a word if it is split over multiple blocks and nodes?

Re: Hdfs and map reduce word count?

Sean — Sun, 29 Nov 2015 19:47:56 GMT

The input format ensures records are intact before they are sent to the
mapper function. For plain text input a record is a line, and the reader
for a split reads past the end of its block to finish its last line and
skips the partial line at its start. I believe this is done by sending
partial records to the machine they will be mapped on (so the overwhelming
majority of the data is processed in place, but half a line per block or so
may still be exchanged). Tom White's book Hadoop: The Definitive Guide does
a good job of covering details like this.
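
To make the boundary handling concrete, here is a rough Python sketch of the idea. It is a simplified simulation, not Hadoop's actual code: the names make_splits, read_records and word_count are made up for illustration, and everything sits in one in-memory byte string instead of HDFS blocks. The assumed rule is that each reader skips the line in progress at its start offset (unless it starts at 0) and keeps reading until it has finished the last line that begins at or before its end offset.

```python
from collections import Counter


def make_splits(length, block_size):
    """Cut the byte range [0, length) into fixed-size (start, end) splits."""
    return [(s, min(s + block_size, length)) for s in range(0, length, block_size)]


def read_records(data, start, end):
    """Yield the whole lines owned by the split [start, end).

    Simplified rule (an assumption for this sketch): a line belongs to the
    split whose reader reaches its first byte at or before `end`.
    """
    pos = start
    if start != 0:
        # Skip the line in progress at `start`; the previous split's reader
        # is responsible for it and reads past its own end to finish it.
        newline = data.find(b"\n", start)
        if newline == -1:
            return
        pos = newline + 1
    while pos <= end and pos < len(data):
        newline = data.find(b"\n", pos)
        line_end = len(data) if newline == -1 else newline
        # This slice may extend beyond `end`: the "remote" part of the read.
        yield data[pos:line_end].decode("utf-8")
        pos = line_end + 1


def word_count(data, block_size):
    """Run a toy map phase per split, then sum the per-split counts."""
    total = Counter()
    for start, end in make_splits(len(data), block_size):
        for line in read_records(data, start, end):
            total.update(line.split())  # map: count each word in the record
    return total


text = b"the gap between the lines\nbetween you and me\n"
# With 10-byte blocks the first boundary falls inside the first "between"
# ("be" | "tween"), yet the word is still counted whole.
print(word_count(text, block_size=10)["between"])  # 2
```

Whatever block size is chosen, every line is read exactly once, so the totals match a count over the unsplit text.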

Re: Hdfs and map reduce word count?

KasperHansen — Sun, 29 Nov 2015 20:31:56 GMT

Ok very cool. Thank you for the reference.