HDFS and MapReduce word count?

New Contributor
Can anyone explain, or direct me to material explaining, how the MapReduce word count works? I do not understand how it can work!

If a file is split into blocks and distributed over multiple nodes, how can the word count program work? The text can be split in the middle of a word, e.g. "be" in one block and "tween" in another. How can the MapReduce job count "between" as a word if it is split across multiple blocks and nodes?
1 ACCEPTED SOLUTION

Guru
The input format ensures records are intact before they are sent to the
mapper function. I believe this is done by sending partial records to the
machine they will be mapped on (so the overwhelming majority of the data is
processed in place, but half a line per block or so may still be
exchanged). With line-oriented text input, for example, the reader for a
split keeps reading past the end of its split to finish its last line, and
the reader for the next split skips that partial first line. Tom White's
book Hadoop: The Definitive Guide does a good job of covering details like
this.
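
To make the idea concrete, here is a minimal sketch in plain Python that simulates the boundary handling, not Hadoop's actual code. The function names (`read_split`, `word_count`) and the byte-offset splitting are assumptions made for illustration. Each simulated split discards a partial first line (unless it starts at offset 0) and reads past its own end to finish its last line, so every line is seen by exactly one "mapper".

```python
from collections import Counter


def read_split(data: bytes, start: int, length: int):
    """Yield the whole lines owned by the split [start, start + length).

    Hypothetical helper: a simplified model of how a line-oriented
    record reader could treat split boundaries.
    """
    end = start + length
    pos = start
    if start != 0:
        # Discard everything up to and including the first newline.
        # That (possibly partial) line belongs to the previous split.
        nl = data.find(b"\n", pos)
        pos = len(data) if nl == -1 else nl + 1
    # Read every line that starts at or before `end`, running past the
    # split boundary if needed to finish the last line.
    while pos <= end and pos < len(data):
        nl = data.find(b"\n", pos)
        line_end = len(data) if nl == -1 else nl
        yield data[pos:line_end].decode("utf-8")
        pos = line_end + 1


def word_count(data: bytes, split_size: int) -> Counter:
    """Count words by processing the data one fixed-size split at a time."""
    counts = Counter()
    for start in range(0, len(data), split_size):
        length = min(split_size, len(data) - start)
        for line in read_split(data, start, length):  # "map" over records
            counts.update(line.split())               # "reduce" collapsed into a Counter
    return counts


if __name__ == "__main__":
    text = b"the gap between be and tween\nbetween the lines\n"
    # A split size of 10 bytes cuts the first line in the middle of "between".
    print(word_count(text, split_size=10))
```

Whatever split size is chosen, the counts should match counting the unsplit text, because the word cut by a boundary is always read whole by the split that owns the start of its line.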

2 REPLIES

New Contributor
Ok very cool. Thank you for the reference.