MapReduce program to display the word with the highest count in a file

Explorer

Hi,

 

Can you help me with this? I want to write a MapReduce program that displays the word that occurs most often in a file.

Is there any way to modify the WordCount MapReduce program so that it outputs only a single row: the word and its count?

 

Thanks

Sach

1 ACCEPTED SOLUTION

Mentor
The simplest way to express this, starting from a raw text file, is to do it in two steps:

1. First, use an MR job to produce a word count. In the Mapper, split each line into words and emit each word with a count of 1; in the Reducer, aggregate the counts with the word as the key.
2. Second, use a subsequent MR job to read that output and invert it, so that the key is now the count and the value is the set of words with that count; the output is then sorted by count rather than by word. Run this job with either a TotalOrderPartitioner or a single reducer to get your final result.
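As a rough illustration of the two steps above, here is a minimal sketch in plain Java. It only simulates the two jobs in memory and does not use any Hadoop classes; the names `TwoJobTopWord`, `wordCount` and `invert` are made up for this example, and a real job would put this logic into Mapper and Reducer classes instead. Sorting the inverted map in descending order is an assumption made here so that the top word comes first.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java simulation of the two jobs; no Hadoop classes are used.
class TwoJobTopWord {

    // Job 1: "map" emits (word, 1) per token, "reduce" sums the ones per word.
    static Map<String, Integer> wordCount(List<String> lines) {
        // TreeMap mimics the key-sorted order in which a reducer sees its input.
        Map<String, Integer> counts = new TreeMap<>();
        for (String line : lines) {
            for (String token : line.trim().split("\\s+")) {
                if (!token.isEmpty()) {
                    counts.merge(token, 1, Integer::sum);
                }
            }
        }
        return counts;
    }

    // Job 2: invert to (count, words), with the largest count first.
    static TreeMap<Integer, List<String>> invert(Map<String, Integer> counts) {
        TreeMap<Integer, List<String>> byCount =
                new TreeMap<Integer, List<String>>(Comparator.reverseOrder());
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            byCount.computeIfAbsent(e.getValue(), k -> new ArrayList<>()).add(e.getKey());
        }
        return byCount;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("i  love love love you and  love love you");
        // The first entry of the inverted map holds the highest count and its words.
        Map.Entry<Integer, List<String>> top = invert(wordCount(lines)).firstEntry();
        System.out.println(top.getValue() + "\t" + top.getKey());
    }
}
```

Keeping a list of words per count means ties are not lost: if two words share the highest count, both appear in the first entry.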


3 REPLIES

Expert Contributor

It seems you are not yet familiar with the main concepts of map and reduce. Your question is actually quite easy.

As you may know, the output of map is automatically sorted by key (here, the word), so the default WordCount output, which is ordered by word, is of course not what you need.

For example, suppose you have the original words below:

"i  love love love you and  love love you"

Then the output after map will look like this:

 

and    1
i      1
love   5
you    2

 

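So once these words arrive as input in the Reducer, you sum the count for each word, keep track of the largest count seen so far, and compare each new count against it. After all input has been processed and you have the maximum, output only that word and its count.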
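The idea of tracking the maximum inside the reducer could look roughly like the sketch below. It is plain Java, not actual Hadoop code: `MaxCountReducerSketch` is a hypothetical stand-in whose `reduce` and `cleanup` methods only mirror the lifecycle of a Hadoop `Reducer` (one `reduce` call per key, one `cleanup` call at the end). It also assumes the job runs with a single reducer, since with several reducers each one would only know its own local maximum.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for a Hadoop Reducer: reduce() is called once per word,
// cleanup() once after the last word. No Hadoop classes are used here.
class MaxCountReducerSketch {
    private String maxWord = null;
    private int maxCount = 0;

    // Sum the partial counts for one word and remember it if it is the new maximum.
    void reduce(String word, List<Integer> partialCounts) {
        int sum = 0;
        for (int c : partialCounts) {
            sum += c;
        }
        if (sum > maxCount) { // strict '>' keeps the first word seen on a tie
            maxCount = sum;
            maxWord = word;
        }
    }

    // Produce the single winning row; returns null if no input was seen.
    String cleanup() {
        return maxWord == null ? null : maxWord + "\t" + maxCount;
    }

    public static void main(String[] args) {
        MaxCountReducerSketch r = new MaxCountReducerSketch();
        // Keys arrive in sorted order, each with its list of 1s from the mapper.
        r.reduce("and", Arrays.asList(1));
        r.reduce("i", Arrays.asList(1));
        r.reduce("love", Arrays.asList(1, 1, 1, 1, 1));
        r.reduce("you", Arrays.asList(1, 1));
        System.out.println(r.cleanup());
    }
}
```

Nothing is written per word; the only output is the one row produced at the end, which is what turns the usual WordCount output into a single line.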

 

   

 

 

New Contributor

That output comes after the reduce function, not after map.
