Support Questions

Sach · ‎11-06-2014

Hi,

Can you help me with this? I want to write a MapReduce program that displays the word repeated the most times in a file.

Is there any way to modify the WordCount MapReduce program to display only a single row in the form: word, count?

Thanks

Sach

Harsh J · ‎11-30-2014

The simplest way to express this, beginning with a raw text file, would be to do the two steps below:

1. First, use an MR job to form a word count. Split each line into words and emit each word with a count of 1 in the Mapper, then aggregate the counts by word as the key in the Reducer.
2. Second, use a subsequent MR job to read that output and invert each pair, so that the key is now the count and the value is the set of words that match the count. The result is then sorted by count rather than by word. Run it with either a TotalOrderPartitioner or a single reducer to get your final result.
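To make the two steps concrete, here is a minimal sketch in plain Python that simulates both jobs in memory. It is not Hadoop code: `run_job`, `wc_map`, `wc_reduce`, `invert_map` and `invert_reduce` are hypothetical names made up for illustration, and the sorted loop only stands in for the shuffle/sort that a single reducer would see.

```python
from collections import defaultdict


def run_job(records, mapper, reducer):
    """Simulate one MapReduce job: map, group by key, reduce in key order."""
    groups = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            groups[key].append(value)
    output = []
    for key in sorted(groups):  # a single reducer receives keys in sorted order
        output.extend(reducer(key, groups[key]))
    return output


# Job 1: classic word count.
def wc_map(line):
    for word in line.split():
        yield word, 1


def wc_reduce(word, ones):
    yield word, sum(ones)


# Job 2: invert (word, count) into (count, word) so the sort is by count.
def invert_map(pair):
    word, count = pair
    yield count, word


def invert_reduce(count, words):
    yield count, sorted(words)


lines = ["i love love love you and love love you"]
counts = run_job(lines, wc_map, wc_reduce)
by_count = run_job(counts, invert_map, invert_reduce)

# Keys are sorted ascending here, so the highest count is the last record.
top_count, top_words = by_count[-1]
print(top_words, top_count)
```

Because the second job groups by count, words that tie for the highest count end up together in the same record. In a real job you would more likely sort the keys in descending order and read the first record instead of the last.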


iamfromsky · ‎11-08-2014

It seems you are not yet familiar with the main concept of map and reduce. Basically, your question is very easy.

As you may know, the output from map is sorted by word automatically, so the original output, being ordered by word, does not suit your needs.

For example, suppose you have these original words:

"i love love love you and love love you"

Then the output after map will look like this:

and 1

i 1

love 5

you 2

So once these words arrive as input in Reduce, you just save the counts and compare them. After you have found the max, output it.
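A rough sketch of that idea, again as a plain Python simulation rather than Hadoop code. `MaxCountReducer` is a hypothetical class for illustration, and the sketch assumes a single reducer sees every word (with several reducers, each one would only find its own local maximum). The `cleanup` method plays the role of a hook that runs once after the last key.

```python
from itertools import groupby


class MaxCountReducer:
    """Keeps the word with the highest count; emits nothing until the end."""

    def __init__(self):
        self.best_word = None
        self.best_count = 0

    def reduce(self, word, values):
        total = sum(values)
        # Strictly greater: on a tie, the first word in sort order is kept.
        if total > self.best_count:
            self.best_word, self.best_count = word, total

    def cleanup(self):
        # Called once after all keys have been processed.
        return self.best_word, self.best_count


text = "i love love love you and love love you"

# Stand-in for the map and shuffle phases: (word, 1) pairs sorted by word.
pairs = sorted((word, 1) for word in text.split())

reducer = MaxCountReducer()
for word, group in groupby(pairs, key=lambda kv: kv[0]):
    reducer.reduce(word, [value for _, value in group])

result = reducer.cleanup()
print(result)
```

This does the whole thing in one job, at the cost of funnelling every word through one reducer and reporting only one word when several share the top count.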

silla · ‎02-22-2021

That output comes after the reduce function, not after map.

Cloudera Community

Mapreduce program to display word having highest count in file