MapReduce program to count the frequency of each entity in a table column

We have a marketing report in the form of a tabular data set whose schema looks like this:

Page DataSample Data
Page DataSample Data
Request Information DataSample Data
Page Viewed1472688087204231ww.123.comSample DataSample Data
Page DataSample Data
Page DataSample Data
Page DataSample Data
Page Viewed147268831061410www.tuv.inSample DataSample Data

We need to write a MapReduce program that finds the most frequent initial_referring source site, in order to determine which website is the most effective ad platform.


  1. Remove rows with duplicate values in the distinct_id column.
  2. Count the frequency of each entity in the initial_referring column.
  3. Publish the frequency of each entity.
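
The three steps above can be sketched as two chained map/reduce passes. What follows is a minimal in-memory Python simulation, not an actual Hadoop job: the `run_job` helper, the tab delimiter, the column positions (`distinct_id` at index 2, `initial_referring` at index 3), and the sample rows are all assumptions made for illustration, since the full layout of the report is not shown.

```python
from collections import defaultdict

# Assumed layout (hypothetical): tab-delimited rows with
# distinct_id in column 2 and initial_referring in column 3.
DISTINCT_ID_COL = 2
REFERRER_COL = 3


def run_job(records, mapper, reducer):
    """Simulate one MapReduce pass: map, group by key (shuffle), reduce."""
    groups = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            groups[key].append(value)
    output = []
    for key in sorted(groups):
        output.extend(reducer(key, groups[key]))
    return output


# Pass 1: keep one row per distinct_id.
def dedupe_mapper(line):
    fields = line.split("\t")
    if len(fields) > max(DISTINCT_ID_COL, REFERRER_COL):
        yield fields[DISTINCT_ID_COL], fields[REFERRER_COL]


def dedupe_reducer(distinct_id, referrers):
    # Emit only the first referrer seen for this distinct_id.
    yield referrers[0]


# Pass 2: word-count style frequency of each referrer.
def count_mapper(referrer):
    yield referrer, 1


def count_reducer(referrer, ones):
    yield referrer, sum(ones)


def referrer_frequencies(lines):
    unique_referrers = run_job(lines, dedupe_mapper, dedupe_reducer)
    return dict(run_job(unique_referrers, count_mapper, count_reducer))


if __name__ == "__main__":
    # Made-up sample rows, not taken from the report.
    sample = [
        "Page Viewed\t1\tu1\tsite-a.example",
        "Page Viewed\t2\tu1\tsite-a.example",  # duplicate distinct_id
        "Page Viewed\t3\tu2\tsite-b.example",
        "Page Viewed\t4\tu3\tsite-a.example",
    ]
    for site, count in referrer_frequencies(sample).items():
        print(site, count)
```

In a real cluster the order of values reaching a reducer is not guaranteed, so "first referrer seen" is an arbitrary choice when one distinct_id has rows with different referring sites; a real job would need an explicit rule for that case.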

I was able to solve this problem in Hive and Pig, but could not get the correct result with a MapReduce program.

Any reference or similar piece of code would help.


@Praveen Singh

Assuming that by "tabular" you mean your file is delimited (comma, tab, pipe, etc.), a simple word-count program should suffice.
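
As a rough sketch of that word-count idea, here is a mapper and reducer written in the Hadoop Streaming style, where each stage consumes text lines and emits tab-separated key/count pairs. The comma delimiter, the column index of the referring site, and the function names `map_lines` and `reduce_lines` are assumptions for illustration. This sketch only counts; it does not remove duplicate distinct_id rows.

```python
from itertools import groupby

DELIMITER = ","   # assumption: comma-delimited input
REFERRER_COL = 3  # assumption: position of the initial_referring column


def map_lines(lines):
    """Mapper: emit 'referrer<TAB>1' for every well-formed row."""
    for line in lines:
        fields = line.rstrip("\n").split(DELIMITER)
        if len(fields) > REFERRER_COL and fields[REFERRER_COL]:
            yield f"{fields[REFERRER_COL]}\t1"


def reduce_lines(sorted_lines):
    """Reducer: sum the counts per key; input must already be sorted by key."""
    pairs = (line.rstrip("\n").split("\t", 1) for line in sorted_lines)
    for key, group in groupby(pairs, key=lambda kv: kv[0]):
        yield f"{key}\t{sum(int(count) for _, count in group)}"
```

To run these under Hadoop Streaming, each function would be wrapped in a small script that feeds it `sys.stdin` and prints what it yields; the framework performs the sort between the two stages. Locally, `sorted(map_lines(rows))` passed to `reduce_lines` gives the same result.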

A nice post covering ways to achieve this with Hive, Pig, R, Spark, MapReduce (Java), and MapReduce (Python) may be found at the link below. The page formatting is not great, but the content is informative.

As always, if you find this post useful, don't forget to "accept" the answer.