MapReduce program to count the frequency of a particular entity in a column of a table

We have a marketing report in the form of a tabular data set whose schema looks like this:

We need to write a MapReduce program that finds the most frequent Initial_referring source site, so that we can tell which website is the most effective ad platform. The steps are:


  1. Remove rows that have duplicate entities in the distinct_id column.
  2. Count the frequency of each entity in the initial_referring column.
  3. Publish the resulting frequency of each entity.

I was able to solve this problem in Hive and Pig, but I could not get the correct result with a MapReduce program.

Any reference or similar piece of code would help.


@Praveen Singh

Assuming that by "tabular" you mean your file is delimited (comma, tab, pipe, etc.), a simple word-count program should suffice.
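To illustrate the word-count idea for the three steps in the question, here is a minimal sketch in plain Python that simulates two chained map/reduce passes in memory. It is not Hadoop code. The column positions, the function names, and the rule of keeping the first row seen for a duplicate distinct_id are all assumptions made for illustration, since the actual schema is not shown. It also assumes the file has no header row.

```python
import csv
from itertools import groupby
from operator import itemgetter

# Hypothetical column positions: the real schema is not shown in the post.
DISTINCT_ID_COL = 0
INITIAL_REFERRING_COL = 1


def map_dedupe(line, delimiter=","):
    """Pass 1 mapper: emit (distinct_id, initial_referring) for each row."""
    if not line.strip():
        return  # skip blank lines
    fields = next(csv.reader([line], delimiter=delimiter))
    yield fields[DISTINCT_ID_COL], fields[INITIAL_REFERRING_COL]


def reduce_dedupe(distinct_id, referrers):
    """Pass 1 reducer: keep a single row per distinct_id.

    Assumption: the first value seen wins. On a real cluster the order of
    values reaching a reducer is not guaranteed.
    """
    yield distinct_id, next(iter(referrers))


def map_count(distinct_id, referrer):
    """Pass 2 mapper: classic word count, keyed on the referring site."""
    yield referrer, 1


def reduce_count(referrer, ones):
    """Pass 2 reducer: sum the 1s emitted for each referring site."""
    yield referrer, sum(ones)


def run_job(records, mapper, reducer):
    """Simulate map -> shuffle/sort -> reduce in memory."""
    mapped = [pair for record in records for pair in mapper(*record)]
    mapped.sort(key=itemgetter(0))  # stable sort stands in for the shuffle
    output = []
    for key, group in groupby(mapped, key=itemgetter(0)):
        output.extend(reducer(key, (value for _, value in group)))
    return output


def referrer_frequencies(lines, delimiter=","):
    """Chain the two passes: dedupe on distinct_id, then count referrers."""
    deduped = run_job([(line, delimiter) for line in lines],
                      map_dedupe, reduce_dedupe)
    return dict(run_job(deduped, map_count, reduce_count))


if __name__ == "__main__":
    sample = [
        "u1,google.com",
        "u2,facebook.com",
        "u1,google.com",   # duplicate distinct_id, dropped in pass 1
        "u3,google.com",
    ]
    counts = referrer_frequencies(sample)
    for site, count in sorted(counts.items(), key=itemgetter(1), reverse=True):
        print(site, count)
```

The split into two passes is the main design point: deduplication needs distinct_id as the key, while counting needs initial_referring as the key, so a single word-count job covers step 2 but not step 1.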

A nice post covering ways to achieve this with Hive, Pig, R, Spark, MapReduce (Java), and MapReduce (Python) can be found at the link below. The page formatting is not great, but the content is informative.

