map reduce program to count Frequency of particular entity in a column of a table

We have a marketing report in the form of a tabular data set whose schema looks like this:

event                     | timestamp  | distinct_id | initial_referring_domain | ColumnX     | ColumnY
Page Viewed               | 1472688038 | 489687      | www.abc.com              | Sample Data | Sample Data
Page Viewed               | 1472688052 | 118805      | www.abc.com              | Sample Data | Sample Data
Request Information Click | 1472688056 | 192674      | www.abc.com              | Sample Data | Sample Data
Page Viewed               | 1472688087 | 204231      | ww.123.com               | Sample Data | Sample Data
Page Viewed               | 1472688161 | 76081       | www.abc.com              | Sample Data | Sample Data
Page Viewed               | 1472688219 | 186081      | www.abc.com              | Sample Data | Sample Data
Page Viewed               | 1472688236 | 83259       | www.google.co.in         | Sample Data | Sample Data
Page Viewed               | 1472688310 | 61410       | www.tuv.in               | Sample Data | Sample Data

We need to write a MapReduce program that finds the initial_referring_domain value with the highest frequency, so that we can determine which website is the most effective ad platform.

Approach

  1. Remove rows that have duplicate values in the distinct_id column.
  2. Count the frequency of each entity in the initial_referring_domain column.
  3. Publish the frequency of each entity.
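The three steps above can be sketched locally as two chained map/shuffle/reduce passes. This is only a plain-Python simulation, not Hadoop code: the function names (`shuffle`, `dedupe_job`, `count_job`, `most_frequent`) are hypothetical, the rows are the sample data from the table with the last two columns dropped, and keeping the first row seen per distinct_id is an assumption (a real reducer does not guarantee value order).

```python
from collections import defaultdict

# Sample rows from the table: (event, timestamp, distinct_id, initial_referring_domain)
SAMPLE_ROWS = [
    ("Page Viewed", "1472688038", "489687", "www.abc.com"),
    ("Page Viewed", "1472688052", "118805", "www.abc.com"),
    ("Request Information Click", "1472688056", "192674", "www.abc.com"),
    ("Page Viewed", "1472688087", "204231", "ww.123.com"),
    ("Page Viewed", "1472688161", "76081", "www.abc.com"),
    ("Page Viewed", "1472688219", "186081", "www.abc.com"),
    ("Page Viewed", "1472688236", "83259", "www.google.co.in"),
    ("Page Viewed", "1472688310", "61410", "www.tuv.in"),
]


def shuffle(pairs):
    """Group values by key, standing in for the framework's shuffle phase."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def dedupe_job(rows):
    """Pass 1: key each row by distinct_id and keep one domain per id.

    Assumption: the first domain seen for an id is the one to keep.
    """
    mapped = ((distinct_id, domain) for _event, _ts, distinct_id, domain in rows)
    return [(distinct_id, domains[0]) for distinct_id, domains in shuffle(mapped).items()]


def count_job(deduped):
    """Pass 2: word-count style, emit (domain, 1) and sum per domain."""
    mapped = ((domain, 1) for _distinct_id, domain in deduped)
    return {domain: sum(ones) for domain, ones in shuffle(mapped).items()}


def most_frequent(counts):
    """Return the (domain, count) pair with the highest count."""
    return max(counts.items(), key=lambda kv: kv[1])


counts = count_job(dedupe_job(SAMPLE_ROWS))
print(counts)
print(most_frequent(counts))
```

In a real cluster each pass would be its own job, with the output of the de-duplication job used as the input of the counting job.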

I was able to solve this problem in Hive and Pig, but could not get the correct result with a MapReduce program.

Any reference or similar piece of code would help.

1 REPLY

Re: map reduce program to count Frequency of particular entity in a column of a table

@Praveen Singh

Assuming that by "tabular" you mean your file is delimited (comma, tab, pipe, etc.), a simple word-count program should suffice.
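As a rough illustration of the word-count idea, here is a minimal sketch of a mapper and reducer in the style used with Hadoop Streaming. It assumes a tab-delimited file with a header row whose first field is `event`, and the domain in the fourth column; `DELIMITER`, `DOMAIN_COLUMN`, `mapper`, and `reducer` are names chosen for this example. It only counts domains and does not handle the distinct_id de-duplication step from the question.

```python
from itertools import groupby

DELIMITER = "\t"    # assumption: the file is tab-delimited
DOMAIN_COLUMN = 3   # assumption: zero-based index of initial_referring_domain


def mapper(lines):
    """Emit 'domain<TAB>1' for every data row."""
    for line in lines:
        fields = line.rstrip("\n").split(DELIMITER)
        # Skip the header row and any row too short to hold a domain.
        if len(fields) <= DOMAIN_COLUMN or fields[0] == "event":
            continue
        yield f"{fields[DOMAIN_COLUMN]}{DELIMITER}1"


def reducer(sorted_lines):
    """Sum the counts per domain; input must already be sorted by key."""
    pairs = (line.rstrip("\n").split(DELIMITER) for line in sorted_lines)
    for domain, group in groupby(pairs, key=lambda kv: kv[0]):
        yield domain, sum(int(count) for _, count in group)


# Local dry run: sorted() stands in for the shuffle/sort between map and reduce.
example = [
    "event\ttimestamp\tdistinct_id\tinitial_referring_domain\tColumnX\tColumnY\n",
    "Page Viewed\t1472688038\t489687\twww.abc.com\tSample Data\tSample Data\n",
    "Page Viewed\t1472688310\t61410\twww.tuv.in\tSample Data\tSample Data\n",
    "Page Viewed\t1472688052\t118805\twww.abc.com\tSample Data\tSample Data\n",
]
for domain, total in reducer(sorted(mapper(example))):
    print(domain, total)
```

When run as actual streaming scripts, the mapper and reducer would each read lines from standard input and print their output lines, with the framework doing the sort in between.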

A nice post showing ways to achieve this with Hive, Pig, R, Spark, MapReduce (Java), and MapReduce (Python) can be found at the link below. The page formatting is not great, but the content is informative:

https://www.linkedin.com/pulse/word-count-program-using-r-spark-map-reduce-pig-hive-python-sahu

As always, if you find this post useful, don't forget to "accept" the answer.