How to get the count of last key value pair in mapreduce wordcount programme
- Labels: MapReduce
Created on ‎09-12-2015 11:29 PM - edited ‎09-16-2022 02:40 AM
Hi,
I have been trying to write a word count programme that emits only one key-value pair, i.e. the last key-value pair, using the wordcount MapReduce programme.
Here are the contents of the input files in a directory:
a.txt :
====
f f g h
i i j k
l l m r
f f h h
Content of b.txt
========
r r g h
h h m m
c c b b
d d r f
O/p should be :
r 4
Here is my sample mapper and reducer code for a simple word count. Can anyone tell me what changes I should make to get the output shown above?
Mapper code:
--------------------
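import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class WcMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
    private static final IntWritable one = new IntWritable(1);
    private final Text word = new Text();

    // Emits (token, 1) for every whitespace-separated token in the input line.
    @Override
    public void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        StringTokenizer st = new StringTokenizer(value.toString());
        while (st.hasMoreTokens()) {
            word.set(st.nextToken());
            context.write(word, one);
        }
    }
}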
Reducer code :
---------------------
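import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;
public class WcReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
    // Sums the 1s emitted by the mapper for each word.
    @Override
    public void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
        int sum = 0;
        for (IntWritable value : values) {
            sum += value.get();
        }
        context.write(key, new IntWritable(sum));
    }
}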
Driver code:
-----------------------
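import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

public class WcDriver extends Configured implements Tool {
    public static void main(String[] args) throws Exception {
        int status = ToolRunner.run(new WcDriver(), args);
        System.exit(status);
    }

    @Override
    public int run(String[] args) throws Exception {
        // Use the configuration set up by ToolRunner so generic options (e.g. -D) take effect.
        Configuration c1 = getConf();
        // Job.getInstance replaces the deprecated Job constructor.
        Job j1 = Job.getInstance(c1, "woc");
        j1.setJarByClass(WcDriver.class);
        j1.setMapperClass(WcMapper.class);
        j1.setReducerClass(WcReducer.class);
        j1.setInputFormatClass(TextInputFormat.class);
        j1.setOutputFormatClass(TextOutputFormat.class);
        j1.setOutputKeyClass(Text.class);
        j1.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(j1, new Path(args[0]));
        FileOutputFormat.setOutputPath(j1, new Path(args[1]));
        // Remove any existing output directory so the job does not fail on startup.
        FileSystem fs = FileSystem.get(c1);
        if (fs.exists(new Path(args[1]))) {
            fs.delete(new Path(args[1]), true);
        }
        return j1.waitForCompletion(true) ? 0 : 1;
    }
}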
I would appreciate any help.
Created ‎09-18-2015 04:31 AM
The phrase "last key value pair" doesn't quite make sense to me. Could you please elaborate?
Created ‎09-20-2015 12:43 AM
Hi,
Normally, for the input I mentioned, we would get output like this:
f 5
g 2
h 6
...
...
r 4
But I need only the last key and its sum as output, i.e. 'r' and its sum of 4.
How can we achieve this? Is there any way to get only the last key and its count as output?
Created ‎09-20-2015 06:08 AM
What you are looking to do is emit only the largest key (by sort order), similar to MAX(…) behaviour in SQL.
This is simple to do:
1. In the Mapper's setup(…) call, initialise a zero-valued (empty) string, which has the lowest sort value, as the base key, along with a zeroed counter.
2. Across all map(…) calls, check whether the current candidate key is greater than the base key. Don't emit anything yet. If the candidate is greater, make it the new base key and reset the counter to 1. If it is equal, increment the counter.
3. In the cleanup(…) method, emit just the base key and its counter.
4. Since this is a MAX-like operation, configure a single reducer and perform the same max-tracking and final emit within the setup(…), reduce(…) and cleanup(…) methods of the Reducer implementation. Take care to aggregate the counts for a key before the comparison, so that you get the real count.
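The steps above can be sketched roughly as follows. This is only an illustration under some assumptions: it is plain Java with no Hadoop classes so that it runs standalone, and the names MaxKeySketch, MaxKeyTracker, mapTask and reduceTask are hypothetical stand-ins. In a real job, the tracker's fields would be initialised in setup(…), offer(…) would be called from map(…) or reduce(…), and result() would be written to the context in cleanup(…). It also assumes plain ASCII tokens, for which String.compareTo gives the same ordering as comparing Text keys.

```java
import java.util.AbstractMap;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.StringTokenizer;

public class MaxKeySketch {

    /** Tracks the greatest key offered so far and its running count. */
    public static final class MaxKeyTracker {
        private String maxKey = ""; // base key: the empty string sorts before any token
        private int count = 0;      // zeroed counter

        /** Offers a key together with n occurrences (a partial count). */
        public void offer(String key, int n) {
            int cmp = key.compareTo(maxKey);
            if (cmp > 0) {          // greater key found: replace it and restart the count
                maxKey = key;
                count = n;
            } else if (cmp == 0) {  // same key again: accumulate its count
                count += n;
            }                       // smaller keys are ignored
        }

        /** Returns (key, count), or null if nothing was offered. */
        public Map.Entry<String, Integer> result() {
            return count == 0 ? null : new AbstractMap.SimpleEntry<>(maxKey, count);
        }
    }

    /** Stand-in for one map task: emits only its local maximum key. */
    public static Map.Entry<String, Integer> mapTask(List<String> lines) {
        MaxKeyTracker tracker = new MaxKeyTracker();
        for (String line : lines) {
            StringTokenizer st = new StringTokenizer(line);
            while (st.hasMoreTokens()) {
                tracker.offer(st.nextToken(), 1);
            }
        }
        return tracker.result();
    }

    /** Stand-in for the single reducer: merges the per-mapper maxima. */
    public static Map.Entry<String, Integer> reduceTask(
            List<Map.Entry<String, Integer>> partials) {
        MaxKeyTracker tracker = new MaxKeyTracker();
        for (Map.Entry<String, Integer> p : partials) {
            if (p != null) { // a mapper with no input emits nothing
                tracker.offer(p.getKey(), p.getValue());
            }
        }
        return tracker.result();
    }

    public static void main(String[] args) {
        List<String> a = Arrays.asList("f f g h", "i i j k", "l l m r", "f f h h");
        List<String> b = Arrays.asList("r r g h", "h h m m", "c c b b", "d d r f");
        Map.Entry<String, Integer> out =
                reduceTask(Arrays.asList(mapTask(a), mapTask(b)));
        System.out.println(out.getKey() + " " + out.getValue()); // prints "r 4"
    }
}
```

With the two input files from the question, the first map task would emit (r, 1), the second (r, 3), and the single reducer would combine them into r 4. Because equal keys are accumulated before a greater key can replace them, the count for the final key is the full total.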
Created ‎09-20-2015 06:55 AM
Thanks for your reply.
I have another question: how can we determine the number of mappers in the word count programme above? Can we determine it from the two input files a.txt and b.txt alone, or do we also need to know the file size and block size?
Please help.
