03-06-2014 04:53 PM
Greetings,

I was wondering whether we can combine multiple "calculations" that operate on the same data within a single job.

For example, let's go back to the basic wordcount. Given a large document, we want to count a) the total number of words, b) the total number of sentences, and c) the total number of paragraphs in it. All three tasks operate on the same data and differ only in the delimiter used for tokenization. Is it valid to combine all three in one job within the same MapReduce program, or do we have to write three individual programs and run each of them over the whole dataset?

So, to generalize:

Can we "combine" different calculations on the same data?
What if one of the calculations needs to emit different <key, value> data types (in both the mapper and the reducer)?
What are the pros and cons of such an approach? Is it safe?
Will it be faster than running three separate jobs?
Can the reducer emit a different output file for each calculation?
What is the best implementation for the whole approach?

Thanks in advance!
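To make the question concrete, here is a minimal sketch of the combined approach I have in mind. It is plain Python that only simulates the map and reduce steps, not Hadoop code. The function names (`map_paragraph`, `reduce_counts`, `run_job`), the tag keys, and the assumption that each input record is one whole paragraph are all hypothetical and chosen only for illustration.

```python
import re
from collections import defaultdict

def map_paragraph(paragraph):
    """Hypothetical mapper: assumes one input record is one paragraph."""
    # Tag each count with the name of the calculation it belongs to.
    yield ("paragraphs", 1)
    # Crude assumption: a sentence ends with '.', '!' or '?'.
    sentences = [s for s in re.split(r"[.!?]+", paragraph) if s.strip()]
    yield ("sentences", len(sentences))
    # Words are whitespace-separated tokens.
    yield ("words", len(paragraph.split()))

def reduce_counts(pairs):
    """Hypothetical reducer: sums the values for each tag key."""
    totals = defaultdict(int)
    for key, value in pairs:
        totals[key] += value
    return dict(totals)

def run_job(document):
    """Simulates one pass over the data: split, map, then reduce."""
    # Paragraphs are assumed to be separated by blank lines.
    paragraphs = [p for p in re.split(r"\n\s*\n", document) if p.strip()]
    pairs = (pair for p in paragraphs for pair in map_paragraph(p))
    return reduce_counts(pairs)

SAMPLE = (
    "Hello world. How are you?\n\n"
    "This is a second paragraph. It has three sentences! Really."
)

if __name__ == "__main__":
    print(run_job(SAMPLE))
```

For the sample text this yields 2 paragraphs, 5 sentences, and 15 words from a single pass. The open question is whether the same pattern, with tagged keys carrying several calculations through one job, is a sound design in a real MapReduce program.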
Labels: MapReduce