reduceByKey versus reduceByKey2
Labels: MapReduce
New Contributor
Created 03-01-2023 09:04 PM
Assume two versions of reduceByKey:
from functools import reduce

def reduceByKey(f, kvs, acc):
    s = shuffle(kvs)  # group values by key: [(k, [v1, v2, ...]), ...]
    return list(map(lambda p: (p[0], reduce(f, p[1], acc)), s))
which works as:
reduceByKey(lambda x,y:x+y, [("k1",1),("k2",1), ("k1",2), ("k2",3)],0)
yields
[('k1', 3), ('k2', 4)]
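Because reduceByKey is a left fold over each key's value list starting from acc, other per-key aggregations only need a different f and initial value. A minimal sketch, assuming Python 3; group_by_key is a hypothetical stand-in for the post's merge-based shuffle (it sorts and groups with itertools.groupby and yields the same key-sorted (key, [values]) list).

```python
from functools import reduce
from itertools import groupby
from operator import itemgetter

def group_by_key(kvs):
    # Hypothetical stand-in for shuffle: sort by key, then collect each key's values.
    # sorted() is stable, so values keep their input order within a key.
    ordered = sorted(kvs, key=itemgetter(0))
    return [(k, [v for _, v in grp]) for k, grp in groupby(ordered, key=itemgetter(0))]

def reduceByKey(f, kvs, acc):
    # Left-fold each key's values with f, starting from acc
    return [(k, reduce(f, vs, acc)) for k, vs in group_by_key(kvs)]

kvs = [("k1", 1), ("k2", 1), ("k1", 2), ("k2", 3)]
print(reduceByKey(lambda x, y: x + y, kvs, 0))   # [('k1', 3), ('k2', 4)]
print(reduceByKey(max, kvs, float("-inf")))      # [('k1', 2), ('k2', 3)]
print(reduceByKey(lambda n, _: n + 1, kvs, 0))   # [('k1', 2), ('k2', 2)]
```

The count example shows that f's accumulator need not have the same type as the values, but the per-key result is always that accumulator.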
and
def reduceByKey2(agg, kvs):
    return list(map(agg, shuffle(kvs)))  # agg sees each whole (key, [values]) group
with
reduceByKey2(lambda kvs: (kvs[0], sum(kvs[1])), [("k1",1),("k2",1), ("k1",2), ("k2",3)])
which yields
[('k1', 3), ('k2', 4)]
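One observation relevant to the comparison: reduceByKey2 is at least as expressive as reduceByKey, since any reduceByKey call can be rewritten by passing an agg that performs the fold itself. A minimal sketch, assuming Python 3; group_by_key is a hypothetical stand-in for the post's shuffle, and reduceByKey_via_2 is a hypothetical name for the rewrite.

```python
from functools import reduce
from itertools import groupby
from operator import itemgetter

def group_by_key(kvs):
    # Hypothetical stand-in for the post's shuffle: key-sorted [(k, [v, ...]), ...]
    ordered = sorted(kvs, key=itemgetter(0))
    return [(k, [v for _, v in grp]) for k, grp in groupby(ordered, key=itemgetter(0))]

def reduceByKey(f, kvs, acc):
    return [(k, reduce(f, vs, acc)) for k, vs in group_by_key(kvs)]

def reduceByKey2(agg, kvs):
    return [agg(p) for p in group_by_key(kvs)]

def reduceByKey_via_2(f, kvs, acc):
    # reduceByKey expressed through reduceByKey2: agg folds the value list itself
    return reduceByKey2(lambda p: (p[0], reduce(f, p[1], acc)), kvs)

kvs = [("k1", 1), ("k2", 1), ("k1", 2), ("k2", 3)]
add = lambda x, y: x + y
print(reduceByKey_via_2(add, kvs, 0))                           # [('k1', 3), ('k2', 4)]
print(reduceByKey_via_2(add, kvs, 0) == reduceByKey(add, kvs, 0))  # True
```

The converse does not hold in general, which is what the question below is about.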
and shuffle, built on a merge helper:
def merge(kvls1, kvls2):
    # Merge two key-sorted [(k, [values])] lists, concatenating value lists on equal keys
    if len(kvls1) == 0: return kvls2
    elif len(kvls2) == 0: return kvls1
    else:
        ((k1, vl1), tl1) = (kvls1[0], kvls1[1:])
        ((k2, vl2), tl2) = (kvls2[0], kvls2[1:])
        if k1 == k2: return [(k1, vl1 + vl2)] + merge(tl1, tl2)
        elif k1 < k2: return [(k1, vl1)] + merge(tl1, kvls2)
        else: return [(k2, vl2)] + merge(kvls1, tl2)

def shuffle(kvs):
    # Wrap each pair as a singleton group [(k, [v])], then fold the groups together with merge
    kvls = map(lambda kv: [(kv[0], [kv[1]])], kvs)
    return reduce(merge, kvls, [])
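A side note on shuffle: merge makes one recursive call per group it emits and copies the list tail (kvls1[1:]) at every step, so with a couple of thousand distinct keys it can exceed CPython's default recursion limit of 1000. A minimal sketch of an iterative two-pointer merge with the same output, assuming the same key-sorted (key, [values]) inputs; merge_iter and shuffle_iter are hypothetical names, not part of the original post.

```python
from functools import reduce

def merge_iter(kvls1, kvls2):
    # Two-pointer merge of key-sorted [(k, [v, ...]), ...] lists; equal keys concatenate values
    out, i, j = [], 0, 0
    while i < len(kvls1) and j < len(kvls2):
        (k1, vl1), (k2, vl2) = kvls1[i], kvls2[j]
        if k1 == k2:
            out.append((k1, vl1 + vl2))
            i += 1
            j += 1
        elif k1 < k2:
            out.append((k1, vl1))
            i += 1
        else:
            out.append((k2, vl2))
            j += 1
    # At most one of these tails is non-empty
    out.extend(kvls1[i:])
    out.extend(kvls2[j:])
    return out

def shuffle_iter(kvs):
    # Same structure as shuffle: fold singleton groups together, but without recursion
    return reduce(merge_iter, ([(k, [v])] for k, v in kvs), [])

print(shuffle_iter([("k1", 1), ("k2", 1), ("k1", 2), ("k2", 3)]))
# [('k1', [1, 2]), ('k2', [1, 3])]
print(len(shuffle_iter([(i, i) for i in range(2000)])))  # 2000
```

The fold still rescans the accumulated list on every merge, so this only removes the recursion-depth problem; it is not meant as an efficient shuffle.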
I am wondering what the key limitations of reduceByKey2 are.
Also, what would be an operation that can be defined with reduceByKey2 but not with reduceByKey? I can only think of aggregation methods like the median, but I suspect there are more.
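One way to frame the difference: in reduceByKey the per-key result is the fold's final accumulator and f never sees the key, whereas agg in reduceByKey2 receives (key, [all values]) and may return anything. The sketch below, assuming Python 3 and a hypothetical group_by_key stand-in for the post's shuffle, computes a median and a non-pair result with reduceByKey2, and shows that a mean via reduceByKey needs a (sum, count) accumulator plus a separate pass. The flip side, plausibly the main limitation of reduceByKey2, is that agg needs every value for a key materialized at once, so partial results cannot be combined early; Spark's RDD reduceByKey, by contrast, requires an associative and commutative function so it can merge values locally before the shuffle.

```python
from functools import reduce
from itertools import groupby
from operator import itemgetter
from statistics import median

def group_by_key(kvs):
    # Hypothetical stand-in for the post's shuffle: key-sorted [(k, [v, ...]), ...]
    ordered = sorted(kvs, key=itemgetter(0))
    return [(k, [v for _, v in grp]) for k, grp in groupby(ordered, key=itemgetter(0))]

def reduceByKey(f, kvs, acc):
    # Per-key result is the final accumulator of a left fold; f never sees the key
    return [(k, reduce(f, vs, acc)) for k, vs in group_by_key(kvs)]

def reduceByKey2(agg, kvs):
    # agg receives the whole (key, [values]) group and may return anything
    return [agg(p) for p in group_by_key(kvs)]

kvs = [("k1", 1), ("k2", 1), ("k1", 2), ("k2", 3), ("k1", 10)]

# Median: needs the complete value list for each key at once
medians = reduceByKey2(lambda p: (p[0], median(p[1])), kvs)
print(medians)  # [('k1', 2), ('k2', 2.0)]

# The output need not be a (key, value) pair: here (key, count, mean)
stats = reduceByKey2(lambda p: (p[0], len(p[1]), sum(p[1]) / len(p[1])), kvs)

# With reduceByKey, a mean needs a (sum, count) accumulator plus a separate pass
sums_counts = reduceByKey(lambda a, v: (a[0] + v, a[1] + 1), kvs, (0, 0))
means = [(k, s / n) for k, (s, n) in sums_counts]
print(sums_counts)  # [('k1', (13, 3)), ('k2', (4, 2))]
```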
Community Manager
Created on 03-02-2023 12:57 AM - edited 03-02-2023 12:59 AM
@pentational Welcome to the Cloudera Community! I noticed that your post may be related to Spark based on some keywords used. To help you get the best possible solution, I have tagged our Spark experts @RangaReddy and @Babasaheb who may be able to assist you further.
Please keep us updated on your post, and we hope you find a satisfactory solution to your query.
Regards,
Vidya Sargur, Community Manager
Was your question answered? Make sure to mark the answer as the accepted solution.
If you find a reply useful, say thanks by clicking on the thumbs up button.
