Support Questions

Find answers, ask questions, and share your expertise
Announcements
Celebrating as our community reaches 100,000 members! Thank you!

reduceByKey versus reduceByKey2

avatar
New Contributor

Assume two versions for ReduceByKey

 

def reduceByKey(f, kvs, acc):
   s = shuffle(kvs)
   return map(lambda p: (p[0], reduce(f,p[1],acc)), s)
 
which works as: 
 
reduceByKey(lambda x,y:x+y, [("k1",1),("k2",1), ("k1",2), ("k2",3)],0) 
yields
[('k1', 3), ('k2', 4)]
 
 
and 
def reduceByKey2(agg, kvs):
   return map(agg, shuffle(kvs))
 
with 
reduceByKey2(lam
 
bda kvs:(kvs[0], sum(kvs[1])), [("k1",1),("k2",1), ("k1",2), ("k2",3)])
which yields
[('k1', 3), ('k2', 4)]
 
 
and shuffle: 
 
def merge(kvls1, kvls2):
   if len(kvls1) == 0: return kvls2
   elif len(kvls2) == 0: return kvls1
   else:
      ((k1,vl1), tl1) = (kvls1[0], kvls1[1:])
      ((k2,vl2), tl2) = (kvls2[0], kvls2[1:])
      if k1 == k2: return [(k1,vl1+vl2)]+merge(tl1,tl2)
      elif k1 < k2: return [(k1,vl1)]+merge(tl1,kvls2)
      else: return [(k2,vl2)]+merge(kvls1, tl2)

def shuffle(kvs):
   kvls = map(lambda kv: [(kv[0], [kv[1]])], kvs)
   return reduce(merge, kvls, [])
 
 
I am wondering about the key limitations with reduceByKey2? 
 
Also, what would be an operation that can be defined with reduceByKey2, but not by reduceByKey? I can only think of aggregations methods like median, but I think there are more. 
 
1 REPLY 1

avatar
Community Manager

@pentational Welcome to the Cloudera Community! I noticed that your post may be related to Spark based on some keywords used. To help you get the best possible solution, I have tagged our Spark experts @RangaReddy  and @Babasaheb  who may be able to assist you further.

 

Please keep us updated on your post, and we hope you find a satisfactory solution to your query.

 



Regards,

Vidya Sargur,
Community Manager


Was your question answered? Make sure to mark the answer as the accepted solution.
If you find a reply useful, say thanks by clicking on the thumbs up button.
Learn more about the Cloudera Community: