
Calculating median on grouped data


Hello! I am trying to use Spark to calculate the median of grouped values in a DataFrame, but have not had much success. I tried agg(), but median() is not available as an aggregate function. I tried applying rank() over a window function, but the rank was not computed per group. I also tried pivoting the table to avoid the grouping step, but the DataFrame is large (8 million rows) and the job fails repeatedly. Calculating a median should be straightforward, since data analysts use it a lot. Am I missing something obvious?
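
To make the goal concrete, here is a small plain-Python sketch of the result I am after. It is not Spark code and would not scale to the real data. The function name `grouped_median` and the sample rows are made up for illustration; my real table has a grouping column and a numeric column in the same shape.

```python
from collections import defaultdict
from statistics import median


def grouped_median(rows):
    """Return {group_key: median of that group's values}.

    `rows` is an iterable of (group_key, numeric_value) pairs.
    Hypothetical helper, only meant to show the desired output.
    """
    values_by_group = defaultdict(list)
    for key, value in rows:
        values_by_group[key].append(value)
    # statistics.median averages the two middle values for even-sized groups
    return {key: median(values) for key, values in values_by_group.items()}


# Made-up sample data: (group, value)
rows = [("a", 1), ("a", 3), ("a", 10), ("b", 2), ("b", 4)]
print(grouped_median(rows))  # {'a': 3, 'b': 3.0}
```

What I am looking for is the equivalent of this as a grouped aggregation in Spark, one output row per group.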

 

Thanks!!
