spark pandas split-apply-combine ?

Hi, is there anything in Spark like pandas groupby (split-apply-combine)?


I would like to split a big DataFrame into many small DataFrames according to the values of some columns.

On each group, I would like to apply a predefined function that returns another data object.

Finally, I would like to get back a map of <key, result data object>. Or, if each group returns a DataFrame, I would like to combine them into one big DataFrame.

In short, something like:

DataFrame.groupby(columns...).foreach(rows => ...)


DataFrame.groupby(columns...).foreach(rows => ...).collect(...)
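
To make the requested semantics concrete, here is a minimal plain-Python sketch of the split-apply-combine pattern described above. It uses no Spark or pandas, only the standard library, and every name in it (`split_apply_combine`, `combine_rows`, `total_amount`, the sample columns) is hypothetical and chosen for illustration. It is a sketch of the intended behaviour, not a Spark API.

```python
from collections import defaultdict


def split_apply_combine(rows, key_columns, func):
    """Group rows (dicts) by key_columns, apply func to each group,
    and return a map of {key tuple: result of func}."""
    groups = defaultdict(list)
    # Split: bucket every row under the tuple of its key column values.
    for row in rows:
        key = tuple(row[c] for c in key_columns)
        groups[key].append(row)
    # Apply: run the predefined function once per group.
    return {key: func(group) for key, group in groups.items()}


def combine_rows(results):
    """Combine: flatten per-group lists of rows into one big list."""
    return [row for group_rows in results.values() for row in group_rows]


def total_amount(group):
    # Hypothetical per-group function returning a single value.
    return sum(row["amount"] for row in group)


rows = [
    {"city": "Paris", "year": 2015, "amount": 10},
    {"city": "Paris", "year": 2015, "amount": 5},
    {"city": "Oslo", "year": 2016, "amount": 7},
]

# Case 1: a map of <key, result data object>.
totals = split_apply_combine(rows, ["city", "year"], total_amount)

# Case 2: each group returns rows, which are combined into one table.
tagged = split_apply_combine(
    rows,
    ["city"],
    lambda group: [dict(row, group_size=len(group)) for row in group],
)
combined = combine_rows(tagged)
```

In Spark itself, the closest equivalents I am aware of are `groupByKey(...).mapGroups` or `flatMapGroups` on a typed Dataset in Scala, and `groupBy(...).applyInPandas(func, schema)` in PySpark 3.0 and later. Whether either fits depends on the Spark version in use and on the per-group data fitting in a single executor's memory.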




Re: spark pandas split-apply-combine ?

Any idea?