Contributor
Posts: 52
Registered: ‎10-19-2016

Spark pandas split-apply-combine?

Hi, is there anything in Spark like pandas groupby (split-apply-combine)? See http://pandas.pydata.org/pandas-docs/stable/groupby.html

 

I would like to split a big DataFrame into many small DataFrames according to some columns.

On each group, I would like to apply a predefined function which will return another data object.

Finally, I would like to return a map of <key, result data object>. Or, if each group returns a DataFrame, I would like to combine them into one big DataFrame.

In short, something like:

DataFrame.groupby(columns...).foreach(rows => ...)

or,

DataFrame.groupby(columns...).foreach(rows => ...).collect(...)
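
To make the semantics concrete, here is a minimal plain-Python sketch of what I mean. It is not Spark or pandas code; the function name `split_apply_combine`, the column names, and `total_sales` are all made up for illustration.

```python
from collections import defaultdict


def split_apply_combine(rows, key_columns, func):
    """Group rows (dicts) by key_columns, apply func per group, return {key: result}."""
    groups = defaultdict(list)
    for row in rows:
        # Split: the group key is the tuple of values in the key columns.
        key = tuple(row[c] for c in key_columns)
        groups[key].append(row)
    # Apply func to each group and combine the results into a map.
    return {key: func(group) for key, group in groups.items()}


# Hypothetical data standing in for the big DataFrame.
rows = [
    {"city": "A", "year": 2016, "sales": 10},
    {"city": "A", "year": 2016, "sales": 5},
    {"city": "B", "year": 2016, "sales": 7},
]


def total_sales(group):
    # Hypothetical predefined function: one result object per group.
    return sum(r["sales"] for r in group)


result = split_apply_combine(rows, ["city", "year"], total_sales)
```

Here `result` is the map of <key, result data object> described above, keyed by the (city, year) tuple. What I am looking for is the distributed equivalent of this on a Spark DataFrame.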

 

Thanks.

Contributor
Posts: 52
Registered: ‎10-19-2016

Re: Spark pandas split-apply-combine?

Any ideas?