Spark pandas split-apply-combine?


Hi, is there anything in Spark like pandas groupby (split-apply-combine)? See http://pandas.pydata.org/pandas-docs/stable/groupby.html.

 

I would like to split a big DataFrame into many small DataFrames according to the values of some columns.

On each group, I would like to apply a predefined function that returns another data object.

Finally, I would like to return a map of <key, result data object>. Or, if each group returns a DataFrame, I would like to combine them into one big DataFrame.

In short, something like:

DataFrame.groupby(columns...).foreach(rows => ...)

or,

DataFrame.groupby(columns...).foreach(rows => ...).collect(...)
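
To make the requested semantics concrete, here is a minimal sketch of split-apply-combine in plain Python, with no Spark or pandas dependency. The function `split_apply_combine`, the sample `rows`, and the helpers `total_sales` and `center_sales` are hypothetical names used only for illustration; they are not part of any Spark API.

```python
from collections import defaultdict


def split_apply_combine(rows, key_columns, func):
    """Group rows (dicts) by key_columns, apply func per group, return {key: result}."""
    # Split: bucket each row under the tuple of its key-column values.
    groups = defaultdict(list)
    for row in rows:
        key = tuple(row[col] for col in key_columns)
        groups[key].append(row)
    # Apply and combine: run func on every group and collect the results in a map.
    return {key: func(group) for key, group in groups.items()}


# Hypothetical sample data standing in for the big DataFrame.
rows = [
    {"dept": "a", "year": 2015, "sales": 10},
    {"dept": "a", "year": 2015, "sales": 5},
    {"dept": "b", "year": 2015, "sales": 7},
]


def total_sales(group):
    # Case 1: each group returns a single result object.
    return sum(r["sales"] for r in group)


def center_sales(group):
    # Case 2: each group returns a small table (list of rows).
    mean = sum(r["sales"] for r in group) / len(group)
    return [{**r, "sales_centered": r["sales"] - mean} for r in group]


# Map of <key, result data object>.
result = split_apply_combine(rows, ["dept", "year"], total_sales)

# Per-group tables concatenated back into one big table.
per_group = split_apply_combine(rows, ["dept", "year"], center_sales)
combined = [row for group_rows in per_group.values() for row in group_rows]
```

The first call corresponds to the map-of-results case, and the second to the case where each group returns a table and the pieces are concatenated into one.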

 

Thanks.

1 REPLY

Re: spark pandas split-apply-combine ?

Any idea?