How can the groupBy() method be changed when we know the selected DataFrame column contains only a limited number of values?

I want to apply the groupBy() method to a DStream in which each RDD is a DataFrame. I know that some of the selected columns only contain values from a predefined list. Is there any way to take this into account in the groupBy implementation in order to reduce the running time of the whole application? In other words, is it possible to make groupBy more efficient in this situation?
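The idea behind the question can be sketched in plain Python. This is a model, not Spark's API. The sketch assumes the grouping column only takes values from a known list. Under that assumption, each group's accumulator can live in a fixed-size list, indexed by the value's position in that list. The function name `grouped_sum_known_domain`, the `domain` parameter and the sample rows are hypothetical.

```python
from collections.abc import Iterable, Sequence


def grouped_sum_known_domain(
    rows: Iterable[tuple[str, float]],
    domain: Sequence[str],
) -> dict[str, float]:
    """Sum `value` per `key`, assuming every key belongs to `domain`.

    `domain` stands in for the predefined list of column values.
    Hypothetical helper for illustration only; not a Spark function.
    """
    # Map each allowed value to a fixed slot once, up front.
    slot = {key: i for i, key in enumerate(domain)}
    totals = [0.0] * len(domain)  # one accumulator per possible key
    for key, value in rows:
        if key not in slot:
            # The "predefined list" assumption was violated.
            raise ValueError(f"{key!r} is not in the predefined list")
        totals[slot[key]] += value
    return dict(zip(domain, totals))


rows = [("red", 1.0), ("blue", 2.0), ("red", 3.0)]
print(grouped_sum_known_domain(rows, ["red", "green", "blue"]))
# → {'red': 4.0, 'green': 0.0, 'blue': 2.0}
```

Two things stand out in this sketch. First, values that never occur still appear with a total of 0.0, unlike a regular groupBy. Second, each row still costs one hash lookup (`slot[key]`), which is roughly the same work an ordinary hash-based grouping already does per row.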

Re: How can the groupBy() method be changed when we know the selected DataFrame column contains only a limited number of values?

I don't think that's feasible. Knowing the set of possible values in advance doesn't give you any advantage in the groupBy computation.
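The reasoning can be illustrated with a simplified model of how Spark SQL typically executes a grouped aggregation such as `df.groupBy("key").sum("value")`. Each partition first combines its own rows (partial, map-side aggregation). Only those partial results are then shuffled and merged.

The sketch below is plain Python, not Spark code, and the helper names and data are hypothetical. It shows that with only a few distinct keys, each partition emits at most one record per key. The shuffle is therefore already small without declaring the list of values.

```python
from collections import defaultdict


def partial_aggregate(partition):
    """Phase 1 (per partition): combine rows locally before any shuffle."""
    acc = defaultdict(float)
    for key, value in partition:
        acc[key] += value
    return dict(acc)


def final_aggregate(partials):
    """Phase 2 (after the shuffle): merge the per-partition partial sums."""
    acc = defaultdict(float)
    for partial in partials:
        for key, value in partial.items():
            acc[key] += value
    return dict(acc)


# Hypothetical data: 3 partitions, 9 rows, only 2 distinct keys.
partitions = [
    [("a", 1.0), ("b", 2.0), ("a", 3.0), ("a", 4.0)],
    [("b", 5.0), ("b", 6.0)],
    [("a", 7.0), ("b", 8.0), ("a", 9.0)],
]
partials = [partial_aggregate(p) for p in partitions]

# Records that would cross the "shuffle": at most (partitions x distinct keys).
shuffled_records = sum(len(p) for p in partials)
print(shuffled_records)           # → 5
print(final_aggregate(partials))  # → {'a': 24.0, 'b': 21.0}
```

In this model, 9 input rows shrink to 5 shuffled records. That bound depends only on the number of partitions and distinct keys, not on whether those keys were declared up front.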

Don't have an account?
Coming from Hortonworks? Activate your account here