SparkR (1.6.0) toRDD single thread

When I call SparkR:::toRDD to convert a DataFrame to an RDD, it looks like a single R process runs on one datanode and all of the data passes through that process, so the conversion takes a long time. Is there a way to parallelize this operation?

1 REPLY

Re: SparkR (1.6.0) toRDD single thread

@Joe Trite

I think this transformation happens on the driver, which is why it is taking a long time.

I found this related issue: https://issues.apache.org/jira/browse/SPARK-8277