Process multiple DataFrames in parallel using PySpark 2.1
Labels: Apache Spark
Created 05-29-2018 02:06 PM
Hello, I am struggling to find suitable APIs to process multiple DataFrames in parallel. My requirement is the following:
I have tens of distinct Spark DataFrames. A certain set of operations must be performed on each DataFrame (treating each as a single partition), and some results must be returned from each. For example:
Apply func1, func2 and func3 to DF1, DF2 and DF3, and return list1, list2 and list3, one from each.
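For reference, the sequential version of this requirement can be sketched as follows. This is a minimal sketch, assuming the same func1 -> func2 -> func3 chain is applied to every DataFrame; `run_pipeline`, `process_sequentially` and the bodies of the three functions are hypothetical, and plain Python lists stand in for Spark DataFrames so the snippet runs without a cluster. With real DataFrames, the last step would typically end in an action such as `collect()` that returns a list to the driver.

```python
from functools import reduce
from typing import Any, Callable, Dict, List, Sequence


def run_pipeline(df: Any, funcs: Sequence[Callable[[Any], Any]]) -> Any:
    """Apply funcs in order (func1, then func2, then func3) to one input."""
    return reduce(lambda acc, f: f(acc), funcs, df)


# Hypothetical stand-ins for func1/func2/func3. With PySpark these would be
# DataFrame transformations, and the last one would end in an action
# (for example, collect()) that brings a Python list back to the driver.
def func1(rows: List[int]) -> List[int]:
    return [r for r in rows if r % 2 == 0]  # e.g. a filter


def func2(rows: List[int]) -> List[int]:
    return [r * 10 for r in rows]  # e.g. a column transformation


def func3(rows: List[int]) -> List[int]:
    return sorted(rows)  # e.g. orderBy(...) followed by collect()


def process_sequentially(dfs: Dict[str, Any]) -> Dict[str, Any]:
    """Baseline: process one DataFrame after another, with no parallelism."""
    return {name: run_pipeline(df, [func1, func2, func3]) for name, df in dfs.items()}


if __name__ == "__main__":
    dfs = {"DF1": [3, 2, 1, 4], "DF2": [6, 5], "DF3": [8, 7, 10]}
    print(process_sequentially(dfs))
```

Each iteration waits for the previous DataFrame's job to finish, which is exactly the serialization the question is trying to avoid.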
So, in theory, func1, func2 and func3 can be run in parallel. Is there a PySpark pattern I can follow?
Thanks!
Created 05-29-2018 07:20 PM
No responses? Does that mean it is not possible, or is there something very obvious that I am missing? 🙂
Created 05-30-2018 02:31 AM
As far as I know, there is no simple out-of-the-box (OOB) solution for this. Have you considered using multiple threads on the driver side? The following threads discuss using a Future to do just that; perhaps they could help you.
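In Python, the closest counterpart to Scala's `Future` is the standard `concurrent.futures` module. Below is a minimal sketch of the driver-side approach, assuming the whole per-DataFrame chain is wrapped in one callable that is applied to every DataFrame; `process_in_parallel` and `example_pipeline` are hypothetical names, and plain lists stand in for DataFrames so the snippet runs on its own. With real DataFrames, all threads would share one SparkSession; Spark's scheduler is thread-safe, so jobs submitted from different threads can run concurrently, and each Python thread spends most of its time waiting on the JVM rather than holding the GIL.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable, Dict, List


def process_in_parallel(
    dfs: Dict[str, Any],
    pipeline: Callable[[Any], Any],
    max_workers: int = 4,
) -> Dict[str, Any]:
    """Run pipeline(df) for every DataFrame on a pool of driver-side threads.

    Each call blocks its own thread while the job runs, so up to max_workers
    independent jobs can be in flight at the same time.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Submit one task per DataFrame; each submit() returns a Future.
        futures = {name: pool.submit(pipeline, df) for name, df in dfs.items()}
        # result() waits for that task and re-raises any exception it raised.
        return {name: fut.result() for name, fut in futures.items()}


def example_pipeline(rows: List[int]) -> List[int]:
    # Hypothetical stand-in for func1 -> func2 -> func3 ending in collect().
    return sorted(r * 10 for r in rows if r % 2 == 0)


if __name__ == "__main__":
    dfs = {"DF1": [3, 2, 1, 4], "DF2": [6, 5], "DF3": [8, 7, 10]}
    print(process_in_parallel(dfs, example_pipeline, max_workers=3))
```

`max_workers` caps how many Spark jobs run at once, which matters with tens of DataFrames. By default Spark schedules concurrent jobs in FIFO order; setting `spark.scheduler.mode` to `FAIR` lets jobs submitted from different threads share executors more evenly.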
HTH
Created 05-30-2018 06:22 PM
@Developer Developer As @Felix Albani suggested above, I'd go with spawning multiple threads to process the DataFrames in parallel. This article has a good example: https://hadoopist.wordpress.com/2017/02/03/how-to-use-threads-in-spark-job-to-achieve-parallel-read-...
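A minimal sketch of spawning one thread per DataFrame with the standard `threading` module, assuming the per-DataFrame work is a single callable applied to every DataFrame. `process_with_threads` and `example_pipeline` are hypothetical helpers, not code taken from the linked article, and plain lists stand in for DataFrames so the snippet runs without Spark.

```python
import threading
from typing import Any, Callable, Dict, List


def process_with_threads(
    dfs: Dict[str, Any],
    pipeline: Callable[[Any], Any],
) -> Dict[str, Any]:
    """Start one thread per DataFrame, wait for all of them, return the results."""
    results: Dict[str, Any] = {}
    errors: Dict[str, Exception] = {}
    lock = threading.Lock()  # guards the shared results/errors dicts

    def worker(name: str, df: Any) -> None:
        try:
            out = pipeline(df)  # with Spark, blocks this thread until the job ends
        except Exception as exc:  # otherwise the error would only be printed by the thread
            with lock:
                errors[name] = exc
            return
        with lock:
            results[name] = out

    threads: List[threading.Thread] = [
        threading.Thread(target=worker, args=(name, df), name=f"process-{name}")
        for name, df in dfs.items()
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()  # wait until every DataFrame has been processed

    if errors:
        # Surface the first recorded failure on the calling (driver) thread.
        name, exc = next(iter(errors.items()))
        raise RuntimeError(f"processing of {name} failed") from exc
    return results


def example_pipeline(rows: List[int]) -> List[int]:
    # Hypothetical stand-in for func1 -> func2 -> func3 ending in collect().
    return sorted(r * 10 for r in rows if r % 2 == 0)


if __name__ == "__main__":
    dfs = {"DF1": [3, 2, 1, 4], "DF2": [6, 5], "DF3": [8, 7, 10]}
    print(process_with_threads(dfs, example_pipeline))
```

This starts one thread, and therefore up to one concurrent Spark job, per DataFrame, which is simple for a handful of DataFrames. With tens of them, a bounded pool such as `concurrent.futures.ThreadPoolExecutor` keeps the number of simultaneous jobs under control.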
