Thanks for your course; it was really useful.
I have a performance issue with my Spark implementation, so I decided to ask whether someone can help.
I need to speed up a sequential genetic algorithm, so I ran it on Spark using Scala.
When I ran it, it was really slow, and I'm wondering why.
The genetic algorithm has a loop: the more it iterates, the better the result it generates. But on Spark, the more the loop iterates, the more stages and jobs it creates. I think this is the main reason it is slow. Does anyone agree?
Does anyone have any idea how I can speed it up?
Here is an overview of how many transformations and actions I used in the genetic algorithm:
Note that all these transformations and actions are inside a loop that iterates 1000 times.
Any help would be appreciated.
Can you provide the code you are using? A simplified version without your logic, of course. The first thing to do is to cache the RDDs you are reusing. It is hard to say without some actual code, but if you always start from the same RDD in each of the 1000 iterations, you definitely need to cache it before your loop. It might also be worth caching the RDD before your min operations and unpersisting it after the collect. But I might have misunderstood your flow, since there is no code, just your description.
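To make the caching suggestion concrete, here is a minimal sketch of what I mean. It is not your algorithm: the population, the `fitness` function, and the `CachedLoopSketch` object are all hypothetical placeholders, and I am assuming each iteration starts from the same base RDD and runs a `min` followed by a `collect` on a derived RDD.

```scala
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.SparkSession

object CachedLoopSketch {
  // Hypothetical fitness function; the real logic is not shown in the question.
  def fitness(x: Int): Int = math.abs(x - 42)

  def run(spark: SparkSession, iterations: Int): Int = {
    val sc = spark.sparkContext

    // Hypothetical initial population. Every iteration reads it,
    // so cache it once before the loop.
    val base: RDD[Int] = sc.parallelize(0 until 1000).cache()

    var best = Int.MaxValue
    for (i <- 0 until iterations) {
      // Two actions run on this RDD below, so cache it for this iteration.
      val scored = base.map(x => fitness(x + i)).cache()

      val minScore = scored.min()                              // first action, fills the cache
      val winners  = scored.filter(_ == minScore).collect()    // second action, reads the cache

      if (winners.nonEmpty) best = math.min(best, minScore)

      // Release this iteration's cached data once the collect is done.
      scored.unpersist()
    }

    base.unpersist()
    best
  }
}
```

Without the two `cache()` calls, each action would recompute its input from the start of the lineage, which is one plausible reason for the slowdown you describe. Whether it is the actual cause in your case depends on code I have not seen.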