We have a Spark Streaming application receiving data from Kafka. The application is slow and we are trying to find the reason. Under Receiver Blocks, the "Aggregated Block Metrics by Executor" table shows all the blocks sitting on a single executor.
I was wondering whether this distribution on only one executor has anything to do with the slowness. Any help will be greatly appreciated.
You only have one receiver, so the executor running it will hold all the received blocks. To distribute the blocks to other executors, you have a few options:
1. Call repartition on the stream (or RDD) before the heavy processing.
2. Increase replication, so the same block is stored on multiple executors.
3. Create multiple receivers and union the resulting streams later.
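
To make options 1 and 3 concrete, here is a rough sketch, assuming the receiver-based Kafka integration (`KafkaUtils.createStream` from the `spark-streaming-kafka` artifact). The ZooKeeper address, group id, topic name, receiver count, and partition count are all made-up placeholders, not values from your setup, so adjust them to your cluster.

```scala
import org.apache.spark.SparkConf
import org.apache.spark.storage.StorageLevel
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka.KafkaUtils

object MultiReceiverSketch {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("multi-receiver-sketch")
    val ssc = new StreamingContext(conf, Seconds(10))

    // Hypothetical placeholders: replace with your own settings.
    val zkQuorum = "zk-host:2181"
    val groupId = "my-consumer-group"
    val topics = Map("my-topic" -> 1) // topic -> consumer threads per receiver
    val numReceivers = 3

    // Option 3: start several receivers; each one runs on an executor
    // and stores the blocks it receives there.
    // The "_2" storage level keeps two copies of each block (option 2).
    val streams = (1 to numReceivers).map { _ =>
      KafkaUtils.createStream(
        ssc, zkQuorum, groupId, topics, StorageLevel.MEMORY_AND_DISK_SER_2)
    }

    // Combine the per-receiver streams into a single DStream.
    val unified = ssc.union(streams)

    // Option 1: shuffle the received data across more partitions
    // before the expensive processing (12 is an arbitrary example).
    val spread = unified.repartition(12)

    // Placeholder processing: count the message values in each batch.
    spread.map(_._2).count().print()

    ssc.start()
    ssc.awaitTermination()
  }
}
```

Keep in mind that each receiver occupies one core for the lifetime of the job, so the application needs more cores than receivers or nothing will be left to process the data. Also note that repartition adds a shuffle, so it is worth measuring whether it helps in your case.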