05-01-2017 08:19 AM
Is there a way for a Spark Streaming job to release resources while it is idle and waiting for data to come in?
I would expect it to release resources while idle, then request more from YARN when data arrives and continue to run. But it doesn't release its resources back to YARN.
Our Spark jobs are set up to pick up data as soon as a file is dropped, which happens every half hour.
While idle and waiting for data, the driver doesn't release resources, so the streaming resource pool is always full. As a result, all other jobs either wait or can't take advantage of the settings for this pool (such as getting resources from this pool immediately).
05-01-2017 08:33 AM
This is just what dynamic allocation is for, and you can enable it for a streaming job to add/remove executors in response to demand.
It's not as great an idea for streaming, because your job will be mostly idle and then need to process a new batch of data quickly, and it takes some time to reacquire executors to do the work. Still, it is entirely possible.
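For illustration, here is a rough sketch of the settings involved, written as a small Python helper that builds a spark-submit command line. The property names are the standard Spark dynamic allocation settings, but the values, the helper name spark_submit_args, and the application file streaming_job.py are assumptions chosen for the example, not something from your setup. It also assumes the external shuffle service is running on your YARN NodeManagers, which dynamic allocation on YARN generally needs.

```python
# Standard Spark dynamic allocation properties.
# The values below are illustrative assumptions; tune them for your cluster.
DYNAMIC_ALLOCATION_CONF = {
    "spark.dynamicAllocation.enabled": "true",
    # Assumes the external shuffle service is set up on the YARN NodeManagers.
    "spark.shuffle.service.enabled": "true",
    # Allow the job to shrink to zero executors while it waits for a file.
    "spark.dynamicAllocation.minExecutors": "0",
    # Hypothetical cap so the job cannot fill the whole pool.
    "spark.dynamicAllocation.maxExecutors": "10",
    # How long an executor may sit idle before it is released.
    "spark.dynamicAllocation.executorIdleTimeout": "60s",
}


def spark_submit_args(app, conf):
    """Build a spark-submit argument list with one --conf flag per property.

    This helper is hypothetical; it only assembles the command line.
    """
    args = ["spark-submit", "--master", "yarn"]
    for key, value in conf.items():
        args += ["--conf", f"{key}={value}"]
    args.append(app)
    return args


if __name__ == "__main__":
    # "streaming_job.py" is a placeholder application name.
    print(" ".join(spark_submit_args("streaming_job.py", DYNAMIC_ALLOCATION_CONF)))
```

A lower executorIdleTimeout frees the pool sooner after each half-hourly file, at the cost of the reacquisition delay mentioned above when the next file arrives.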