Deciding spark job configuration

I'd like to know how to configure a Spark job running on a small cluster of 3 nodes.

I have 3 nodes, each with 4 cores and 16 GB of RAM. I'd like to know the best configuration for processing a 3.5 GB file. The task is essentially to search for the contents of one RDD (200 keywords) in another RDD (40 million lines).

How can I configure the number of executors, cores, and memory to get the best performance?
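
For reference, here is a rough sketch of the kind of sizing arithmetic I have in mind, not a definitive answer. It assumes a commonly quoted rule of thumb: leave 1 core and 1 GB per node for the OS and daemons, run one executor per node, and reserve about 10% of the executor memory for off-heap overhead. The function name `size_executors` and its parameters are made up for illustration, and the sketch does not account for the driver or a YARN application master.

```python
def size_executors(nodes, cores_per_node, mem_gb_per_node,
                   reserved_cores=1, reserved_mem_gb=1,
                   cores_per_executor=3, overhead_fraction=0.10):
    """Estimate Spark executor settings from cluster hardware.

    All defaults are rule-of-thumb assumptions, not tuned values.
    """
    # Resources left on each node after reserving some for the OS and daemons.
    usable_cores = cores_per_node - reserved_cores
    usable_mem_gb = mem_gb_per_node - reserved_mem_gb

    # How many executors of the requested width fit on one node (at least one).
    executors_per_node = max(1, usable_cores // cores_per_executor)

    # Split the usable memory evenly, then shrink the heap so that
    # heap + overhead still fits in each executor's share.
    share_gb = usable_mem_gb / executors_per_node
    heap_gb = int(share_gb / (1 + overhead_fraction))

    return {
        "spark.executor.instances": nodes * executors_per_node,
        "spark.executor.cores": min(cores_per_executor, usable_cores),
        "spark.executor.memory": f"{heap_gb}g",
    }


if __name__ == "__main__":
    # The cluster from this question: 3 nodes, 4 cores and 16 GB each.
    for key, value in size_executors(3, 4, 16).items():
        print(f"--conf {key}={value}")
```

For my cluster this gives 3 executors with 3 cores and a 13g heap each. Is that a sensible starting point, or should it be split differently for this kind of keyword search?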