Does enabling CPU scheduling in YARN really improve parallel processing in Spark?

YARN with the Capacity Scheduler, using its default resource calculator, takes only memory into account when allocating resources for user requests. If I submit a Spark job like this: "--master yarn --deploy-mode client --driver-memory 4g --executor-memory 4g --num-executors 1 --executor-cores 3", YARN allocates an executor with 4 GB of memory and 1 vcore, but when that executor runs tasks, it executes 3 of them in parallel.
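As a rough illustration of why the executor shows only 1 vcore, here is a simplified Python model of how the Capacity Scheduler normalizes a container request. It assumes a hypothetical minimum allocation of 1024 MB / 1 vcore and ignores the maximum-allocation cap and Spark's memory overhead; `Resource`, `round_up` and `normalize` are illustrative names, not Hadoop APIs. With the memory-only DefaultResourceCalculator the normalized container carries a single vcore whatever was requested, while the DominantResourceCalculator keeps the requested 3 vcores.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Resource:
    memory_mb: int
    vcores: int


def round_up(value: int, step: int) -> int:
    """Round value up to the next multiple of step."""
    return -(-value // step) * step


def normalize(request: Resource, minimum: Resource, calculator: str) -> Resource:
    """Simplified container normalization (maximum-allocation cap ignored)."""
    memory = round_up(max(request.memory_mb, minimum.memory_mb), minimum.memory_mb)
    if calculator == "default":
        # Memory-only accounting: the requested vcores are dropped and the
        # container is recorded with a single vcore.
        return Resource(memory, 1)
    if calculator == "dominant":
        # CPU-aware accounting: vcores are normalized just like memory.
        vcores = round_up(max(request.vcores, minimum.vcores), minimum.vcores)
        return Resource(memory, vcores)
    raise ValueError(f"unknown calculator: {calculator}")


executor_request = Resource(memory_mb=4096, vcores=3)  # --executor-memory 4g --executor-cores 3
minimum_allocation = Resource(memory_mb=1024, vcores=1)  # assumed scheduler minimum
print(normalize(executor_request, minimum_allocation, "default"))   # Resource(memory_mb=4096, vcores=1)
print(normalize(executor_request, minimum_allocation, "dominant"))  # Resource(memory_mb=4096, vcores=3)
```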

Is it using that single core alone to run the tasks three at a time?

So if I enable CPU scheduling and CGroups, will YARN assign 3 vcores, and will each of those 3 tasks run on its own CPU? Will it really improve the processing time?
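A second sketch, under stated assumptions, of what CPU scheduling changes at the node level. Without CGroups, vcores are only bookkeeping: YARN does not pin the executor JVM to a core, so its 3 task threads can already run on separate physical cores. Where CPU-aware scheduling matters is packing. The model below assumes a hypothetical NodeManager with 64 GB and 16 vcores and counts how many 4 GB, 3-vcore executors fit on it under each calculator; the function names are illustrative only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Resource:
    memory_mb: int
    vcores: int


def executors_per_node(node: Resource, executor: Resource, calculator: str) -> int:
    """How many executor containers fit on one node (simplified model)."""
    fit_by_memory = node.memory_mb // executor.memory_mb
    if calculator == "default":
        # Memory-only: CPU is never a limiting factor.
        return fit_by_memory
    if calculator == "dominant":
        # CPU-aware: whichever resource runs out first limits the count.
        return min(fit_by_memory, node.vcores // executor.vcores)
    raise ValueError(f"unknown calculator: {calculator}")


def concurrent_tasks(node: Resource, executor: Resource, task_slots: int, calculator: str) -> int:
    """Spark task threads running at once on the node (task_slots = --executor-cores)."""
    return executors_per_node(node, executor, calculator) * task_slots


node = Resource(memory_mb=65536, vcores=16)      # hypothetical NodeManager capacity
executor = Resource(memory_mb=4096, vcores=3)    # --executor-memory 4g --executor-cores 3
for calc in ("default", "dominant"):
    print(calc, executors_per_node(node, executor, calc), concurrent_tasks(node, executor, 3, calc))
```

With these assumed numbers, memory-only scheduling packs 16 executors (48 concurrent tasks on 16 vcores), while dominant-resource scheduling packs 5 (15 tasks). So the likely effect of enabling CPU scheduling is less CPU oversubscription on a busy cluster, rather than faster execution of a single executor's tasks.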

For now, I cannot enable CPU scheduling in my cluster (CentOS 7.5) because the NodeManager fails to start with the following error: "Not able to enforce cpu weights; cannot write to cgroup at: /sys/fs/cgroup/cpu,cpuacct"
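The error says the NodeManager cannot write to the cgroup directory. A minimal diagnostic sketch, assuming it is run as the same Unix user as the NodeManager, checks whether that path exists, is a directory, and is writable by the current user. `check_cgroup_dir` is a hypothetical helper; it does not read or change any YARN configuration.

```python
from __future__ import annotations

import os
from pathlib import Path


def check_cgroup_dir(path: str) -> list[str]:
    """Return problems that would prevent this user from writing under path."""
    p = Path(path)
    if not p.exists():
        return [f"{path} does not exist"]
    if not p.is_dir():
        return [f"{path} is not a directory"]
    # os.access checks permissions against the real uid/gid of this process;
    # creating entries in a directory needs both write and search (x) bits.
    if not os.access(p, os.W_OK | os.X_OK):
        return [f"{path} is not writable by uid {os.getuid()}"]
    return []


if __name__ == "__main__":
    target = "/sys/fs/cgroup/cpu,cpuacct"  # path from the NodeManager error
    issues = check_cgroup_dir(target)
    print("\n".join(issues) if issues else f"{target} looks writable")
```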