
Best balance of number of CPU and number of Disks for slave nodes


I know there is a best-practice formula for balancing the number of cores and the number of disks:

# of containers = min (2*CORES, 1.8*DISKS, (Total available RAM) / MIN_CONTAINER_SIZE)

I believe this means that a slave node where "2*CORES = 1.8*DISKS" is the best balanced in terms of CPUs and disks.
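To make the formula concrete, here is a minimal Python sketch of it. The function name, the parameter names, and the choice to round down to a whole number of containers are my own assumptions for illustration, not part of the original guidance.

```python
import math


def containers_per_node(cores, disks, total_ram_gb, min_container_gb):
    """Evaluate min(2*CORES, 1.8*DISKS, RAM / MIN_CONTAINER_SIZE).

    Assumption: the result is rounded down, since a node cannot run a
    fractional container.
    """
    by_cpu = 2 * cores
    by_disk = 1.8 * disks
    by_ram = total_ram_gb / min_container_gb
    return math.floor(min(by_cpu, by_disk, by_ram))


# Hypothetical node: 12 cores, 12 disks, 96 GB for containers, 2 GB minimum container.
# The three limits are 24 (CPU), 21.6 (disk) and 48 (RAM), so disks are the bottleneck.
print(containers_per_node(12, 12, 96, 2))  # prints 21
```

With 6 cores instead of 12, the CPU term (2*6 = 12) becomes the limit, which is why the meaning of "CORES" matters for sizing.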

Does anyone know whether "CORES" means the number of "physical" cores or the number of "virtual" cores (i.e. with Hyper-Threading Technology)?

If it means "physical" cores, about 12 physical CPU cores would be a good match for 12 disks.

If it means "virtual" cores (for example, with Intel HT), 6 physical cores would be enough for 12 disks (a best-balanced node).
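The two cases above can be worked through by solving 2*CORES = 1.8*DISKS for CORES, which gives CORES = 0.9*DISKS. The sketch below is only an illustration under assumptions: the function name and the `threads_per_core` parameter are hypothetical, and it assumes Hyper-Threading exposes exactly 2 virtual cores per physical core.

```python
import math


def balanced_physical_cores(disks, threads_per_core=1):
    """Smallest number of physical cores satisfying 2*CORES >= 1.8*DISKS.

    threads_per_core=1 treats CORES as physical cores;
    threads_per_core=2 treats CORES as virtual cores with 2 threads per core.
    """
    # 0.9 * disks, written with integers to avoid floating-point surprises.
    return math.ceil(9 * disks / (10 * threads_per_core))


print(balanced_physical_cores(12))     # CORES = physical cores
print(balanced_physical_cores(12, 2))  # CORES = virtual cores (HT enabled)
```

For 12 disks the formula asks for 10.8 "CORES", so the physical interpretation needs 11 cores (12 in practice, as CPUs rarely come in that size), while the virtual interpretation needs 6 physical cores.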

Also, I'm wondering whether we should enable Hyper-Threading to get better "throughput".

Any replies, comments, or suggestions would help me. Thanks!



This page also says "CORES (number of CPU cores)".

So, does "CORES" mean "physical cores"?

If that's true, 12 physical CPU cores (24 vcores with Intel HT) may be good enough for a node with 12 HDDs.

I welcome any opinions from all of you 🙂
