We recently upgraded our workers on a CM 5.13 / CDH 5.12.2 cluster (5 workers, 16 cores / 32 HT). This is an AWS install.
After running the static allocation (15% HDFS, 40% Impala, 45% YARN), I'm seeing values that raise a yellow flag:
yarn.nodemanager.resource.cpu-vcores = 14
yarn.scheduler.maximum-allocation-vcores = 14
And from a worker Host Resources page...
DataNode CPU = 0.5
Impala Daemon CPU = 1
NodeManager process = 0.5
NodeManager MR Containers = 14
Based on my reading, it seems that YARN allocation should be tied to physical cores, not hyper-threads (HT). In that case (and assuming HDFS and Impala follow suit), the 14 for YARN (16 in total across all roles) seems appropriate.
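As a quick sanity check on that reasoning, here is a rough sketch that just adds up the CPU figures from the Host Resources page. The dictionary keys and constant names are my own labels, not anything CM exposes, and the 16/32 core counts are the ones stated at the top of this post.

```python
# CPU figures copied from the worker's Host Resources page above.
# Key names are illustrative labels only, not CM identifiers.
cpu_allocations = {
    "DataNode": 0.5,
    "Impala Daemon": 1.0,
    "NodeManager process": 0.5,
    "NodeManager MR Containers": 14.0,
}

PHYSICAL_CORES = 16  # per worker, as stated in the post
LOGICAL_CORES = 32   # per worker with hyper-threading

total_allocated = sum(cpu_allocations.values())

# Compare the total against both ways of counting cores.
matches_physical = total_allocated == PHYSICAL_CORES
matches_logical = total_allocated == LOGICAL_CORES

print(f"total allocated: {total_allocated}")
print(f"matches physical cores: {matches_physical}")
print(f"matches logical cores:  {matches_logical}")
```

The total lands exactly on the physical core count, which is at least consistent with the idea that the allocation is counting physical cores rather than hyper-threads.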
However, the Cloudera YARN Tuning spreadsheet seems to actively take HT into account, and plugging in some of my numbers yields a yarn.nodemanager.resource.cpu-vcores = 28.
Which piece is lying to me? CM or the spreadsheet? 🙂 Thanks!
Actually, on re-reading my message, my fatal (and obvious) flaw is likely that the resource calculator has no knowledge of my 40/45 split between Impala and YARN. 🙂
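To make that hunch concrete, here is a small sketch of what a 45% YARN share would look like against each core count. This is plain arithmetic under my own assumptions (a simple percentage, rounded down); I don't know how CM or the spreadsheet actually derive their numbers, and `yarn_share_of` is a hypothetical helper, not a Cloudera function.

```python
YARN_PERCENT = 45  # YARN's share from the static allocation in this post


def yarn_share_of(cores: int, percent: int = YARN_PERCENT) -> int:
    """Hypothetical helper: a percentage share of cores, rounded down.

    Integer arithmetic avoids floating-point surprises.
    """
    return cores * percent // 100


# Core counts taken from the post: 16 physical, 32 with HT,
# and the 28 vcores suggested by the tuning spreadsheet.
share_of_physical = yarn_share_of(16)
share_of_logical = yarn_share_of(32)
share_of_spreadsheet = yarn_share_of(28)

print(f"45% of 16 physical cores: {share_of_physical}")
print(f"45% of 32 logical cores:  {share_of_logical}")
print(f"45% of spreadsheet's 28:  {share_of_spreadsheet}")
```

Under these assumptions, 45% of the 32 logical cores comes out at 14, the same as the value CM set, while scaling the spreadsheet's 28 by the same share gives 12. That may be coincidence, but it would fit with the spreadsheet assuming YARN has the node's CPU to itself.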