Is there a similar guide for plain CDH installations that details disk space requirements and similar information, like that CM guide does?
It all depends on what you want to run, but here are some guidelines. They exclude the "data" drives, since you can have as many of those as you want, at whatever size you need.
1) The CM requirements also apply to CDH, for example for the /var folder.
2) CM starts alerting you when the directories used by JournalNodes, NameNodes, and other processes drop below 10GB of free space. Therefore, it is a good idea to account for at least 20GB per service for the "meta" data (logs, configs, binaries, etc.). So if you run YARN + DN + Spark on a node, give them at least 60GB of disk space.
3) Master processes use space in proportion to the size of the cluster: the bigger the cluster, the more data and blocks it holds, and the more space the NN and JN directories use. So for clusters larger than 30 nodes, you might want to give them a bit more.
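To make the arithmetic in point 2 concrete, here is a minimal sketch of the "20GB per service" rule of thumb. The function name `meta_space_gb` and the constant are hypothetical, for illustration only; this is not an official Cloudera formula, and it does not model the extra space masters need on larger clusters (point 3).

```python
from typing import Iterable

# Rule of thumb from point 2: reserve ~20GB of "meta" space
# (logs, configs, binaries, etc.) per service on a node.
# Hypothetical constant, not an official Cloudera figure.
META_GB_PER_SERVICE = 20


def meta_space_gb(services: Iterable[str]) -> int:
    """Return the suggested minimum non-data disk space (GB) for the given services.

    Duplicate service names are counted once.
    """
    return META_GB_PER_SERVICE * len(set(services))


if __name__ == "__main__":
    # The worker-node example from the post: YARN + DN + Spark.
    print(meta_space_gb(["YARN", "DN", "Spark"]))  # 60
```

Treat the result as a floor. Master nodes on clusters larger than about 30 nodes should get more on top of it.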
Also, it is not recommended to run any service on the OS disk (the whole disk, not just the OS partition). And since disks keep getting bigger, you might end up with something like 1TB available on your partition for the CM agent + CDH services (on worker nodes). If that's the case, I don't think you really need to worry about the available space; just share it between the different mount points (if it is split into partitions).
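If you want a quick check outside CM, the sketch below flags directories whose filesystem has less free space than the 10GB alert threshold mentioned in point 2. The paths in the example are assumptions; substitute the directories your roles actually use. It treats 1 GB as 1024^3 bytes, which may not match exactly how CM computes its threshold.

```python
import os
import shutil

ALERT_THRESHOLD_GB = 10  # CM alerts when role directories drop below ~10GB free
GB = 1024 ** 3           # assumption: GB counted as GiB here


def low_space_dirs(paths, threshold_gb=ALERT_THRESHOLD_GB):
    """Return (path, free_gb) for each path whose filesystem has less than threshold_gb free."""
    low = []
    for path in paths:
        # disk_usage reports the whole filesystem containing `path`.
        free_gb = shutil.disk_usage(path).free / GB
        if free_gb < threshold_gb:
            low.append((path, round(free_gb, 1)))
    return low


if __name__ == "__main__":
    # Example paths only (hypothetical); adjust to your own layout.
    candidates = ["/var/log", "/var/lib", "/opt"]
    existing = [p for p in candidates if os.path.isdir(p)]
    for path, free in low_space_dirs(existing):
        print(f"{path}: only {free} GB free")
```

Because `shutil.disk_usage` reports the whole filesystem, mount points that share one large partition, as suggested above, all show the same free figure. That is expected and is exactly why a shared 1TB partition rarely triggers these alerts.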